# Optimal transport and machine learning

### Four introductory tutorials

### Held virtually at IMSI

### July 6-7, 2021

The tutorials are held virtually at IMSI. Each of the four lectures is about 2 hours long. The primary intended audience includes students and researchers in related fields; the only prerequisite is basic knowledge of probability theory and statistics.

These tutorials form part of a larger sequence and serve as an introduction to a research workshop on applied optimal transport that will take place in May 2022 within the Spring 2022 Long Program at IMSI. Funding by the National Science Foundation is gratefully acknowledged.

Organizer: Marcel Nutz (mnutz@columbia.edu)

## Registration

Participation is free and open, but **registration is required** to obtain Zoom access. To register, please go to the program website. Log in to (or create) your IMSI account as directed and click *Apply*. A few days before the start date, you will receive an email with a Zoom link valid for the entire program.

This tutorial introduces the optimal transport problem and some of its fundamental mathematical results: existence of optimal transports, geometric characterization, dual problem, etc.

## Distribution-free nonparametric inference using optimal transport

Bodhisattva Sen (Columbia)

July 6th, 2 p.m. (EDT)

Nonparametric statistics, a subfield of statistics that came into being after the introduction of Wilcoxon's tests in Wilcoxon (1945), is traditionally associated with distribution-free methods, e.g., hypothesis testing problems where the null distribution of the test statistic does not depend on the unknown data generating distribution. Although enormous progress has been made in this field since then, most of the distribution-free methods have been restricted to one-dimensional problems.

Recently, using the theory of optimal transport, many distribution-free procedures have been extended to multiple dimensions. Prominent examples include (a) two-sample testing for equality of distributions and (b) testing for mutual independence. In this lecture, I will summarize these recent developments and: (i) provide a general framework for distribution-free nonparametric testing in multiple dimensions, (ii) propose multivariate analogues of some classical methods (including Wilcoxon's tests), and (iii) study the (asymptotic) efficiency of the proposed distribution-free methods. I will also compare and contrast these distribution-free methods with kernel-based methods that are very popular in the machine learning literature. In summary, I will illustrate that these distribution-free methods are as powerful/efficient as their traditional counterparts and more robust to heavy-tailed outliers and contamination.

In this tutorial, we will start by introducing different notions of distance between probability measures and see how they compare on toy learning problems, both from a theoretical and practical perspective. We will then focus on regularized optimal transport and provide fast algorithms to compute it, along with theoretical guarantees. We will finish with an overview of learning problems that can be tackled using optimal transport.
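As a rough illustration of what a fast algorithm for regularized optimal transport can look like, here is a minimal sketch of Sinkhorn iterations for entropically regularized transport between two discrete distributions. The abstract does not name a specific algorithm, so the choice of Sinkhorn is an assumption, and the function name `sinkhorn`, its parameters, and the defaults `eps=1.0` and `n_iters=500` are hypothetical choices made for this example only.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, n_iters=500):
    """Approximate an entropically regularized transport plan.

    a, b    : lists of nonnegative weights, each summing to 1
    cost    : cost[i][j] is the cost of moving mass from source i to target j
    eps     : regularization strength (illustrative default, not tuned)
    n_iters : number of alternating scaling steps (illustrative default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Usage: three points on a line with squared-distance cost
a = [1 / 3, 1 / 3, 1 / 3]
b = [0.5, 0.25, 0.25]
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
plan = sinkhorn(a, b, cost)
```

The returned plan has row sums close to `a` and column sums close to `b`. Smaller values of `eps` bring the plan closer to unregularized optimal transport but typically need more iterations, and very small values can underflow in this naive form; practical implementations usually work in the log domain.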

In this tutorial, we will consider a fundamental statistical question of optimal transport: how well can optimal transport distances and maps be estimated from data? We will discuss the pervasive "curse of dimensionality" in the statistical theory of optimal transport and examine assumptions (such as smoothness, sparsity, and low-dimensionality) that can partially ameliorate this curse. We will also explore minimax lower bounds, which establish that in the absence of such assumptions, it is not possible to avoid the curse of dimensionality entirely. These pessimistic results help to motivate several variants of optimal transport which have recently become popular in machine learning.