Summer school on Optimal Transport, Stochastic Analysis and Applications to Machine Learning

Titles & Abstracts


Beatrice Acciaio (ETH Zurich)

Title: New types of (stochastic) transport

Abstract: In this minicourse we will go through some new forms of optimal transport developed in recent years, and see how they combine with tools from stochastic analysis for applications, e.g. in mathematical finance. In particular, we will introduce and discuss causal transport, adapted Wasserstein distances, martingale transport and weak transport.


Sinho Chewi (Institute for Advanced Study)

Title: Wasserstein gradient flows for sampling and variational inference 

Abstract: The theory of gradient flows in the space of probability measures equipped with the Wasserstein metric from optimal transport has recently led to the analysis and development of many algorithms for machine learning. After providing a quick introduction to the theory, I will describe applications to the problems of sampling and variational inference. 
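As a hedged illustration of the connection described above: the Wasserstein gradient flow of the KL divergence to a target $\pi \propto e^{-V}$ is realized by Langevin dynamics, whose simplest time discretization is the unadjusted Langevin algorithm (ULA). The sketch below is our own illustration, not code from the lectures; the function name, step size, and iteration count are arbitrary choices.

```python
import numpy as np

def ula_sample(grad_potential, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm: a time discretization of the
    Wasserstein gradient flow of KL(. || pi) for pi proportional to exp(-V).
    Each particle follows a noisy gradient descent on the potential V."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + np.sqrt(2.0 * step) * rng.normal(size=x.shape)
    return x

# For the standard Gaussian target, V(x) = |x|^2 / 2, so grad V is the identity:
samples = ula_sample(lambda x: x, np.zeros(2000), step=0.01, n_steps=1000)
```

For small step sizes the empirical distribution of the particles approaches the target; the discretization bias of ULA is one of the quantities the gradient-flow viewpoint helps analyze.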


Dejan Slepcev (Carnegie Mellon University)

Title: Optimal transport and gradient flows for problems in high dimensions

Abstract: We will start by reviewing some of the statistical properties of optimal transport, namely how well one can approximate a measure using particles in arbitrary dimension. This will lead us to study ways to measure the approximation error, in particular the sliced Wasserstein distance and the maximum mean discrepancy. Motivated by problems in sampling and variational inference, we will consider various gradient flows and their, mostly deterministic, particle approximations. Particular attention will be given to understanding their performance in high dimensions.
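To make the sliced Wasserstein distance concrete, here is a minimal Monte Carlo sketch (our own illustration, not code from the lectures; the number of projections is arbitrary): project both point clouds onto random directions and average the one-dimensional squared W2 distances, each of which reduces to sorting.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between
    two empirical measures given by equal-size point clouds X, Y in R^d.
    For each random unit direction theta, the 1-D W2 distance between the
    projected samples is computed by matching sorted values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        x_proj = np.sort(X @ theta)
        y_proj = np.sort(Y @ theta)
        total += np.mean((x_proj - y_proj) ** 2)
    return np.sqrt(total / n_projections)
```

The appeal in high dimensions is that each projection costs only a sort, so the estimate avoids solving a full d-dimensional transport problem.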


Feng-Yu Wang (Tianjin University)

Title: Wasserstein Limit for Empirical Measures of Markov Processes 

Abstract: We introduce recent progress on the convergence in Wasserstein distance for the empirical measures of Markov processes. The lecture consists of the following three parts:

1) Sharp convergence rate and renormalization limits in 2-Wasserstein distance for diffusion processes.  

2) Sharp convergence rate in p-Wasserstein distance for diffusion processes. 

3) Convergence rate in p-Wasserstein distance for Markov processes.


Brendan Pass (University of Alberta)

Title: An introduction to optimal transport

Abstract: I will briefly introduce the optimal transport problem and some of its key features and properties. The goal is to rapidly bring the audience up to speed and prepare them for the subsequent minicourses and talks.


Chenchen Mou (City University of Hong Kong)

Title: Minimal solutions of master equations for extended mean field games

Abstract: In an extended mean field game, the vector field governing the flow of the population can differ from that of the individual player at some mean field equilibrium. This new class strictly includes standard mean field games. It is well known that, without any monotonicity conditions, mean field games typically admit multiple mean field equilibria, and the well-posedness of their corresponding master equations fails. In this work, a partial order on the set of probability measure flows is proposed to compare different mean field equilibria. The minimal and maximal mean field equilibria under this partial order are constructed and satisfy the flow property. The corresponding value functions, however, are in general discontinuous. We thus introduce a notion of weak-viscosity solutions for the master equation and verify that the value functions are indeed weak-viscosity solutions. Moreover, a comparison principle for weak-viscosity semi-solutions is established, and thus these two value functions serve as the minimal and maximal weak-viscosity solutions in an appropriate sense.

 In particular, when these two value functions coincide, the value function becomes the unique weak-viscosity solution to the master equation. The novelties of the work persist even when restricted to the standard mean field games. This is based on a joint work with Jianfeng Zhang.


Hanbaek Lyu (University of Wisconsin-Madison)

Title: Concentration and limit of large random matrices with given margins 

Abstract: We study large random matrices with i.i.d. entries conditioned on the event that their margins (row and column sums) are close to prescribed values. This problem has rich connections to random graphs with given degree sequences, enumeration of contingency tables, and the static Schr\"{o}dinger bridge. We show that such margin-conditioned random matrices concentrate around a certain deterministic matrix that has two `dual' characterizations: as the minimum relative entropy matrix resulting from entry-wise exponential tilting of the base model to match the target margin, and as the expectation of the maximum likelihood model obtained by rank-one tilting of the base model for the target margin. Additionally, if we have a sequence of `tame' margins that converge in $L_{1}$ to a limiting continuum margin as the size of the matrix diverges, we demonstrate that the conditioned random matrix converges to a limiting kernel in cut norm, and we obtain a bound on the rate of convergence. We derive several new results for random contingency tables from our general framework.


Se-Young Yun (KAIST)

Title: Uncertainty Analysis in Preference Feedback and Preference Transport

Abstract: Learning from Human Feedback has gained significant attention, particularly due to the success of ChatGPT, which aligns human preferences using Reinforcement Learning from Human Feedback (RLHF). Within the Bradley-Terry model, feedback can be framed as a logistic bandit problem, a prevalent method for modeling user choices, such as click versus no click in advertisement recommender systems. Previous works often overlook the dependence on \(S\), the radius of the domain of the unknown parameter vector, which becomes problematic when \(S\) is large. This talk introduces a novel approach called Regret-to-Confidence Set Conversion (R2CS), which constructs a convex confidence set using only the existence of an online learning algorithm with a regret guarantee. Utilizing R2CS, we achieve a significant improvement in the regret bound with respect to \(S\) in logistic bandits, while maintaining computational feasibility and accounting for other factors such as the input dimension \(d\) and the number of rounds \(T\). Additionally, I will discuss an innovative model-free approach that learns an optimal transport from the unpreferred distribution to the preferred distribution.


Dan Mikulincer (MIT)

Title: Tensorized stochastic localization processes and efficient sampling from tensor Ising models

Abstract: Stochastic localizations are an emerging set of techniques useful in the study of high-dimensional probability. At its core lies the idea that a-priori complicated measures can be simplified by performing randomized Gaussian tilts. These tilts then allow for the use of various Gaussian comparison inequalities. In many cases, the Gaussian framework is well understood, and this strategy has led to advances in various fields, from convex geometry to complexity theory.

In this talk, motivated by algorithmic questions related to tensor Ising models, we will go beyond the Gaussian framework and introduce a variant of the stochastic localization process that allows for non-Gaussian tilts. In the first part of the talk, we will present the original localization process and explain the main natural difficulties that arise when moving beyond the standard setting. The second part of the talk will focus on overcoming these hurdles and will include a challenging riddle in stochastic analysis.


Asuka Takatsu (Tokyo Metropolitan University)

Title: Sliced and Disintegrated Monge--Kantorovich metrics

Abstract: I first introduce a two-parameter family of metrics on a subspace of Borel probability measures on Euclidean space, called the sliced Monge--Kantorovich metrics. These include the sliced Wasserstein and max-sliced Wasserstein metrics. I then explain the geometry of these metric spaces. In particular, the spaces are (except for an endpoint case) not geodesic. To recover the geodesic property, I introduce a two-parameter family of metrics on a subspace of Borel probability measures on a metric fiber bundle, called the disintegrated Monge--Kantorovich metrics. Finally, I discuss an associated barycenter problem. This talk is based on joint work with Jun Kitagawa (Michigan State University).


Hyungju Hwang (POSTECH)

Title: Neural PDE Solvers toward Digital Twin: Theory and Applications

Abstract: A digital twin is a virtual representation of real-world physical objects. In this talk, I will briefly introduce how physics can be encoded into neural networks, for example via physics-informed neural networks (PINNs) and operator learning. We will then explore real-world applications of AI-based partial differential equation (PDE) solvers in various fields.


Jakwang Kim (University of British Columbia)

Title:  Statistical inference of convex order by Wasserstein projection

Abstract: We study the statistical inference problem of convex order between two probability measures via Wasserstein projection. Given a hypothesis testing problem for convex order, we propose a decision rule based on the Wasserstein projection distance between an empirical measure and the (backward) Wasserstein cone of the target empirical measure. We prove the quantitative consistency of our estimator, and provide the usual analysis of the $p$-value and type I/II errors. Furthermore, we verify that this problem is computationally tractable by translating it into a linear regression-type problem over the convex hull of the support of the empirical distribution.


Andrew Warren (University of British Columbia)

Title: Wasserstein principal curves

Abstract:  A "principal curve" of a data distribution refers to a class of nonlinear generalizations of the first principal component: namely, we seek a continuous curve which locally "passes through the middle" of the distribution. Originally proposed in the context of statistics by Hastie and Stuetzle, the principal curve problem is closely related to the "average distance problem" in the calculus of variations. Our work introduces the principal curve problem in the Wasserstein space of probability measures. We relate this problem to the problem of finding minimizers to a length-penalized average-distance variational problem in this space; for said problem, we then prove existence of minimizers, stability with respect to the underlying data distribution, and consistency of a discretized variational problem. This latter problem enjoys a "coupled Lloyd's algorithm" type numerical scheme which can be feasibly implemented via existing Wasserstein barycenter solvers.

Lastly, we apply this general framework to a concrete problem motivated by recent developments in measurement technologies for gene expression data. Namely, given a stochastic process and batches of samples from different temporal marginals, but without temporal labels on the batches, can we infer the temporal ordering of the batches? This is a version of the so-called seriation problem. We thus show that principal curves in Wasserstein space can be employed as a consistent seriation method for empirical distributions. (Joint work with: Anton Afanassiev, Forest Kobayashi, Young-Heon Kim, and Geoffrey Schiebinger)


Tongseok Lim (Purdue University) 

Title: Maximizing Marginal Variance in Martingales for Unsupervised Learning

Abstract: We present an unsupervised statistical learning approach that investigates all martingale couplings with one fixed marginal and one variable marginal. In this framework, we propose that the variable marginal distribution maximizing variance serves as a solution to the learning objective. We illustrate the applicability of this approach in a variety of unsupervised learning contexts, such as data clustering and principal curve and surface inference. We establish the existence of solutions to our optimization scheme and provide a consistency result. Additionally, we show that a specific instance of our method is equivalent to classical principal component analysis (PCA), implying that our approach generalizes PCA.


Chulhee Yun (KAIST)

Title: Understanding Modern Data Augmentation Techniques

Abstract: In this talk, I will present an ongoing line of research on understanding the provable benefits of widely adopted data augmentation techniques such as Mixup, Cutout, and CutMix. In the first part of the talk, we investigate how pairwise data augmentations like Mixup and CutMix affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. We show that Mixup and CutMix greatly reduce the number of samples required to precisely locate the optimal boundary, whereas vanilla training suffers the “curse of separability”: the necessary and sufficient number of samples grows exponentially as positive and negative samples become more separable. In the second part, we study the augmentation techniques from a feature learning perspective. By adopting a popular assumption called the feature-noise data distribution, we show that Cutout and CutMix allow a 2-layer convolutional neural network to learn rarer feature vectors than vanilla training, hence leading to superior generalization performance.
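For readers unfamiliar with Mixup, a minimal sketch of the augmentation itself (our own illustration; the Beta(alpha, alpha) mixing weight follows the standard Mixup recipe, but the function name is ours): each synthetic example is a random convex combination of two training examples and of their labels.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mixup augmentation: draw lambda ~ Beta(alpha, alpha) and return the
    convex combination of the two inputs and of the two labels."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mixed = lam * x1 + (1.0 - lam) * x2
    y_mixed = lam * y1 + (1.0 - lam) * y2
    return x_mixed, y_mixed
```

CutMix is analogous but combines the two inputs by pasting a random rectangular patch of one image onto the other, with the label mixed in proportion to the patch area.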


Nabarun Deb (University of Chicago)

Title: Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm 

Abstract: We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm, or the iterative proportional fitting procedure (IPFP), on joint densities converges to an absolutely continuous curve in the 2-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (under further technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals, and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Ampère PDE, whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence of this limiting flow. We also construct a McKean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.
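The Sinkhorn/IPFP iterations referenced above alternately rescale a Gibbs kernel so that the coupling matches the two prescribed marginals. A minimal discrete sketch (our own illustration; the regularization parameter and iteration count are arbitrary):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=500):
    """Sinkhorn / IPFP iterations for entropically regularized optimal
    transport between discrete measures mu, nu with cost matrix C.
    Returns the coupling P = diag(u) K diag(v) with K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)   # fit the column marginal
        u = mu / (K @ v)     # fit the row marginal
    return u[:, None] * K * v[None, :]
```

In practice, for very small eps the updates are carried out in the log domain for numerical stability; the curve of marginals studied in the talk arises in exactly this small-eps, many-iterations regime.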



1. Fang Rui Lim (University of Oxford)

Title: Causal transports on path space 

Abstract: We briefly describe a characterisation of (bi-)causal transport maps between laws of stochastic differential equations and its consequences, e.g. the density of bicausal Monge transport maps among bicausal transport plans, or when a bicausal transport plan is concentrated on the graph of a function.

This is a joint work with Prof. Rama Cont. 


2. Krzysztof Ciosmak (Fields Institute for Research in Mathematical Sciences and University of Toronto)

Title: Localisation for constrained transports 

Abstract: We consider an analogue of the irreducible convex paving of martingale transport theory in the context of generalised convexity. Consider two probability measures μ, ν ordered with respect to a max-stable convex cone F of functions on Ω. We establish the existence of the finest partitioning of Ω, depending only on μ, ν and the cone F, into F-convex sets, called irreducible components, such that any F-transport between μ and ν must adhere to this partitioning. We also provide a characterisation of the sets negligible with respect to all F-transports between μ and ν (i.e. polar sets), whose sections are contained in the corresponding irreducible components, as the sets that project to μ- and ν-negligible sets. This provides an affirmative answer to a generalisation of a conjecture proposed by Obłój and Siorpaes regarding polar sets in the martingale transport setting. We present applications to the localisation of the Monge-Kantorovich problem, to martingale problems, as well as to the submartingale transport problem in the infinite-dimensional setting.


3. Kateryna Hlyniana (Jilin University)

Title: Optimal control problem for equation with interaction 

Abstract: In this talk we introduce a stochastic differential equation with interaction and a control problem for it. To obtain the stochastic maximum principle for optimal control, we define a generalized backward SDE with interaction and prove the existence and uniqueness of a solution to it.


4. Sangmin Park (Carnegie Mellon University)

Title: The Vlasov-Fokker-Planck equation as a minimizing movement with fixed marginals in the Wasserstein space 

Abstract: The Vlasov-Fokker-Planck equation describes the evolution of the probability density of the position and velocity of particles under the forces of external confinement, interaction, friction, and Brownian motion. It is well known that this equation can formally be seen as a dissipative Hamiltonian system in the 2-Wasserstein space of probability measures. In order to better understand this geometric formalism, we introduce a time-discrete variational scheme in the 2-Wasserstein space, whose solutions converge to that of the Vlasov-Fokker-Planck equation as the time step vanishes. The variational scheme is based on the symplectic Euler scheme for Hamiltonian systems in Euclidean spaces, and has several desirable properties, including geodesic convexity, unique solvability, and a two-sided bound on the dissipation of the energy over discrete solutions.


5. Garrett Mulcahy (University of Washington)

Title: Iterated Schrödinger Bridge Approximation to Wasserstein Gradient Flows

Abstract: We present a novel discretization scheme for Wasserstein gradient flows that relies on successive pushforwards by barycentric projections of same-marginal Schrödinger bridges. We will share progress towards a statement about the convergence of this method to the heat flow, as was first observed by Sander, Ablin, Blondel, and Peyré in a machine learning context. Joint work with Soumik Pal, Medha Agarwal, and Zaid Harchaoui.


6. Youngho Kim (Indiana University Bloomington)

Title: Solvability of singular Abreu equations in higher dimensions 

Abstract: A singular Abreu equation is a system of two equations, one of which is a Monge–Ampère equation and the other a linearized Monge–Ampère equation. Such systems arise in the approximation of convex functionals subject to a convexity constraint. Earlier works established the solvability of their second boundary value problems either in two dimensions, or in higher dimensions under either a smallness condition or a radial symmetry condition. In this talk, we will introduce the problem, briefly discuss the connection to minimizers of convex functionals with a convexity constraint, and present the improvement to the higher-dimensional case. This is based on joint work with Nam Le, Ling Wang, and Bin Zhou.


7. Junghyun Lee (KAIST)

Title: Nearly Optimal Latent State Decoding in Block MDPs

Abstract: In this short talk, we introduce the problem of clustering in block Markov Decision Processes, which have vast applications in reinforcement learning with rich observations. We first provide an instance-wise, information-theoretic lower bound on the clustering error rate, inspired by the rich literature on nonparametric statistics. We then provide a two-step clustering algorithm whose performance guarantee shows that it is near-optimal, i.e., its achieved clustering error rate matches the lower bound. We then briefly mention its implications for reward-free RL in block MDPs. This is joint work with Yassir Jedra (KTH EECS -> MIT EECS), Se-Young Yun (KAIST AI), and Alexandre Proutiere (KTH EECS).


8. Noboru Isobe (University of Tokyo)

Title: A convergence result for a continuous model of deep learning via the Łojasiewicz--Simon inequality

Abstract: This study focuses on a Wasserstein-type gradient flow, which represents an optimization process for a continuous model of a Deep Neural Network (DNN). Our main result is the convergence of the flow to a critical point of the loss as time goes to infinity.


9. Yasuaki Fujitani (Osaka University)

Title: Transport distance between Grover walks on graphs and coarse Ricci curvature

Abstract: We will introduce a new definition of the coarse Ricci curvature induced by Grover walks on graphs and hypergraphs. This talk is based on the following paper: Fujitani, Y., Kiumi, C. Transport distance between Grover walks on graphs and coarse Ricci curvature. Quantum Inf Process 23, 180 (2024). https://doi.org/10.1007/s11128-024-04373-2


10. Forest Kobayashi (University of British Columbia)

Title: Monge-Kantorovich fitting under a Sobolev Budget

Abstract: Given a measure rho with n-dimensional support, how can we best “approximate” rho via an m-dimensional set (m < n)? When m=1 this is essentially the (Euclidean) “principal curve” problem discussed by Andrew; there, the approximating set is defined as the image of a curve f : [0,1] -> R^n. In order to ensure we don’t “cheat” by taking f to be space-filling it is necessary to add some method of controlling the complexity of f; a straightforward approach is to bound the arc length ||f’||_{L^1} above by a constant L. 

We introduce a similar problem for general m < n in which the arc length ||f’||_{L^1} is replaced by a Sobolev norm ||f||_{W^{k,q}}. This allows us to add control over the higher-order derivatives of f, which is desirable in some real-world applications such as routing problems. However, the severe parametrization-dependence of ||f||_{W^{k,q}} also adds many theoretical challenges that are not present in the arc length case. We give a prototypical example of one of these challenges, and, time permitting, briefly discuss our strategy for working around it. 

Joint work with Young-Heon Kim (University of British Columbia) and Jonathan Hayase (University of Washington)


11. Joshua Hiew (University of Alberta)

Title: An ODE for entropic optimal transport and its linearly constrained variants

Abstract: We characterize the solution to the entropically regularized optimal transport problem by a well-posed ordinary differential equation (ODE). Our approach works for discrete marginals and general cost functions, and, in addition to two-marginal problems, applies to multi-marginal problems and those with additional linear constraints. Solving the ODE gives a new numerical method for the optimal transport problem, which we illustrate with several numerical simulations. The formulation of the ODE also allows one to compute derivatives of the optimal cost when the ODE parameter is 0, corresponding to the fully regularized limit problem in which only the entropy is minimized.


12. Wanxin Li (University of British Columbia)

Title: Transport-based transfer learning on Electronic Health Records: Application to detection of treatment disparities

Abstract: Recent advances in optimal transport have led to powerful applications in healthcare. We will present a novel method that leverages barycentric projections from transport maps to perform transfer learning between populations, motivated by the detection of disparities (e.g. in treatment duration) from electronic health records. Our results also lead to a generalization error upper bound that can serve as a guide for assessing the suitability of our method on a given dataset.


13. Aryan Tajmir Riahi (University of British Columbia)

Title: Transport based alignment of 3D density maps of biological molecules from cryo-EM

Abstract: Recent advances in microscopy and techniques for extracting biological shapes, from the atomic to the cellular scales, have led to a surge of data with new mathematical and computational challenges associated with it. In the context of large 3D voxelized images of biomolecules imaged from single particle cryogenic electron microscopy (cryo-EM 3D maps), partial and full alignment of maps remains challenging. To address this issue, we propose a procedure that first finds a coupling between 3D point-cloud representations associated with their so-called unbalanced Gromov Wasserstein divergence, and second, uses this coupling to find an optimal rigid body transformation. Our method generally outperforms standard methods for aligning subunits of a protein complex and fitting atomic models to a density map, suggesting potential applications of Optimal Transport for improving Cryo-EM pipelines.