Monday
Monday, June 23, 09:00 – 09:45: Eric Vanden-Eijnden
Title: Generative modeling with flows and diffusions
Abstract: Dynamical transport-based generative models have revolutionized unsupervised learning. These models construct maps between probability distributions, transforming samples from one into samples from another. While initially developed for image generation, they also show promise in previously intractable high-dimensional problems across scientific computing domains. This talk explores the mathematical foundations of flow and diffusion-based generative models, demonstrating how deeper understanding of their mechanisms improves design. I'll present methods for structuring transport to efficiently reach complex target distributions while optimizing both learning and sampling.
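For background, one schematic construction from this line of work is the stochastic interpolant picture; the notation below (α, β, b) is illustrative rather than quoted from the talk.

```latex
x_t = \alpha(t)\,x_0 + \beta(t)\,x_1, \qquad (x_0,x_1)\sim \rho_0\otimes\rho_1, \qquad \alpha(0)=\beta(1)=1,\ \alpha(1)=\beta(0)=0,
\qquad
b(t,x) = \mathbb{E}\!\left[\dot\alpha(t)\,x_0 + \dot\beta(t)\,x_1 \,\middle|\, x_t = x\right],
\qquad
\dot X_t = b(t,X_t),\quad X_0\sim\rho_0 \ \Longrightarrow\ X_1\sim\rho_1 .
```

Learning b by least-squares regression on samples of (x_0, x_1) and then integrating the ODE is the basic template that flow- and diffusion-based samplers refine.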
Monday, June 23, 09:45 – 10:30: Huyên Pham
Title: Bridging Schrödinger and Bass for generative modeling
Abstract: TBD
Monday, June 23, 11:00 – 11:45: Mathieu Blondel
Title: Joint Learning of Energy-based Models and their Partition Function
Abstract: Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, both parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing in the space of continuous functions. Furthermore, we show that our approach naturally extends to the broader family of Fenchel-Young losses, allowing us to obtain the first tractable method for optimizing the sparsemax loss in combinatorially-large spaces. We demonstrate our approach on multilabel classification and label ranking.
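For context, the intractable object referred to above is the log-partition in the maximum-likelihood objective; schematically (our notation, with 𝒴 the combinatorially large discrete space):

```latex
p_\theta(y) = \exp\!\big(-E_\theta(y) - A(\theta)\big),
\qquad
A(\theta) = \log\!\sum_{y'\in\mathcal{Y}} \exp\!\big(-E_\theta(y')\big),
\qquad
-\log p_\theta(y) = E_\theta(y) + A(\theta),
```

and the proposal above is to replace the intractable sum defining A(θ) by a second network trained jointly with E_θ.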
Monday, June 23, 11:45 – 12:30: Rama Cont
Title: Causal transport on path space
Abstract: We study properties of causal couplings for probability measures on the space of continuous functions. We first provide a characterization of all bicausal couplings between (weak) solutions of stochastic differential equations. We then provide a complete description of all such bicausal Monge couplings. In particular, we show that bicausal Monge couplings of d-dimensional Wiener measures are induced by stochastic integrals of rotation-valued integrands. As an application, we give necessary and sufficient conditions for bicausal couplings to be induced by Monge maps and show that such bicausal Monge transports are dense in the set of bicausal couplings between laws of SDEs with regular coefficients. Joint work with Fang Rui LIM (Oxford).
Monday, June 23, 14:30 – 15:15: Arnaud Doucet
Title (tutorial): From Denoising Diffusion Models to Schrödinger Bridges
Abstract: TBD
Monday, June 23, 15:15 – 16:00: Arnaud Doucet
Title: Accelerated Diffusion Models via Speculative Sampling
Abstract: TBD
Monday, June 23, 16:30 – 17:15: Ziad Kobeissi
Title: Approximating solutions to some linear second-order PDEs using continuous-time data
Abstract: TBD
Monday, June 23, 17:15 – 17:45: Annette Dumas
Title: Deterministic Mean Field Games with jumps
Abstract: The Mean Field Game we consider is motivated by the modeling of housing dynamics, where each inhabitant can move from one place to another. In particular, the trajectories of the agents are piecewise constant, and each agent minimizes a cost consisting of the number of jumps (or relocations) and two terms depending on the density: the first one is variational and the other one is non-variational.
A Nash equilibrium for this mean field game is a measure over the curves minimizing a problem in a Lagrangian form which depends on the measure itself. To prove the existence of a Nash equilibrium, we reformulate the problem, thanks to an optimal transport result, in an Eulerian form for which we prove regularity results. The Eulerian formulation also allows us to perform numerical simulations thanks to a fast proximal dual gradient method.
Tuesday
Tuesday, June 24, 09:00 – 09:45: Borjan Geshkovski
Title: Conditional flows, approximation of diffeomorphisms, and triangular structures
Abstract: In the context of (normalizing) flow matching, one essentially parameterizes the vector field of the continuity equation using a two-layer neural network and fits the parameters to minimize a discrepancy between the resulting solution and an unknown target measure. We focus on the conditional transport problem, where the goal is also to approximate the transport map that pushes forward the initial condition to the unknown target measure. We provide an explicit construction of parameters that are piecewise constant in time, enabling the simultaneous approximation of both the measure (in total variation) via the continuity equation and the transport map (in L2) via the associated semigroup. This construction has the desirable property that the resulting semigroup closely resembles the Knothe–Rosenblatt rearrangement between suitable discretizations of the measures. We also discuss some connections with A. Shnirelman’s constructions of approximate solutions to the Euler equations.
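Schematically, the two objects being approximated are the solution of the continuity equation driven by the learned vector field v_θ and the associated flow map (notation ours):

```latex
\partial_t\rho_t + \nabla\!\cdot\!\big(\rho_t\, v_\theta(t,\cdot)\big)=0,\quad \rho_0=\mu,
\qquad\text{and}\qquad
\dot X_t(x) = v_\theta\big(t, X_t(x)\big),\quad X_0(x)=x,\qquad \rho_t=(X_t)_\#\,\mu .
```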
Tuesday, June 24, 09:45 – 10:30: Sinho Chewi
Title: A local error framework for KL divergence via shifted composition
Abstract: Local error analysis is a standard framework for establishing error estimates for the numerical discretization of stochastic systems. However, it is traditionally limited to guarantees in the Wasserstein metric. In this talk, I will describe a strengthening of this framework which yields bounds in the stronger sense of KL divergence or relative entropy. At the heart of this result is a technique to use coupling arguments to control information-theoretic divergences. This technique, which we call “shifted composition”, builds on works developed with my co-authors Jason M. Altschuler and Matthew S. Zhang.
Tuesday, June 24, 11:00 – 11:45: Gabriel Peyré
Title: Diffusion Flows and Optimal Transport in Machine Learning
Abstract: TBD
Tuesday, June 24, 11:45 – 12:30: Alpár R. Mészáros
Title: Quantitative convergence for displacement monotone Mean Field Games of controls
Abstract: In this talk we present some recent results about quantitative convergence of a general class of N-player stochastic differential games, when agents interact through the empirical measure supported on both states and controls. Our analysis is based on a careful comparison between open and closed loop Nash equilibria, and the mean field Nash equilibrium. A particular challenge is to understand the properties of an additional fixed point map and to obtain dimension-free estimates as N increases to infinity. Our quantitative analysis doesn’t use the master equation and it relies on displacement monotonicity techniques. The talk will be based on a joint work with J. Jackson (University of Chicago).
Tuesday, June 24, 14:30 – 15:15: Jean-David Benamou
Title: Entropic regularisation and Optimal Martingale Transport
Abstract: The entropic regularization of the classical quadratic optimal transport problem (aka the Schrödinger problem) can be extended, first, to solve numerically drift-controlled diffusion processes (Benamou et al., 2018). In a second part, I will show how a time discretisation of the relative entropy can be used to characterize the (linear) time scaling under which the relative entropy between diffusion processes becomes a divergence between the diffusion coefficients of its arguments, instead of blowing up when they are singular. This object is also known as the “Specific Relative Entropy”. Classical entropic optimal transport numerical methods, like Sinkhorn, are then generalised to diffusion-controlled diffusion processes.
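For reference, here is a minimal sketch of the classical Sinkhorn iteration for entropic optimal transport between two discrete measures; it is only the static, discrete prototype of the methods mentioned above, not their generalisation to controlled diffusion processes.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT between discrete measures a and b with cost matrix C.

    Alternating (Sinkhorn) scaling of the Gibbs kernel K = exp(-C/eps)
    so that the coupling P = diag(u) K diag(v) matches both marginals.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)        # enforce the first marginal
        v = b / (K.T @ u)      # enforce the second marginal
    return u[:, None] * K * v[None, :]

# toy example: quadratic cost between two uniform measures on grids of the line
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 60)
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn(np.full(50, 1 / 50), np.full(60, 1 / 60), C)
print(P.sum(), (C * P).sum())  # total mass ~1, and the entropic transport cost
```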
Tuesday, June 24, 15:15 – 16:00: Lenaic Chizat
Title: The three effects of dropout: phase diagram of dropout in wide neural networks
Abstract: Dropout—randomly deactivating units during training—is one of the most widely used heuristics to improve the performance of deep neural networks. Despite its empirical success, the current theoretical understanding of dropout remains limited and offers little guidance on its implementation.
In this talk, I will present a new theoretical framework to analyze dropout. It consists in studying the mean-field asymptotics of gradient descent on wide neural networks with dropout, as a function of the relative scaling of width, dropout rate, and learning rate. In the two-layer case, this yields a rich phase diagram comprising five qualitatively distinct regimes including a penalized Wasserstein gradient flow and a limit described by a mean-field jump process.
This phase diagram reveals three separate mechanisms by which dropout alters the training dynamics: (i) propagation noise, (ii) dropout-induced penalization, and (iii) random geometry. I will discuss the nature and properties of these three effects in the context of deep architectures (beyond the two-layer case).
This is based on joint work with Pierre Marion and Yerkin Yesbay.
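As a concrete reminder of the heuristic under study, a minimal sketch of (inverted) dropout in a two-layer ReLU network follows; the 1/m output scaling used here is one common mean-field normalisation, not necessarily the exact scaling regime of the phase diagram.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_dropout(x, W, v, p=0.5, train=True):
    """f(x) = v . relu(W x), where each hidden unit is dropped with
    probability p during training and the survivors are rescaled by
    1/(1-p) so the expected pre-output activation is unchanged."""
    h = np.maximum(W @ x, 0.0)
    if train:
        keep = rng.random(h.shape) > p
        h = h * keep / (1.0 - p)
    return v @ h

m, d = 512, 10                               # width and input dimension
W = rng.standard_normal((m, d)) / np.sqrt(d)
v = rng.standard_normal(m) / m               # mean-field 1/m output scaling (assumption)
x = rng.standard_normal(d)
print(two_layer_dropout(x, W, v, p=0.3), two_layer_dropout(x, W, v, train=False))
```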
Tuesday, June 24, 16:30 – 17:00: Maxime Sylvestre
Title: Comparison principles for variational problems
Abstract: Multiple PDEs in divergence form (Laplacian, fractional Laplacian, ...) can be shown to enjoy a comparison principle thanks to the submodularity of their associated energy. The dual functional in optimal transport happens to also be submodular. This in turn implies that the optimal transport cost (for a wide range of costs: entropic regularization, unbalanced OT, l.s.c. costs, ...) enjoys a dual property which is novel: exchangeability. From this we deduce a comparison principle for the JKO scheme which doesn't require the usual regularity assumptions on the cost. (Joint work with Flavien Léger)
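The submodularity invoked here is, schematically, the lattice inequality (with ∨ and ∧ denoting pointwise maximum and minimum):

```latex
F(u\vee v) + F(u\wedge v) \;\le\; F(u) + F(v) \qquad \text{for all admissible } u,\,v .
```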
Tuesday, June 24, 17:00 – 17:30: Jiayang Yin
Title: GANs as a Mean Field Type Game: a controlled neural ODEs perspective
Abstract: Deep learning is pushing the boundaries of data science with rapidly emerging applications, often characterized by a lack of explainability and interpretability of the employed neural networks. Recent studies have demonstrated that through continuous-time formalism, infinitely deep residual networks can be seen as a mean-field optimal control problem. This enables a theoretical analysis of the limiting problem, characterizing the optimality conditions through stochastic control tools. In this paper, we focus on interpreting generative adversarial networks composed of two residual neural networks as a mean-field type game. By exploring the interaction between the generator and the discriminator, we determine the Hamilton-Jacobi-Isaacs equations and prove the Pontryagin maximum principle associated with this problem. Leveraging the characterization as a viscosity solution of a nonlinear PDE, we verify the Isaacs condition. Moreover, utilizing the Pontryagin maximum principle, we propose a novel training approach and discuss its connection to traditional training based on stochastic gradient descent.
Wednesday
Wednesday, June 25, 09:00 – 09:45: Shuyang Ling
Title: Local geometry determines global landscape in low-rank factorization for synchronization: theory and statistical bounds
Abstract: The orthogonal group synchronization problem, which focuses on recovering orthogonal group elements from their corrupted pairwise measurements, encompasses examples such as the high-dimensional Kuramoto model on general signed networks, $\mathbb{Z}_2$-synchronization, community detection under stochastic block models, and the orthogonal Procrustes problem. The semidefinite relaxation (SDR) has proven its power in solving this problem; however, its expensive computational costs impede its widespread practical applications. We consider the Burer-Monteiro factorization approach to orthogonal group synchronization, an effective and scalable low-rank factorization to solve large-scale SDPs. Despite the significant empirical successes of this factorization approach, it is still a challenging task to understand when the nonconvex optimization landscape is benign, i.e., when the optimization landscape possesses only one local minimizer, which is also global. In this work, we demonstrate that if the degrees of freedom within the factorization exceed the condition number of the "Laplacian" (certificate matrix) at the global minimizer, the optimization landscape is free of spurious local minima. Our main theorem is purely algebraic and versatile, and it seamlessly applies to all the aforementioned examples: the nonconvex landscape remains benign under almost the same condition that enables the success of the SDR. Finally, we will discuss the statistical side of group synchronization by quantifying the uncertainty of both MLE and spectral estimators.
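To fix ideas in the simplest ℤ₂ case (an illustration, not the general orthogonal-group setting of the talk), the SDR and its Burer–Monteiro factorization read:

```latex
\text{(SDR)}\quad \max_{X\succeq 0,\ X_{ii}=1} \ \langle C, X\rangle,
\qquad\qquad
\text{(Burer--Monteiro)}\quad \max_{Q\in\mathbb{R}^{n\times p},\ \|q_i\|_2=1} \ \langle C, QQ^{\top}\rangle,
```

where q_i is the i-th row of Q and the rank p plays the role of the degrees of freedom in the factorization.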
Wednesday, June 25, 09:45 – 10:30: Joan Bruna
Title: On Inverse Problems and Diffusions
Abstract: TBD
Wednesday, June 25, 11:00 – 11:45: Charles Bertucci
Title: Hamilton-Jacobi equations on the space of probability measures
Abstract: Hamilton-Jacobi equations on the space of probability measures naturally arise in optimal transport (or more generally in mean field optimal control problems), in the study of large deviations of mean field systems, in potential mean field games, or even in some abstract points of view in machine learning. After recalling how we derive such equations, I will explain the main difficulties in the adaptation of the theory of viscosity solutions to such infinite dimensional equations, namely in the proof of a comparison principle. I will present various recent results and explain why a Lagrangian point of view can turn out to be much more effective.
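Schematically, the equations in question take the form below, where D_m U denotes a derivative with respect to the measure variable; making this rigorous is precisely part of the difficulty discussed in the talk.

```latex
\partial_t U(t,m) + H\big(m,\, D_m U(t,m)\big) = 0, \qquad (t,m)\in[0,T]\times\mathcal{P}(\mathbb{R}^d).
```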
Wednesday, June 25, 11:45 – 12:30: Renyuan Xu
Title: Mean-field Schrödinger bridge for generative AI: relaxed formulation and convergence analysis
Abstract: Motivated by the recent success of diffusion models and flow matching in generative AI, we investigate a relaxed Schrödinger bridge problem that bridges dynamic optimal transport and modern generative modeling. This relaxed formulation offers a unified mathematical framework for tackling several key challenges in generative AI, including data generation from noise, distributional fine-tuning, and transfer learning. By replacing the hard terminal distribution constraint with a penalty term, the relaxed Schrödinger bridge becomes more amenable to computational implementation than its classical counterpart. The resulting problem takes the form of a McKean–Vlasov-type stochastic control problem with a distinctive structure. We establish the well-posedness of this problem by analyzing the associated system of forward-backward stochastic differential equations (FBSDEs). In addition, we prove that both the optimal control strategy and the value function converge linearly as the penalty parameter tends to infinity. Our analysis is based on a novel framework that combines Doob’s $h$-transform with a static optimization problem over probability measures. This is based on joint work with Jin Ma (USC) and Ying Tan (UCSB).
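One schematic way to write such a penalized formulation (our notation; the formulation in the talk is of McKean–Vlasov type and may differ in its details) is:

```latex
\inf_{\alpha}\ \mathbb{E}\!\left[\int_0^T \tfrac12\,|\alpha_t|^2\,\mathrm{d}t\right]
\;+\; \lambda\,\mathrm{KL}\big(\mathrm{Law}(X_T)\,\big\|\,\mu_{\mathrm{target}}\big),
\qquad
\mathrm{d}X_t = \alpha_t\,\mathrm{d}t + \mathrm{d}W_t,\quad X_0\sim\mu_0,
```

with the hard terminal constraint recovered as the penalty parameter λ tends to infinity.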
Thursday
Thursday, June 26, 09:00 – 09:45: Zhenjie Ren
Title: Self-fictitious play for Mean Field Games
Abstract: In this talk, we present a new mechanism for approximating Nash equilibria in ergodic mean field games, under the assumptions that the game is both potential and monotone. Drawing inspiration from fictitious play in MFGs and self-interacting dynamics used to approximate the long-time behavior of McKean–Vlasov equations, we introduce a novel algorithm, which we call self-fictitious play. We will outline how coupling methods and the Lions–Lasry divergence can be employed to establish the convergence of this algorithm.
Thursday, June 26, 09:45 – 10:30: Matt Jacobs
Title: The signed Wasserstein barycenter problem
Abstract: Barycenter problems encode important geometric information about a metric space. While these problems are typically studied with positive weight coefficients associated to each distance term, more general signed Wasserstein barycenter problems have recently drawn a great deal of interest. These mixed sign problems have appeared in statistical inference setting as a way to generalize least squares regression to measure valued outputs and have appeared in numerical methods to improve the accuracy of Wasserstein gradient flow solvers. Unfortunately, the presence of negatively weighted distance terms destroys the L^2 convexity of the unsigned problem, resulting in a much more challenging optimization task. In this talk, I will discuss some theoretical properties of these mixed sign barycenter problems, focusing on sufficient conditions to guarantee the global optimality and uniqueness of a critical point.
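In its simplest form (our notation), the problem is

```latex
\min_{\nu}\ \sum_{i=1}^{k}\lambda_i\, W_2^2(\mu_i,\nu),
\qquad \text{with some weights } \lambda_i<0,
```

and it is precisely the negatively weighted terms that destroy the convexity available in the classical, all-positive barycenter problem.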
Thursday, June 26, 11:00 – 11:45: Esteban G. Tabak
Title: Data-analysis through the optimal transport barycenter problem
Abstract: The [Monge] optimal transport barycenter problem can be posed as follows: given a joint distribution pi(x, z) between two sets of variables, the outcome x and the factors z, find, among all z-dependent maps y = T_z(x) that make y independent of z, the one that minimizes a total transportation cost C from x to y. Such a map removes from the outcome any variability that the factors can explain. We will discuss the relation between this and the Wasserstein barycenter problem, a procedure to solve it in data-driven scenarios, and a number of applications, including conditional density estimation and simulation, model-free Bayesian inference, factor discovery, and an alternative formulation of the regular [Monge] optimal transport problem.
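Restating the abstract in symbols, the problem is

```latex
\min_{\{T_z\}}\ \mathbb{E}_{(x,z)\sim\pi}\big[\,c\big(x, T_z(x)\big)\big]
\qquad\text{subject to}\qquad
y = T_z(x)\ \text{independent of}\ z\ \ \text{when } (x,z)\sim\pi,
```

so that y retains only the variability in x that the factors z cannot explain.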
Thursday, June 26, 11:45 – 12:30: Quentin Mérigot
Title: Convergence of algorithms for the uniform quantization problem
Abstract: TBD
Thursday, June 26, 14:30 – 15:15: Guillaume Carlier
Title: Quantitative stability of push-forwards by optimal maps
Abstract: In this talk, based on a joint work with Quentin Mérigot and Alex Delalande, I will discuss (sharp) quantitative stability of push-forwards by optimal transport maps. The main step of the proof relies on a new bound (of independent interest) that quantifies the size of the singular set of convex functions.
Thursday, June 26, 15:15 – 16:00: Nizar Touzi
Title: Particle system approximation of Nash equilibria of symmetric monotone games
Abstract: TBD
Thursday, June 26, 16:30 – 17:00: Sibylle Marcotte
Title: Conservation laws for gradient flows
Abstract: Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of the optimization initialization. This “implicit bias” is believed to be responsible for some favorable properties of the trained models and could explain their good generalization properties. In this work, we expose the definition and properties of “conservation laws”, which define quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss. Then we explain how to find the exact number of independent conservation laws via Lie algebra computations. This procedure recovers the conservation laws already known for linear and ReLU neural networks for Euclidean gradient flows, and proves that there are no other laws. We identify new laws for certain flows with momentum and/or non-Euclidean geometries. Joint work with Gabriel Peyré and Rémi Gribonval. Associated papers: https://arxiv.org/abs/2307.00144 and https://arxiv.org/abs/2405.12888
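As a concrete instance, one conservation law known for two-layer ReLU networks under Euclidean gradient flow is the per-neuron balance ||w_j||^2 - v_j^2; the sketch below (purely illustrative, not code from the associated papers) checks numerically that small-step gradient descent approximately preserves it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-layer ReLU network f(x) = sum_j v_j * relu(w_j . x), trained by
# full-batch gradient descent on a squared loss; the per-neuron quantities
# ||w_j||^2 - v_j^2 are conserved by the continuous-time gradient flow,
# hence drift only at O(lr^2) per step under gradient descent.
d, m, n = 5, 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W = 0.3 * rng.standard_normal((m, d))
v = 0.3 * rng.standard_normal(m)

lr = 1e-3
q0 = (W ** 2).sum(axis=1) - v ** 2           # conserved quantities at initialization
for _ in range(2000):
    H = np.maximum(X @ W.T, 0.0)             # hidden activations, shape (n, m)
    r = H @ v - y                            # residuals
    grad_v = H.T @ r / n
    grad_W = (r[:, None] * (H > 0) * v).T @ X / n
    v -= lr * grad_v
    W -= lr * grad_W

q1 = (W ** 2).sum(axis=1) - v ** 2
print(np.abs(q1 - q0).max())                 # remains close to zero
```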
Thursday, June 26, 17:00 – 17:30: Mengjian Hua
Title: An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control
Abstract: We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger-Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
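For orientation, a standard form of the problem class (not necessarily the exact setting of the talk) is

```latex
\min_{u}\ \mathbb{E}\!\left[\int_0^T\Big(f(X_t) + \tfrac12\,|u_t|^2\Big)\,\mathrm{d}t + g(X_T)\right],
\qquad
\mathrm{d}X_t = \big(b(X_t) + \sigma\,u_t\big)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
```

and Girsanov's theorem relates the path measures of differently controlled dynamics, which is what the abstract exploits to estimate gradients from trajectories of the current policy rather than by differentiating through the SDE solver.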