Spring 2026
Tuesday, January 13, 2026
Title: Stability, instability, and extension of variational inference
Abstract: Variational inference (VI) is a popular alternative to Markov chain Monte Carlo (MCMC) for approximating high-dimensional target distributions. At its core, VI approximates the target distribution—typically specified via an unnormalized density—by a simpler variational family. Despite its empirical successes, the theoretical properties of variational inference have only recently begun to be understood. In this talk, I will discuss recent developments in the theory of variational inference from an optimal transport perspective. In the first part, I will present our recent results on the stability and instability of mean-field variational inference (MFVI). Our main insight is simple: when the target distribution is strongly log-concave, MFVI is quantitatively stable under perturbations of the target, whereas even for simple non–log-concave targets such as a mixture of two Gaussians, MFVI provably suffers from mode collapse. The consequences of our results are discussed, including guarantees for robust Bayesian inference and a quantitative Bernstein–von Mises theorem. In the second part of the talk, I will present our work on the statistical and computational theory of a class of structured variational inference in which the variational family consists of all star-shaped distributions. We establish quantitative approximation guarantees and provide a polynomial-time algorithm for solving the VI problem when the target distribution is strongly log-concave. We also discuss concrete examples, including generalized linear models with Gaussian likelihoods. This talk is based on joint work with Shunan Sheng, Alberto González-Sanz, Marcel Nutz, Sinho Chewi, Binghe Zhu, and Aram Pooladian.
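The mode-collapse phenomenon mentioned in the abstract is easy to reproduce numerically. The sketch below is an illustrative toy, not the speakers' construction: it fits a single Gaussian q = N(m, s²) to a symmetric two-component Gaussian mixture by grid-searching the reverse KL divergence (computed by quadrature), and the minimizer locks onto one mode rather than spanning both.

```python
import numpy as np

# Target: symmetric mixture 0.5*N(-3,1) + 0.5*N(3,1).
xs = np.linspace(-12.0, 12.0, 4001)
dx = xs[1] - xs[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * normal_pdf(xs, -3.0, 1.0) + 0.5 * normal_pdf(xs, 3.0, 1.0)

def reverse_kl(m, s):
    """KL(q || p) for q = N(m, s^2), computed by numerical quadrature."""
    q = normal_pdf(xs, m, s)
    mask = q > 1e-300  # skip points where q is negligibly small
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))) * dx

# Grid search over the variational parameters (m, s).
ms = np.linspace(-5.0, 5.0, 201)
ss = np.linspace(0.3, 4.0, 75)
kls = np.array([[reverse_kl(m, s) for s in ss] for m in ms])
i, j = np.unravel_index(np.argmin(kls), kls.shape)
best_m, best_s = ms[i], ss[j]

# The reverse-KL optimum sits on a single mode (|m| near 3) rather than
# at the symmetric choice m = 0 that would try to cover both.
print(best_m, best_s)
```

The same collapse occurs for a product (mean-field) family in higher dimensions; the one-dimensional single-Gaussian family is used here only to keep the computation to a grid search.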
Tuesday, January 20, 2026
Speaker: Jiequn Han (Flatiron Institute) [Zoom Link]
Title: DriftLite: Lightweight drift control for inference-time scaling of diffusion models
Abstract: We study inference-time scaling for diffusion models, where a pre-trained model is adapted to new target distributions without retraining. While guidance-based methods are simple but biased, particle-based approaches such as Sequential Monte Carlo often suffer from weight degeneracy and high computational cost. We introduce DriftLite, a lightweight, training-free particle-based method that steers inference dynamics on the fly with provably optimal stability control. DriftLite exploits a previously unexplored degree of freedom in the Fokker–Planck equation between the drift and the particle potential, leading to two practical schemes, Variance- and Energy-Controlling Guidance (VCG/ECG), which approximate the optimal drift with minimal overhead. Across Gaussian mixture models, interacting particle systems, and large-scale protein–ligand co-folding problems, DriftLite consistently reduces variance and improves sample quality compared to pure guidance and Sequential Monte Carlo baselines.
If time permits, I will also briefly introduce self-consistent stochastic interpolants, which enable generative modeling from indirect, noisy observations and substantially extend applicability to many scientific and engineering problems where clean data are unavailable.
Tuesday, January 27, 2026
Speaker: Manon Michel (CNRS, Université Clermont-Auvergne) [Zoom Link]
Title: Harnessing Newtonian dynamics in generative models
Abstract: Generative modeling seeks to learn and sample from complex, high-dimensional data distributions and plays a key role in machine learning, Bayesian inference, and computational physics. One powerful class of methods, called Normalizing Flows, works by gradually transforming simple random noise into complex data using reversible steps. While these models are reliable and mathematically well-understood, they can become slow and expensive to use when dealing with high dimensions. In this talk, I will explain how ideas from classical physics, in particular, the laws that govern motion, can be used to build more efficient and intuitive generative models. By designing these models to follow classical dynamics and using neural networks only where they are most helpful, we can further reduce computational costs while making the models more stable and easier to interpret.
Tuesday, February 3, 2026
Speaker: Ethan N. Epperly (UC Berkeley) [Zoom Link]
Title: What is the role of Monte Carlo in randomized linear algebra?
Abstract: The Monte Carlo and computational mathematics research communities have traditionally not had a lot of overlap. "Our experience suggests that many practitioners of scientific computing view randomized algorithms as a desperate and final resort," writes one influential paper. Yet the advent of the field of randomized linear algebra has created new opportunities for dialog between Monte Carlo researchers and computational mathematicians. This talk will provide an overview of ways Monte Carlo methodologies have been used to improve randomized linear algebraic algorithms, with a focus on the speaker's research. The talk will identify three distinct roles for Monte Carlo in randomized linear algebra: solving problems of intractable scale, reducing variance, and solving challenging sampling problems.
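One of the simplest instances of Monte Carlo inside linear algebra, included here as background rather than as an example from the talk, is Hutchinson's stochastic trace estimator: tr(A) = E[zᵀAz] for random sign vectors z, which requires only matrix-vector products with A.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric positive semi-definite test matrix whose trace we pretend
# is too expensive to read off directly (only matvecs are allowed).
n = 50
B = rng.standard_normal((n, n)) / np.sqrt(n)
A = B @ B.T

def hutchinson_trace(matvec, n, num_probes, rng):
    """Estimate tr(A) as the average of z^T (A z) over Rademacher probes z."""
    est = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        est += z @ matvec(z)
    return est / num_probes

estimate = hutchinson_trace(lambda v: A @ v, n, num_probes=1000, rng=rng)
exact = np.trace(A)
print(estimate, exact)  # the estimate typically lands within a few percent
```

Variance-reduced refinements of this estimator (e.g., combining it with a low-rank sketch of A) are exactly the kind of interplay between Monte Carlo and numerical linear algebra the abstract alludes to.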
Tuesday, February 10, 2026
Speaker: Aaron Smith (University of Ottawa) [Zoom Link]
Title: Some Speedups From Fast Convergence of Important Test Functions
Abstract: From classical spectral analysis of Markov chains on finite state spaces, we know that some (non-Markovian) functions f(X_{t}) of a Markov chain can converge much more quickly (or in much stronger norm) than the full chain X_{t}. Several authors have used this to speed up MCMC, obtaining large improvements in the "simple" case of multimodality induced by label-switching, or more subtle improvements for statistically relevant test functions (see Rabinovitch et al. (2016)). Unfortunately, it is often difficult to get bounds that are both interpretable and offer large speedups. In this talk, I'll discuss some recent work in which we obtain quadratic advantages, for two very different reasons, in two very different settings: approximate MCMC for graphical models, and exact MCMC for all "relevant" functions of a random rotation matrix. While this work is largely mathematical, the talk will focus on examples and open questions rather than detailed proofs, with a collection of simple worked examples from the literature on function-specific mixing times and applications.
Based on joint work with Vishesh Jain, Na Lin, Yuanyuan Liu, Natesh Pillai, Ashwin Sah, Mehtaab Sawhney, and Vinod Vaikuntanathan.
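The "simple" multimodal case is easy to visualize with a generic toy (my own illustration, not an example from the talk): a random-walk Metropolis chain trapped in one mode of a symmetric bimodal target estimates odd functions like E[X] badly, yet symmetric functions such as E[|X|] converge almost immediately.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric bimodal target: 0.5*N(-10,1) + 0.5*N(10,1) (up to a constant).
def log_target(x):
    return np.logaddexp(-0.5 * (x - 10.0) ** 2, -0.5 * (x + 10.0) ** 2)

# Random-walk Metropolis with a small step size: in this many iterations
# the chain has essentially no chance of crossing the valley between modes.
x, chain = 10.0, []
for _ in range(20000):
    prop = x + 0.5 * rng.standard_normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    chain.append(x)
chain = np.array(chain)

est_mean = chain.mean()         # badly biased: the true E[X] is 0
est_abs = np.abs(chain).mean()  # accurate: the true E[|X|] is about 10
print(est_mean, est_abs)
```

Here |X| is invariant under the label-switching symmetry x ↦ -x, which is exactly why its ergodic average converges at the fast within-mode rate rather than the (astronomically slow) between-mode rate.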
Tuesday, February 17, 2026
Speaker: Aki Nishimura (Johns Hopkins University) [Zoom Link]
Title: Zigzag path connects two Monte Carlo paradigms: Hamiltonian counterparts to piecewise deterministic Markov processes
Abstract: Zigzag and other piecewise deterministic Markov process samplers have attracted significant interest for their non-reversibility and other appealing properties for Bayesian computation. Hamiltonian Monte Carlo is another state-of-the-art sampler, exploiting fictitious momentum to guide Markov chains through complex target distributions.
In this talk, we first establish a remarkable connection between the zigzag sampler and a variant of Hamiltonian Monte Carlo based on Laplace-distributed momentum. The position-velocity component of the corresponding Hamiltonian dynamics travels along a zigzag path paralleling the Markovian zigzag process; however, the dynamics is non-Markovian, as the momentum component encodes the non-immediate past. In the limit of increasingly frequent momentum refreshments in which we preserve its direction but re-sample its magnitude, we prove that Hamiltonian zigzag converges strongly to its Markovian counterpart. This theoretical insight in particular explains the two zigzags' relative performance on target distributions with highly correlated parameters, which we demonstrate on an 11,235-dimensional truncated Gaussian target arising from a Bayesian phylogenetic multivariate probit model applied to an HIV dataset. We then proceed to construct a Hamiltonian counterpart to the bouncy particle sampler (BPS), further strengthening the connection between the two paradigms. We achieve this by turning BPS's Poisson schedule for velocity switch events into a deterministic one dictated by an auxiliary "inertia" parameter. The resulting Hamiltonian BPS constitutes an efficient sampler on log-concave targets and straightforwardly accommodates parameter constraints. We demonstrate its competitive performance in the posterior computation under a Bayesian sparse logistic regression model applied to a large-scale observational study consisting of 72,489 patients and 22,175 clinical covariates.
Tuesday, February 24, 2026
Speaker: Angelos Alexopoulos (Athens University of Economics and Business) [Zoom Link]
Title: Gaussian invariant Markov chain Monte Carlo
Abstract: We develop sampling methods that are Gaussian-invariant versions of random walk Metropolis (RWM), the Metropolis-adjusted Langevin algorithm (MALA), and second-order Hessian or Manifold MALA. Unlike standard RWM and MALA, we show that Gaussian-invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy-to-use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high-dimensional targets in latent Gaussian models, where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
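The control-variate idea can be illustrated with a standard Stein-type construction for a Gaussian target. This is a generic textbook device, not the Poisson-equation solutions from the talk: under p = N(0,1), E[x⁴ − 3x²] = 0, so subtracting this zero-mean quantity from x⁴ gives an unbiased, lower-variance estimator of E[x⁴] = 3.

```python
import numpy as np

rng = np.random.default_rng(2)

# i.i.d. draws from the standard Gaussian target (samples from an MCMC
# chain targeting N(0,1) would be used the same way).
x = rng.standard_normal(100000)

plain = x**4                     # Var(x^4) = E[x^8] - 9 = 96
cv = x**4 - (x**4 - 3 * x**2)    # = 3x^2, with Var(3x^2) = 18

est_plain = plain.mean()
est_cv = cv.mean()
print(est_plain, est_cv)  # both estimate E[x^4] = 3

# x^4 - 3x^2 has mean zero under N(0,1) (a Stein identity), so the
# corrected estimator is unbiased but has much smaller variance.
assert plain.var() > cv.var()
```

The talk's contribution, as the abstract describes it, is that Gaussian invariance yields exact Poisson-equation solutions, which play the role of the hand-derived zero-mean function above but work as control variates under any intractable target.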
Tuesday, March 3, 2026
Speaker: Pierre Del Moral (INRIA Bordeaux) [Zoom Link]
Title: On the Kantorovich contraction of Markov semigroups
Abstract: We present a novel operator-theoretic framework to study the contraction properties of Markov semigroups with respect to a general class of Kantorovich semi-distances, which notably includes Wasserstein distances. This rather simple contraction cost framework combines standard Lyapunov techniques with local contraction conditions. Our results can be applied to both discrete-time and continuous-time Markov semigroups, and we illustrate their wide applicability in the context of (i) Markov transitions on models with boundary states, including bounded domains with entrance boundaries, (ii) operator products of a Markov kernel and its adjoint, including two-block-type Gibbs samplers, (iii) iterated random functions, and (iv) diffusion models, including overdamped Langevin diffusions with convex-at-infinity potentials. Joint work with M. Gerber (Bristol Univ.).
Tuesday, March 10, 2026
Speaker: Luhuan Wu (Flatiron, Johns Hopkins University) [Zoom Link]
Title: Reverse Diffusion Sequential Monte Carlo Samplers
Abstract: Diffusion models have emerged as a powerful paradigm for generative modeling. In this talk, we explore their use as annealing paths for sampling from unnormalized target distributions. Building on prior work, we first present a unifying framework that leverages Monte Carlo methods to estimate score functions and simulate diffusion-based sampling trajectories. However, such approaches can suffer from accumulated bias due to time discretization and imperfect score estimation.
To address these challenges, we introduce a principled Sequential Monte Carlo (SMC) framework that formalizes diffusion-based samplers as proposal mechanisms while systematically correcting their biases. The key idea is to construct informative intermediate target distributions that progressively guide particles toward the final distribution of interest. Although the ideal targets are intractable, we derive tractable approximations using quantities already available from the score-based proposal, requiring no extra inference overhead. The resulting method, Reverse Diffusion Sequential Monte Carlo, enables consistent sampling and unbiased estimation of the target normalization constant. We demonstrate our method on a range of synthetic targets and Bayesian regression tasks.
Links: Paper
Tuesday, March 17, 2026
Speaker: Anna Korba (ENSAE/ CREST) [Zoom Link]
Title: Variational Inference with Mixtures of Isotropic Gaussians
Abstract: Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited for this family. In contrast with mixtures of Gaussians with generic covariance matrices, this choice strikes a balance between accurately approximating multimodal Bayesian posteriors and remaining memory- and computationally efficient. Our algorithms implement gradient descent on the locations of the mixture components (the modes of the Gaussians), and either (an entropic) mirror descent or Bures descent on their variance parameters. We illustrate the performance of our algorithms on numerical experiments. This is joint work with Marguerite Petit-Talamon and Marc Lambert, presented at NeurIPS 2025.
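A minimal 1D sketch of the location updates, heavily simplified relative to the authors' scheme (variances held fixed, reverse KL computed by quadrature, gradients by finite differences rather than the mirror/Bures machinery): gradient descent over the component means of a uniform-weight two-component isotropic mixture recovers both modes of a bimodal target.

```python
import numpy as np

xs = np.linspace(-8.0, 8.0, 2001)
dx = xs[1] - xs[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Bimodal target, and a uniform-weight mixture of isotropic Gaussians.
p = 0.5 * normal_pdf(xs, -2.0, 0.5) + 0.5 * normal_pdf(xs, 2.0, 0.5)
SIGMA = 0.5  # variances held fixed; only the locations are learned

def kl(mus):
    """Reverse KL(q || p) for q = uniform mixture of N(mu_k, SIGMA^2)."""
    q = np.mean([normal_pdf(xs, m, SIGMA) for m in mus], axis=0)
    mask = q > 1e-300
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))) * dx

# Plain gradient descent on the locations, via central finite differences.
mus = np.array([-0.5, 0.5])
eps, lr = 1e-4, 0.1
for _ in range(500):
    grad = np.array([
        (kl(mus + eps * e) - kl(mus - eps * e)) / (2 * eps)
        for e in np.eye(2)
    ])
    mus -= lr * grad
print(np.sort(mus))  # the two locations settle near the modes -2 and 2
```

In the actual algorithms, the variance parameters are learned jointly with the locations (by entropic mirror or Bures descent) and the KL gradients are estimated stochastically rather than by quadrature; the toy above only shows why moving the component locations alone can already capture multimodality.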
Tuesday, March 24, 2026
Speaker: Giacomo Zanella (Bocconi University) [Zoom Link]
Title: Error Bounds and Optimal Schedules for Masked Diffusion models
Time: [ 8:30 am PT ] = [ 11:30 am ET ] = [ 3:30 pm London ] = [ 4:30 pm Paris ] = [ 11:30 pm Beijing ]
Abstract: Masked Diffusion Models are popular generative models for discrete data, which exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy) that depend only on the average number of tokens generated per iteration and are independent of the data dimensionality (i.e. sequence length). We then investigate the gains obtained by using non-constant schedule sizes and identify the optimal schedule as a function of the so-called information profile of the data distribution. The talk is based on joint work with Hugo Lavenant, available at https://arxiv.org/abs/2510.25544.
Links: Paper