Spring 2026
Tuesday, January 13, 2026
Title: Stability, instability, and extension of variational inference
Abstract: Variational inference (VI) is a popular alternative to Markov chain Monte Carlo (MCMC) for approximating high-dimensional target distributions. At its core, VI approximates a high-dimensional target distribution, typically specified via an unnormalized density, by a simpler variational family. Despite its empirical successes, the theoretical properties of variational inference have only begun to be understood recently. In this talk, I will discuss recent developments in the theory of variational inference from an optimal transport perspective. In the first part, I will present our recent results on the stability and instability of mean-field variational inference (MFVI). Our main insight is simple: when the target distribution is strongly log-concave, MFVI is quantitatively stable under perturbations of the target, whereas even for simple non–log-concave targets such as a mixture of two Gaussians, MFVI provably suffers from mode collapse. I will discuss the consequences of our results, including guarantees for robust Bayesian inference and a quantitative Bernstein–von Mises theorem. In the second part of the talk, I will present our work on the statistical and computational theory for a class of structured variational inference where the variational family consists of all star-shaped distributions. We establish quantitative approximation guarantees and provide a polynomial-time algorithm for solving the VI problem when the target distribution is strongly log-concave. We also discuss concrete examples, including generalized linear models with Gaussian likelihoods. This talk is based on joint work with Shunan Sheng, Alberto González-Sanz, Marcel Nutz, Sinho Chewi, Binghe Zhu, and Aram Pooladian.
Tuesday, January 20, 2026
Speaker: Jiequn Han (Flatiron Institute) [Zoom Link]
Title: DriftLite: Lightweight drift control for inference-time scaling of diffusion models
Abstract: We study inference-time scaling for diffusion models, where a pre-trained model is adapted to new target distributions without retraining. Guidance-based methods are simple but biased, while particle-based approaches such as Sequential Monte Carlo often suffer from weight degeneracy and high computational cost. We introduce DriftLite, a lightweight, training-free particle-based method that steers inference dynamics on the fly with provably optimal stability control. DriftLite exploits a previously unexplored degree of freedom in the Fokker–Planck equation between the drift and the particle potential, leading to two practical schemes, Variance- and Energy-Controlling Guidance (VCG/ECG), which approximate the optimal drift with minimal overhead. Across Gaussian mixture models, interacting particle systems, and large-scale protein–ligand co-folding problems, DriftLite consistently reduces variance and improves sample quality compared to pure guidance and Sequential Monte Carlo baselines.
If time permits, I will also briefly introduce self-consistent stochastic interpolants, which enable generative modeling from indirect, noisy observations and substantially extend applicability to many scientific and engineering problems where clean data are unavailable.
Tuesday, January 27, 2026
Speaker: Manon Michel (CNRS, Université Clermont-Auvergne) [Zoom Link]
Title: Harnessing Newtonian dynamics in generative models
Abstract: Generative modeling seeks to learn and sample from complex, high-dimensional data distributions and plays a key role in machine learning, Bayesian inference, and computational physics. One powerful class of methods, called Normalizing Flows, works by gradually transforming simple random noise into complex data using reversible steps. While these models are reliable and mathematically well-understood, they can become slow and expensive to use when dealing with high dimensions. In this talk, I will explain how ideas from classical physics, in particular, the laws that govern motion, can be used to build more efficient and intuitive generative models. By designing these models to follow classical dynamics and using neural networks only where they are most helpful, we can further reduce computational costs while making the models more stable and easier to interpret.
Tuesday, February 3, 2026
Speaker: Ethan N. Epperly (UC Berkeley) [Zoom Link]
Title: What is the role of Monte Carlo in randomized linear algebra?
Abstract: The Monte Carlo and computational mathematics research communities have traditionally not had a lot of overlap. "Our experience suggests that many practitioners of scientific computing view randomized algorithms as a desperate and final resort," writes one influential paper. Yet the advent of the field of randomized linear algebra has created new opportunities for dialog between Monte Carlo researchers and computational mathematicians. This talk will provide an overview of ways Monte Carlo methodologies have been used to improve randomized linear algebraic algorithms, with a focus on the speaker's research. The talk will identify three distinct roles for Monte Carlo in randomized linear algebra: solving problems of intractable scale, reducing variance, and solving challenging sampling problems.
Tuesday, February 10, 2026
Speaker: Aaron Smith (University of Ottawa) [Zoom Link]
Title: Some Speedups From Fast Convergence of Important Test Functions
Abstract: From classical spectral analysis of Markov chains on finite state spaces, we know that some non-Markovian functions f(X_{t}) can converge much more quickly (or in much stronger norm) than the full Markov chain X_{t}. Several authors have used this to speed up MCMC, obtaining large improvements in the "simple" case of multimodality induced by label-switching or more subtle improvements for statistically-relevant test functions (see Rabinovitch et al (2016)). Unfortunately, it is often difficult to get bounds that are both interpretable and offer large speedups. In this talk, I'll discuss some recent work in which we obtain quadratic advantages for two very different reasons in two very different settings: approximate MCMC for graphical models, and exact MCMC for all "relevant" functions of a random rotation matrix. While this work is largely mathematical, the talk will focus on examples and open questions over detailed proofs, with a collection of simple worked examples from the literature on function-specific mixing times and applications.
Based on joint work with Vishesh Jain, Na Lin, Yuanyuan Liu, Natesh Pillai, Ashwin Sah, Mehtaab Sawhney, and Vinod Vaikuntanathan.
Tuesday, February 17, 2026
Speaker: Aki Nishimura (Johns Hopkins University) [Zoom Link]
Title: Zigzag path connects two Monte Carlo paradigms: Hamiltonian counterparts to piecewise deterministic Markov processes
Abstract: Zigzag and other piecewise deterministic Markov process samplers have attracted significant interest for their non-reversibility and other appealing properties for Bayesian computation. Hamiltonian Monte Carlo is another state-of-the-art sampler, exploiting fictitious momentum to guide Markov chains through complex target distributions.
In this talk, we first establish a remarkable connection between the zigzag sampler and a variant of Hamiltonian Monte Carlo based on Laplace-distributed momentum. The position-velocity component of the corresponding Hamiltonian dynamics travels along a zigzag path paralleling the Markovian zigzag process; however, the dynamics is non-Markovian as the momentum component encodes non-immediate pasts. In the limit of increasingly frequent momentum refreshments in which we preserve its direction but re-sample its magnitude, we prove that Hamiltonian zigzag converges strongly to its Markovian counterpart. This theoretical insight in particular explains the two zigzags' relative performance on target distributions with highly correlated parameters, which we demonstrate on an 11,235-dimensional truncated Gaussian target arising from a Bayesian phylogenetic multivariate probit model applied to an HIV dataset. We then proceed to construct a Hamiltonian counterpart to the bouncy particle sampler (BPS), further strengthening the connection between the two paradigms. We achieve this by turning BPS's Poisson schedule for velocity switch events into a deterministic one dictated by an auxiliary "inertia" parameter. The resulting Hamiltonian BPS constitutes an efficient sampler on log-concave targets and straightforwardly accommodates parameter constraints. We demonstrate its competitive performance in posterior computation under a Bayesian sparse logistic regression model applied to a large-scale observational study consisting of 72,489 patients and 22,175 clinical covariates.
Tuesday, February 24, 2026
Speaker: Angelos Alexopoulos (Athens University of Economics and Business) [Zoom Link]
Title: Gaussian invariant Markov chain Monte Carlo
Abstract: We develop sampling methods consisting of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second-order Hessian or Manifold MALA. Unlike standard RWM and MALA, we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy-to-use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high-dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
Tuesday, March 3, 2026
Speaker: Pierre Del Moral (INRIA Bordeaux) [Zoom Link]
Title: On the Kantorovich contraction of Markov semigroups
Abstract: We present a novel operator theoretic framework to study the contraction properties of Markov semigroups with respect to a general class of Kantorovich semi-distances, which notably includes Wasserstein distances. This rather simple contraction cost framework combines standard Lyapunov techniques with local contraction conditions. Our results can be applied to both discrete time and continuous time Markov semigroups, and we illustrate their wide applicability in the context of (i) Markov transitions on models with boundary states, including bounded domains with entrance boundaries, (ii) operator products of a Markov kernel and its adjoint, including two-block-type Gibbs samplers, (iii) iterated random functions and (iv) diffusion models, including overdamped Langevin diffusion with convex-at-infinity potentials. Joint work with M. Gerber (Bristol Univ.).
Tuesday, March 10, 2026
Speaker: Luhuan Wu (Flatiron, Johns Hopkins University) [Zoom Link]
Title: Reverse Diffusion Sequential Monte Carlo Samplers
Abstract: Diffusion models have emerged as a powerful paradigm for generative modeling. In this talk, we explore their use as annealing paths for sampling from unnormalized target distributions. Building on prior work, we first present a unifying framework that leverages Monte Carlo methods to estimate score functions and simulate diffusion-based sampling trajectories. However, such approaches can suffer from accumulated bias due to time discretization and imperfect score estimation.
To address these challenges, we introduce a principled Sequential Monte Carlo (SMC) framework that formalizes diffusion-based samplers as proposal mechanisms while systematically correcting their biases. The key idea is to construct informative intermediate target distributions that progressively guide particles toward the final distribution of interest. Although the ideal targets are intractable, we derive exact approximations using quantities already available from the score-based proposal, requiring no extra inference overhead. The resulting method, Reverse Diffusion Sequential Monte Carlo, enables consistent sampling and unbiased estimation of the target normalization constant. We demonstrate our method on a range of synthetic targets and Bayesian regression tasks.
Tuesday, March 17, 2026
Speaker: Anna Korba (ENSAE/CREST) [Zoom Link]
Title: Variational Inference with Mixtures of Isotropic Gaussians
Abstract: Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited for this family. In contrast with mixtures of Gaussians with generic covariance matrices, this choice strikes a balance between accurate approximation of multimodal Bayesian posteriors and memory and computational efficiency. Our algorithms implement gradient descent on the location of the mixture components (the modes of the Gaussians), and either (an entropic) Mirror or Bures descent on their variance parameters. We illustrate the performance of our algorithms on numerical experiments. This is joint work with Marguerite Petit-Talamon and Marc Lambert, which was presented at NeurIPS 2025.
Tuesday, March 24, 2026
Speaker: Giacomo Zanella (Bocconi University) [Zoom Link]
Title: Error Bounds and Optimal Schedules for Masked Diffusion Models
Time: [ 8:30 am PT ] = [ 11:30 am ET ] = [ 3:30 pm London ] = [ 4:30 pm Paris ] = [ 11:30 pm Beijing ]
Abstract: Masked Diffusion Models are popular generative models for discrete data, which exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy) that depend only on the average number of tokens generated per iteration and are independent of the data dimensionality (i.e. sequence length). We then investigate the gains obtained by using non-constant schedule sizes and identify the optimal schedule as a function of the so-called information profile of the data distribution. The talk is based on joint work with Hugo Lavenant, available at https://arxiv.org/abs/2510.25544.
Tuesday, March 31, 2026
Title: Diffusion Model’s Generalization via Data-Dependent Ridge Manifolds
Abstract: When a diffusion model is not memorizing the training samples, what does it generate, and why? In this talk, I will describe a quantitative framework for understanding the distribution produced by a learned diffusion model through a data-driven geometric object: a log-density ridge manifold of the smoothed training distribution. This manifold acts as a backbone for generation and reveals a three-stage inference behavior: trajectories first reach the ridge, then align in normal directions, and finally slide along tangent directions. This perspective allows us to quantify how training error influences generation in different directions, and to explain when inter-mode generations arise. I will also present a random feature example in which the model’s inductive bias can be decomposed explicitly into architectural bias and optimization error, and tracked along the inference dynamics. Experiments on synthetic multimodal distributions and MNIST latent diffusion support the theory in both low- and high-dimensional settings.
Tuesday, April 7, 2026
Speaker: Yifan Chen (UCLA) [Zoom Link]
Title: Affine Invariant Samplers and Flows: Analysis and New Algorithms
Abstract: The Goodman–Weare affine invariant ensemble sampler is widely used for sampling from complex probability distributions, owing to its simplicity and robustness to ill-conditioning, and was popularized by the “emcee” package. In this talk, I will first characterize its scaling limit as an affine-invariant gradient flow on the space of probability measures. Building on this perspective, I will introduce a family of affine-invariant gradient and Hamiltonian flows that give rise to unbiased ensemble samplers with provably improved high-dimensional scaling compared to Goodman–Weare. For settings demanding further scalability, I will also discuss approximate samplers based on variational inference, driven by affine-invariant gradient flows over Gaussian and mixture families. Theoretical guarantees and empirical results demonstrate robustness to both dimension and condition number, suggesting a broadly applicable affine-invariant framework and a promising generic toolkit for sampling in high-dimensional, ill-conditioned problems.
Tuesday, April 14, 2026
Speaker: Bohan Zhou (UCSB) [Zoom Link]
Title: Accelerating MCMC on discrete-state space
Abstract: Recent years have seen growing interest in the deep connections between optimization and sampling. In particular, Langevin dynamics for sampling can be interpreted as the gradient flow of the relative entropy in the space of probability distributions. A natural question is whether such connections can be extended to discrete state spaces. Building on the new interpretation of MCMC as a gradient flow with respect to the graphical Wasserstein metric, we propose a class of Nesterov-type algorithms to accelerate MCMC sampling on graphs. The corresponding continuous-time formulation can be viewed as a damped Hamiltonian flow in probability space. We establish theoretical results on convergence and acceleration for some user-specified setting, and present numerical examples demonstrating improved accuracy and convergence speed of sampling on multimodal distributions and real datasets.
Tuesday, April 21, 2026
Speaker: Noah Golowich (UT Austin) [Zoom Link]
Title: Understanding Parallel Reasoning in Language Model Inference
Abstract: Efficiently sampling from a complex probability distribution is a fundamental problem across machine learning and theoretical computer science. It has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from large language models (LLMs) have been proposed to solve challenging reasoning problems spanning domains such as mathematics and coding. For the most part, however, we lack a principled understanding of the accuracy–cost tradeoffs for such procedures. In this talk, we propose a formalization for such tasks as the problem of producing a sample from a target probability measure, given an oracle which yields approximate density estimates for the target measure. Depending on the context, this oracle may be interpreted as an approximate verifier or a *process reward model* for a particular language modeling task. This setup is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989).
Generalizing results from existing literature, we establish provable guarantees for the Sequential Monte Carlo algorithm and related particle filtering approaches, which have recently found success empirically in the context of both language modeling and diffusion. In particular, our theory identifies a few properties of the oracle which suffice for efficient sampling. We conduct experiments to show that these properties indeed correlate with sampling performance for certain language modeling tasks.
The efficacy of such sampling algorithms, however, is limited by the relationship between the underlying LLM and the particular sampling task at hand, which has motivated the framework of Test-Time Training (TTT). In particular, TTT updates a model's weights in response to partial generations and reward feedback received at inference time. In the latter half of the talk, we will discuss some provable benefits of TTT in the context of our sampling framework.
Based on https://arxiv.org/pdf/2603.07887 (joint work with Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, and Akshay Krishnamurthy); and upcoming joint work with Ankur Moitra and Dhruv Rohatgi.