Spring 2025
Tuesday, January 7, 2025
Speaker: Yuansi Chen (ETH Zurich) [Zoom Link]
Title: Regularized Dikin Walks for Sampling Truncated Logconcave Measures, Mixed Isoperimetry and Beyond Worst-Case Analysis
Abstract: We study the problem of drawing samples from a logconcave distribution truncated on a polytope, motivated by computational challenges in Bayesian statistical models with indicator variables, such as probit regression. Building on interior point methods and the Dikin walk for sampling from uniform distributions, we analyze the mixing time of regularized Dikin walks. For a logconcave and log-smooth distribution with condition number $\kappa$, truncated on a polytope in $\mathbb{R}^n$ defined by $m$ linear constraints, we prove that the soft-threshold Dikin walk mixes in $O((m+\kappa)n)$ iterations from a warm initialization. This improves upon prior work, which required the polytope to be bounded and gave a bound depending on the radius of the bounded region. Going beyond worst-case mixing time analysis, we demonstrate that the soft-threshold Dikin walk can mix significantly faster when only a limited number of constraints intersect the high-probability mass of the distribution, improving the $O((m+\kappa)n)$ upper bound to $O(m + \kappa n)$.
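For intuition, here is a minimal sketch of the plain (unregularized) Dikin walk for a uniform target on a polytope, the building block that the regularized variants in the talk extend; the step-size choice and parameter names below are illustrative, not those of the paper.

```python
import numpy as np

def dikin_walk(A, b, x0, n_steps=500, r=0.5, seed=0):
    """Plain Dikin walk for (approximately) uniform sampling on {x : Ax < b}.

    The talk's soft-threshold/regularized variants modify the Hessian and
    handle a logconcave target; this sketch covers only the uniform case.
    """
    rng = np.random.default_rng(seed)
    n = len(x0)

    def hessian(x):
        s = b - A @ x                       # slacks of the linear constraints
        return A.T @ ((1.0 / s**2)[:, None] * A)

    def log_q(H, diff):                     # log proposal density, up to constants
        _, logdet = np.linalg.slogdet(H)
        return 0.5 * logdet - (n / (2 * r**2)) * diff @ H @ diff

    x = np.array(x0, float)
    H = hessian(x)
    samples = []
    for _ in range(n_steps):
        L = np.linalg.cholesky(np.linalg.inv(H))
        z = x + (r / np.sqrt(n)) * (L @ rng.standard_normal(n))
        if np.all(A @ z < b):               # stay strictly inside the polytope
            Hz = hessian(z)
            # Metropolis correction: the proposal is not symmetric (H varies)
            if np.log(rng.uniform()) < log_q(Hz, x - z) - log_q(H, z - x):
                x, H = z, Hz
        samples.append(x.copy())
    return np.array(samples)
```

On the box [-1, 1]^2, every accepted state stays strictly inside the polytope, and the chain equilibrates around the center.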
Tuesday, January 14, 2025
Speaker: Faming Liang (Purdue) [Zoom Link]
Title: Extended Fiducial Inference: Toward an Automated Process of Statistical Inference
Abstract: While fiducial inference is widely considered R. A. Fisher's big blunder, the goal he initially set -- 'inferring the uncertainty of model parameters on the basis of observations' -- has been continually pursued by many statisticians. To this end, we develop a new statistical inference method called Extended Fiducial Inference (EFI). The new method achieves the goal of fiducial inference while remaining scalable for big data. EFI involves jointly imputing the random errors realized in the observations using stochastic gradient Markov chain Monte Carlo and estimating the inverse function using a sparse deep neural network (DNN). The consistency of the sparse DNN estimator ensures that the uncertainty embedded in the observations is properly propagated to the model parameters through the estimated inverse function, thereby validating downstream statistical inference. Compared to frequentist and Bayesian methods, EFI offers significant advantages in parameter estimation and hypothesis testing. Specifically, EFI provides higher fidelity in parameter estimation, especially when outliers are present in the observations, and eliminates the need for theoretical reference distributions in hypothesis testing, thereby automating the statistical inference process. EFI also provides an innovative framework for semi-supervised learning. This talk is based on joint work with Drs. Sehwan Kim and Yan Sun.
Links: Slides
Tuesday, January 21, 2025
Speaker: Kuikui Liu (MIT) [Zoom Link]
Title: On Spectral Independence and Applications
Abstract: In this talk, I will present a recent technique for analyzing mixing times of Markov chains called "spectral independence", which has been used to break long-standing barriers and resolve several decades-old open problems on sampling discrete combinatorial structures arising in statistical physics and theoretical computer science. This technique has in addition led to new connections between MCMC and other fields such as the geometry of polynomials, the emerging study of high-dimensional expanders, and more. As a case study, we will discuss a near-resolution of a longstanding conjecture due to Jerrum regarding rapid mixing of Glauber dynamics on proper colorings of a graph. Our results apply much more broadly to general Markov random fields, where we formally connect the performance of MCMC with the analytic properties of a popular message-passing algorithm known as belief propagation.
Links: YouTube
Tuesday, January 28, 2025
Speaker: Matti Vihola (University of Jyväskylä) [Zoom Link]
Title: An invitation to adaptive MCMC convergence theory
Abstract: Adaptive Markov chain Monte Carlo (MCMC) algorithms, which automatically tune their parameters based on past samples, have proved extremely useful in practice. There is a lot of freedom in how adaptive MCMC samplers can be designed, but some care is needed to ensure that they remain valid (i.e. that the Monte Carlo averages are consistent). The self-tuning mechanism makes the algorithms 'non-Markovian', which means that their validity cannot be ensured by standard Markov chain theory. We discuss conditions for adaptation that ensure the consistency of averages, based on a martingale decomposition due to Andrieu and Moulines (Ann. Appl. Probab. 2006). The conditions accommodate both stochastic gradient type adaptations and the more recent 'adapt increasingly rarely' approach. While the talk is theoretical, the aim is to keep the discussion minimally technical and instead focus on insight into the requirements which ensure the validity of adaptive MCMC.
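As a concrete toy example of the kind of self-tuning the talk covers, the sketch below adapts a random-walk Metropolis proposal scale with a decaying (Robbins-Monro) step size, one simple way to obtain diminishing adaptation; the target acceptance rate and decay exponent are illustrative choices, not prescriptions from the talk.

```python
import numpy as np

def adaptive_rwm(logpi, x0, n_steps=4000, target_acc=0.234, seed=0):
    """Random-walk Metropolis whose proposal scale is tuned on the fly.

    The adaptation step size t**-0.6 decays to zero, so the algorithm
    adapts less and less over time -- the kind of condition under which
    consistency of the Monte Carlo averages can be established.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    lp = logpi(x)
    log_sigma, chain = 0.0, []
    for t in range(1, n_steps + 1):
        prop = x + np.exp(log_sigma) * rng.standard_normal(x.shape)
        lp_prop = logpi(prop)
        acc = float(np.exp(min(0.0, lp_prop - lp)))   # acceptance probability
        if rng.uniform() < acc:
            x, lp = prop, lp_prop
        # stochastic-approximation update toward the target acceptance rate
        log_sigma += t**-0.6 * (acc - target_acc)
        chain.append(x.copy())
    return np.array(chain), np.exp(log_sigma)
```

On a standard normal target the tuned scale settles at a moderate value and the chain's averages behave consistently.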
Tuesday, February 4, 2025
Speaker: Haotian Jiang (University of Chicago) [Zoom Link]
Title: Quasi-Monte Carlo Integration via Algorithmic Discrepancy Theory
Abstract: A classical approach to numerically integrating a function f is the Monte Carlo (MC) method: one evaluates f at random points, and the estimation error scales as \sigma(f)/n^{1/2} with n samples, where \sigma(f) is the standard deviation of f. A different approach, widely used in practice, is the quasi-Monte Carlo (QMC) method, where f is evaluated at carefully chosen deterministic points and the error scales roughly as 1/n. Both methods have distinctive advantages and shortcomings, and a key question has been to find a method that combines the advantages of both. In this talk, I will introduce the fascinating area of QMC methods and their connections to various areas of mathematics and to geometric discrepancy. I will then show how recent developments in algorithmic discrepancy theory can be used to give a method that combines the benefits of MC and QMC methods, and even improves upon previous QMC approaches in various ways. The talk is based on joint work with Nikhil Bansal (University of Michigan).
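To make the MC-vs-QMC contrast concrete, here is a small one-dimensional experiment using the classical van der Corput low-discrepancy sequence; the test integrand and sample size are arbitrary illustrations.

```python
import numpy as np

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput sequence, a classical
    one-dimensional low-discrepancy point set used in QMC."""
    pts = np.empty(n)
    for i in range(n):
        q, denom, k = 0.0, 1.0, i + 1
        while k > 0:
            k, r = divmod(k, base)
            denom *= base
            q += r / denom
        pts[i] = q
    return pts

# Integrate f(x) = x^2 over [0, 1]; the exact value is 1/3.
f = lambda x: x**2
n = 1024
rng = np.random.default_rng(0)
mc_err = abs(f(rng.uniform(size=n)).mean() - 1 / 3)   # ~ sigma(f) / n^{1/2}
qmc_err = abs(f(van_der_corput(n)).mean() - 1 / 3)    # ~ log(n) / n in 1-d
```

With n = 1024 points, the QMC error here is on the order of 1/n while the MC error is on the order of 1/sqrt(n), matching the rates quoted in the abstract.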
Tuesday, February 11, 2025
Speaker: Nicolas Chopin (ENSAE, Institut Polytechnique de Paris) [Zoom Link]
Title: A connection between Tempering SMC and Entropic Mirror Descent
Abstract: In this talk, I will discuss the connections between tempering (for Sequential Monte Carlo; SMC) and entropic mirror descent for sampling from a target probability distribution whose unnormalized density pi is known. My co-authors and I have established that tempering SMC corresponds to entropic mirror descent applied to the reverse Kullback-Leibler (KL) divergence and obtained convergence rates for the tempering iterates. Our result motivates the tempering iterates from an optimization point of view, showing that tempering can be seen as a descent scheme of the KL divergence with respect to the Fisher-Rao geometry, in contrast to Langevin dynamics, which perform descent of the KL with respect to the Wasserstein-2 geometry. We exploit the connection between tempering and mirror descent iterates to justify common practices in SMC and derive adaptive tempering rules that improve over alternative benchmarks in the literature.
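For readers less familiar with tempering SMC, the sketch below runs the basic scheme with a fixed linear temperature ladder and a standard-normal reference; the schedule, particle count, and rejuvenation kernel are illustrative choices (choosing the schedule adaptively is exactly the kind of practice the mirror-descent view helps justify).

```python
import numpy as np

def tempering_smc(log_pi, n_particles=2000, n_temps=21, seed=0):
    """Tempering SMC from a N(0,1) reference to an unnormalized target pi:
    reweight along pi_beta \propto N(0,1)^(1-beta) * pi^beta, resample, then
    rejuvenate with a few random-walk Metropolis steps at each temperature.
    Also returns the usual estimate of log(Z_target / Z_reference).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)
    log_ref = lambda z: -0.5 * z**2
    x = rng.standard_normal(n_particles)              # exact draws at beta = 0
    log_Z = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        incr = (b1 - b0) * (log_pi(x) - log_ref(x))   # incremental log-weights
        log_Z += np.log(np.mean(np.exp(incr)))
        w = np.exp(incr - incr.max())
        x = x[rng.choice(n_particles, n_particles, p=w / w.sum())]
        for _ in range(3):                            # RWM moves targeting pi_b1
            prop = x + 0.5 * rng.standard_normal(n_particles)
            log_acc = ((1 - b1) * (log_ref(prop) - log_ref(x))
                       + b1 * (log_pi(prop) - log_pi(x)))
            x = np.where(np.log(rng.uniform(size=n_particles)) < log_acc,
                         prop, x)
    return x, log_Z
```

With the unnormalized target exp(-(z-2)^2/2), whose normalizing constant matches the reference's, the returned log_Z should be close to zero and the particles should concentrate around 2.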
Tuesday, February 18, 2025
Title: Convergence Bounds for the Random Walk Metropolis Algorithm - Perspectives from Isoperimetry
Abstract: The Random Walk Metropolis (RWM) is a simple and enduring Markov chain-based algorithm for approximate simulation from an intractable ‘target’ probability distribution. In a pair of recent works, we have undertaken a detailed study of the quantitative convergence of this algorithm to its equilibrium distribution, establishing non-asymptotic estimates on mixing times, with explicit dependence on dimension and other relevant problem parameters. The results hold at a reasonable level of generality, and are often sharp in a suitable sense.
The focus of the talk will be conceptual rather than technical, with an eye towards enabling intuition for i) which high-level aspects of the target distribution influence the convergence behaviour of RWM, and ii) which concrete properties must be verified in order to obtain a rigorous proof. A key element will be the impact of tail behaviour and measure concentration on the convergence profile of the algorithm across different time-scales.
No prior knowledge of the RWM is required from the audience.
(joint work with C. Andrieu, A. Lee and A. Wang)
Tuesday, February 25, 2025
Speaker: Galin Jones (University of Minnesota) [Zoom Link]
Title: Reliable Metropolis-Hastings Simulations in the High-Dimensional, Large Sample Size Regime
Abstract: Monte Carlo experiments produce samples to estimate features of a given distribution. However, simultaneous estimation of means and quantiles has received little attention, despite being common practice. I will describe a multivariate central limit theorem for any finite combination of sample means and quantiles under the assumption of a strongly mixing process and provide a fast algorithm for constructing hyper-rectangular confidence regions having the desired simultaneous coverage probability and a convenient marginal interpretation. A key condition is that the Markov chain enjoys fast convergence, both theoretically and practically. Lower bounds on the convergence rate of Metropolis-Hastings and other accept-reject-based algorithms are developed in both total variation and Wasserstein distances to identify how the simulations will fail so these settings can be avoided, guiding tuning. Particular attention is paid to using the lower bounds to study the convergence complexity of accept-reject-based Markov chains and to constrain the rate of convergence for geometrically ergodic Markov chains in the high-dimensional, large sample size regime.
Tuesday, March 4, 2025
Speaker: Max Welling (University of Amsterdam and Microsoft Research) [Zoom Link]
Title: A Variational Bayesian Method for Autoregressive and Recurrent Models
Abstract: Autoregressive models are the key technology behind most of today's AI revolution: from language models such as ChatGPT and LSTM-based molecular generative models to PDE emulator models based on autoregressive graph neural networks. While these models work incredibly well, they lack the ability to assess the confidence in their predictions, especially when asked to predict outside of the training regime. As such, we urgently need to develop models with accurate uncertainty quantification built in. In this work we propose BARNN, a variational Bayesian method for autoregressive and recurrent neural networks. We apply the idea of local reparameterization (or variational dropout) to sequence models and choose a prior that generalizes the "VampPrior" to sequence models. Together this results in a scalable approximate Bayesian method that achieves SOTA on both prediction accuracy and uncertainty prediction. We test the method on a number of tasks in AI4Science, such as PDE surrogate models and molecule generation.
Tuesday, March 11, 2025
Speaker: Bob Carpenter (Flatiron Institute) [Zoom Link]
Title: How does Stan work?
Abstract: In this talk, I'll describe the Stan probabilistic programming language and statistical inference stack in technical detail. I will start with an overview of Stan's language from a user's perspective, with examples of use. Then I'll drill down into how the programming language works at a technical level, focusing on reverse-mode automatic differentiation, transforms for constrained parameters, and posterior predictive quantities. Next, I'll describe how the differentiable log density defined by a Stan program can be used for Bayesian inference. I'll describe the details of Stan's multinomial no-U-turn sampler (NUTS) and its adaptation, as well as those of its variational inference methods (Pathfinder and ADVI) and its Laplace approximation. I'll conclude by covering some of the packages that depend on Stan in a fundamental way, such as BridgeStan, brms, and Prophet, which collectively have far more users than Stan itself.
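As background for the sampler discussion, here is a minimal leapfrog-based HMC step of the kind NUTS builds on. This is not Stan's implementation: NUTS replaces the fixed trajectory length below with the U-turn criterion and multinomial sampling over the trajectory, and Stan additionally adapts the step size and metric during warmup.

```python
import numpy as np

def leapfrog(grad_logp, x, p, eps, n_steps):
    """Leapfrog integrator for the Hamiltonian -logp(x) + |p|^2 / 2."""
    x, p = x.copy(), p.copy()
    p += 0.5 * eps * grad_logp(x)             # initial momentum half step
    for _ in range(n_steps - 1):
        x += eps * p                          # full position step
        p += eps * grad_logp(x)               # full momentum step
    x += eps * p
    p += 0.5 * eps * grad_logp(x)             # final momentum half step
    return x, p

def hmc_step(logp, grad_logp, x, eps=0.1, n_steps=20, rng=None):
    """One HMC transition with an identity mass matrix."""
    rng = np.random.default_rng(rng)
    p0 = rng.standard_normal(x.shape)
    x1, p1 = leapfrog(grad_logp, x, p0, eps, n_steps)
    # Metropolis accept/reject on the (numerical) Hamiltonian error
    dH = (logp(x1) - 0.5 * p1 @ p1) - (logp(x) - 0.5 * p0 @ p0)
    return x1 if np.log(rng.uniform()) < dH else x
```

Because leapfrog is symplectic, the Hamiltonian error dH stays small even over long trajectories, which is why acceptance rates remain high.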
Tuesday, March 18, 2025
Speaker: Sinho Chewi (Yale University) [Zoom Link]
Title: A local error framework in KL divergence via shifted composition
Abstract: Local error analysis is a standard framework for establishing error estimates for the numerical discretization of stochastic systems. However, it is traditionally limited to guarantees in the Wasserstein metric. In this talk, I will describe a strengthening of this framework which yields bounds in the stronger sense of KL divergence or relative entropy. At the heart of this result is a technique to use coupling arguments to control information-theoretic divergences. This technique, which we call "shifted composition", builds on a recent line of work developed with my co-author Jason M. Altschuler.
Tuesday, March 25, 2025
Speaker: Saifuddin Syed (University of Oxford) [Zoom Link]
Title: Scalable sampling of multi-modal distributions using sequential Monte Carlo samplers
Abstract: Sampling from complex, unnormalised probability distributions is a fundamental challenge in statistical inference. Since direct sampling is intractable, practitioners typically introduce a reference distribution (e.g., Gaussian) and an annealing path interpolating between the tractable reference and the intractable target. Sequential Monte Carlo Samplers (SMCS) generate weighted particle approximations along these paths by propagating reference samples and applying sequential importance sampling. While SMCS can achieve state-of-the-art performance, it requires careful tuning and remains theoretically challenging to analyse. By analysing the normalising constant estimator's variance, we demonstrate how performance scales with computational resources relative to reference-target discrepancy. We identify a critical phenomenon explaining why SMCS can efficiently tackle modern sampling problems. We present a black-box tuning algorithm, practical implementation guidelines, and comparisons with parallel tempering, illuminating when each approach is optimal depending on the practitioner's computational framework.
Tuesday, April 1, 2025
Title: Theoretical advances in diffusion models
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: In this talk, I will present recent theoretical advances in diffusion models, a class of deep generative models driving many cutting-edge applications. In the first part, I will introduce a training-free acceleration method for diffusion models. Our approach is simple to implement, compatible with any pre-trained diffusion model, and comes with a convergence rate that strengthens prior theoretical results. We demonstrate the effectiveness of our algorithm across multiple real-world image generation tasks. In the second part, I will discuss a new class of sampling algorithms designed based on the structure of diffusion models. Our approach replaces score networks in the diffusion model architecture with more efficient denoising algorithms that encode information about the target distribution. As applications, we use our method for posterior sampling in two high-dimensional statistical problems: sparse regression and low-rank matrix estimation within the spiked model. In both cases, we develop algorithms with accuracy guarantees in the regime of constant signal-to-noise ratios.
Tuesday, April 8, 2025
Speaker: Chenyang Zhong (Columbia University) [Zoom Link]
Title: A hit and run approach for sampling and analyzing ranking models
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: The analysis of ranking data has gained much recent interest across various applications, including recommender systems, market research, and electoral studies. This talk focuses on the Mallows permutation model, a probabilistic model for ranking data introduced by C. L. Mallows. The Mallows model specifies a family of non-uniform probability distributions on permutations and is characterized by a distance metric on permutations. We focus on two popular choices: the L1 (Spearman’s footrule) and L2 (Spearman’s rank correlation) distances. Despite their widespread use in statistics and machine learning, Mallows models with these metrics present significant computational challenges due to the intractability of their normalizing constants. Hit and run algorithms form a broad class of MCMC algorithms, including Swendsen-Wang and data augmentation. In this talk, I will first explain how to sample from Mallows models using hit and run algorithms. For both models, we establish O(log(n)) mixing time upper bounds, which provide the first theoretical guarantees for efficient sampling and enable computationally feasible Monte Carlo maximum likelihood estimation. Finally, I will discuss how hit and run algorithms can be utilized to prove theorems about probabilistic properties of the Mallows models.
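To fix ideas about the model itself (not the hit-and-run chain from the talk), here is a baseline adjacent-transposition Metropolis sampler for the Mallows model under the L1 (footrule) distance; it only illustrates the target distribution, whereas the talk's point is that hit and run admits provable O(log(n)) mixing bounds.

```python
import numpy as np

def mallows_metropolis(n, beta, n_steps=20000, seed=0):
    """Metropolis sampler for the Mallows model with the L1 (footrule)
    distance: pi(sigma) proportional to exp(-beta * sum_i |sigma(i) - i|).

    This simple adjacent-transposition chain is *not* the hit-and-run
    algorithm analyzed in the talk; it serves only to illustrate the model.
    """
    rng = np.random.default_rng(seed)
    sigma = np.arange(n)                  # start from the identity permutation
    for _ in range(n_steps):
        i = rng.integers(n - 1)           # propose swapping positions i, i+1
        a, b = sigma[i], sigma[i + 1]
        # change in footrule distance if the swap is applied
        delta = (abs(b - i) + abs(a - i - 1)) - (abs(a - i) + abs(b - i - 1))
        if np.log(rng.uniform()) < -beta * delta:
            sigma[i], sigma[i + 1] = b, a
    return sigma
```

For moderately large beta the draws concentrate near the identity permutation, reflecting the non-uniformity the model is designed to capture.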
Tuesday, April 15, 2025
Speaker: Dootika Vats (Indian Institute of Technology, Kanpur) [Zoom Link]
Title: MCMC Importance Sampling via Moreau-Yosida Envelopes
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: Markov chain Monte Carlo (MCMC) is the workhorse computational algorithm employed for inference in Bayesian statistics. Gradient-based MCMC algorithms are known to yield faster converging Markov chains. In modern parsimonious models, the use of non-differentiable priors is fairly standard, yielding non-differentiable posteriors. Without differentiability, gradient-based MCMC algorithms cannot be employed effectively. Recently proposed proximal MCMC approaches, however, can partially remedy this limitation. These approaches employ the Moreau-Yosida (MY) envelope to smooth the nondifferentiable prior enabling sampling from an approximation to the target posterior. In this work, we leverage properties of the MY envelope to construct an importance sampling paradigm to correct for this approximation error. We establish asymptotic normality of the importance sampling estimators with an explicit expression for the asymptotic variance which we use to derive a practical metric of sampling efficiency. Numerical studies show that the proposed scheme can yield lower variance estimators compared to existing proximal MCMC alternatives.
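A one-dimensional toy version of this pipeline, using a Laplace prior |x| whose MY envelope is the Huber function; the sampler, smoothing parameter, and reweighting below are illustrative stand-ins, not the paper's exact scheme.

```python
import numpy as np

def my_envelope_abs(x, lam):
    """Moreau-Yosida envelope of |x|, i.e. min_z |z| + (x - z)^2 / (2 lam):
    the Huber function, which is differentiable everywhere."""
    return np.where(np.abs(x) <= lam, x**2 / (2 * lam), np.abs(x) - lam / 2)

def my_importance_estimate(y=1.0, lam=0.1, n=50000, seed=0):
    """Sample the smoothed posterior exp(-(x-y)^2/2 - psi_lam(x)) by RWM,
    then importance-reweight by exp(psi_lam(x) - |x|) to estimate the mean
    of the exact nondifferentiable-prior posterior exp(-(x-y)^2/2 - |x|)."""
    rng = np.random.default_rng(seed)
    log_smooth = lambda x: -0.5 * (x - y)**2 - my_envelope_abs(x, lam)
    x, lp = 0.0, log_smooth(0.0)
    xs = np.empty(n)
    for t in range(n):
        prop = x + rng.standard_normal()
        lpp = log_smooth(prop)
        if np.log(rng.uniform()) < lpp - lp:
            x, lp = prop, lpp
        xs[t] = x
    w = np.exp(my_envelope_abs(xs, lam) - np.abs(xs))   # importance weights
    return np.sum(w * xs) / np.sum(w)
```

Because the envelope differs from |x| by at most lam/2, the importance weights stay close to one, which is what keeps the variance of the corrected estimator under control.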
Tuesday, April 22, 2025
Title: Rectified flow: A straight approach to generative modeling and transport mapping
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: Rectified Flow (RF) is a simple yet general approach to generative and transfer modeling, widely applied in state-of-the-art AI tasks such as image, video, audio, and molecule generation. It provides a straightforward method for learning continuous-time transport mappings between two distributions—observed through either unpaired or paired data points—by fitting neural ordinary differential equation (ODE) models that prioritize path straightness. Straight paths are naturally preferred and allow for fast simulation with large discretization step sizes, enabling efficient one-step or few-step models. This yields a notion of straight transport, which differs from the classical notion of optimal transport. Although based solely on ODEs, RF can be extended to offer simplified perspectives on existing diffusion models. See more: https://rectifiedflow.github.io/ .
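A minimal one-dimensional sketch of the rectified-flow recipe: interpolate linearly between paired reference/target samples, regress the velocity X1 - X0 on (x, t), then integrate the learned ODE. A tiny least-squares model on polynomial features stands in for the neural network used in practice; everything below is an illustration under those assumptions, not a reference implementation.

```python
import numpy as np

def fit_rectified_flow(x0, x1, rng):
    """Fit the rectified-flow velocity field v(x, t) ~= E[X1 - X0 | X_t = x]
    by least squares on simple polynomial features (a toy stand-in for the
    neural ODE models used in practice)."""
    t = rng.uniform(size=len(x0))
    xt = (1 - t) * x0 + t * x1                 # straight-line interpolation
    feats = lambda x, t: np.stack([np.ones_like(x), x, t, t**2, t**3,
                                   x * t, x * t**2, x * t**3], axis=1)
    coef, *_ = np.linalg.lstsq(feats(xt, t), x1 - x0, rcond=None)
    return lambda x, t: feats(x, np.full_like(x, t)) @ coef

def sample_rectified_flow(v, x0, n_euler=50):
    """Push reference samples through the learned ODE dx/dt = v(x, t)
    with a plain Euler scheme from t = 0 to t = 1."""
    x, dt = x0.copy(), 1.0 / n_euler
    for k in range(n_euler):
        x = x + dt * v(x, k * dt)
    return x
```

Trained on reference samples from N(0, 1) and target samples from N(3, 0.25), fresh reference draws pushed through the learned ODE should land approximately on the target distribution.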
Tuesday, April 29, 2025
Speaker: Yuexi Wang (UIUC) [Zoom Link]
Title: Optimal Transport-Based Generative Models for Bayesian Posterior Sampling
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: We investigate the problem of sampling from posterior distributions with intractable normalizing constants in Bayesian inference. Our solution is a new generative modeling approach based on optimal transport (OT) that learns a deterministic map from a reference distribution to the target posterior through constrained optimization. The method uses structural constraints from OT theory to ensure uniqueness of the solution and allows efficient generation of many independent, high-quality posterior samples. The framework supports both continuous and mixed discrete-continuous parameter spaces, with specific adaptations for latent variable models and near-Gaussian posteriors. Beyond computational benefits, it also enables new inferential tools based on OT-derived multivariate ranks and quantiles for Bayesian exploratory analysis and visualization. We demonstrate the effectiveness of our approach through multiple simulation studies and a real-world data analysis.
Tuesday, May 6, 2025
Title: Sampling complexity of high-dimensional structure learning and other discrete-space statistical problems
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: In this talk, we consider Markov chain Monte Carlo (MCMC) methods for structure learning of high-dimensional directed acyclic graph (DAG) models, a problem known to be very challenging because of the enormous search space and the existence of Markov equivalent DAGs. We show that it is possible to construct both random walk and informed Metropolis-Hastings samplers on the space of equivalence classes with rapid mixing guarantee under some high-dimensional assumptions; in other words, the complexity of Bayesian learning of sparse equivalence classes grows only polynomially in n and p. We extend these results to general high-dimensional Bayesian models with discrete parameter spaces, and we compare two techniques for obtaining the mixing time bound: the multicommodity flow method and single-element drift condition analysis. Further generalizations will be discussed, including using restricted spectral gap to derive the mixing time guarantee on unrestricted parameter spaces.
Tuesday, May 13, 2025
Title: Convergence of Dirichlet forms for MCMC optimal scaling with dependent target distributions on large graphs
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning, and other fields, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm, the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result for diffusion processes. In this work, we utilize the Mosco convergence of Dirichlet forms to analyze the RWM algorithm on large graphs, whose target distribution is the Gibbs measure, which includes any probability measure satisfying a Markov property. The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changing Hilbert spaces. Through the optimal scaling problem, we demonstrate the impressive strengths of the Dirichlet form approach over the standard diffusion approach.
Tuesday, May 20, 2025
Speaker: Vivekananda Roy (Iowa State University) [Zoom Link]
Title: A Riemannian manifold approach to informed MCMC sampling
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: A Riemannian geometric framework for Markov chain Monte Carlo (MCMC) is developed in which informed proposal densities for Metropolis-Hastings (MH) algorithms are constructed using the Fisher-Rao metric on the manifold of probability density functions (pdfs). We exploit the square-root representation of pdfs, under which the Fisher-Rao metric reduces to the standard L2 metric, resulting in a straightforward implementation of the proposed geometric MCMC methodology. Unlike the random walk MH algorithm, which blindly proposes a candidate state using no information about the target, the geometric MH algorithms effectively move an uninformed base density (e.g., a random walk proposal density) towards different global/local approximations of the target density. The superiority of the geometric MH algorithm over other MCMC schemes is demonstrated using various multimodal, nonlinear, and high-dimensional examples. A publicly available R package, geommc, implements the proposed MCMC algorithms.
Tuesday, May 27, 2025
Speaker: Alain Oliviero-Durmus (Ecole Polytechnique) [Zoom Link]
Title: Toward Principled Inference and Convergence Guarantees in Diffusion Models
Time: 8:30 am PT = 11:30 am ET = 4:30 pm London = 5:30 pm Paris = 11:30 pm Beijing.
Abstract: My talk will be based on two contributions on diffusion models and their application to Bayesian inference.
In the first part of this talk, I will present new KL convergence guarantees under minimal assumptions for both continuous and discrete score-based diffusion models. Specifically, I focus on discretizations derived from the Ornstein-Uhlenbeck semigroup and its kinetic variant, and show that sharp and explicit KL bounds can be obtained for any data distribution with finite Fisher information, thereby avoiding early stopping, smoothing, or strong regularity conditions.
In the second part, I will shift to the use of diffusion models for solving inverse problems, such as image reconstruction or source separation. Here, I introduce a novel mixture-based posterior sampling framework that combines diffusion priors with observational data using a principled Gibbs sampling scheme. This approach offers theoretical guarantees, task-agnostic applicability, and robust performance across a wide range of problems—including ten image restoration tasks and musical source separation—without relying on crude approximations or heavy heuristic tuning.
Links: YouTube