Neural networks continue to surprise us with their remarkable capabilities for high-dimensional function approximation. Applications of machine learning now pervade essentially every scientific discipline, but predictive models to describe the optimization dynamics, inference properties, and flexibility of modern neural networks remain limited. In this talk, I will introduce several approaches to both analyzing and building generative models that augment Monte Carlo sampling of high-dimensional distributions. I will focus, in particular, on two applications from chemistry: sampling conformational ensembles of disordered protein domains and molecular optimization. I will also introduce a self-distillation strategy for large-scale models that shares conceptual similarities with preference optimization via reinforcement learning, but does not require proximal policy optimization (PPO) and outperforms direct preference optimization (DPO).
This talk is the first in a two-part series on sampling from unnormalized densities and free energy estimation. This part focuses on variance reduction in annealed importance sampling (AIS) through the use of approximate SDE transports. We begin by revisiting AIS and its theoretical connections to the Jarzynski equality and related fluctuation theorems. Building on this foundation, we introduce a general importance sampling identity that enables unbiased estimation in AIS when combined with approximate transports. We characterize the class of optimal transports that minimize estimator variance and show how this framework both subsumes and formalizes the escorted Jarzynski method. In doing so, we provide one of the first formal proofs of its correctness and offer a generalization of Crooks’ fluctuation theorem. This perspective also connects to time reversal and generative modelling, pointing to new design directions in sampling and inference and opening the door to many future avenues for the design of neural samplers.
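For reference, the standard identity behind this connection, written in generic notation that may differ from the talk’s: given unnormalized densities \(\gamma_0, \dots, \gamma_K\) with normalizing constants \(Z_k\), interpolating from a tractable base to the target, AIS runs a forward Markov chain started from the base (each kernel leaving the corresponding intermediate distribution invariant) and reweights the trajectory by
\[
w(x_{1:K}) \;=\; \prod_{k=1}^{K} \frac{\gamma_k(x_k)}{\gamma_{k-1}(x_k)},
\qquad
\mathbb{E}\,[\,w\,] \;=\; \frac{Z_K}{Z_0},
\]
which is the discrete-time analogue of the Jarzynski equality \(\mathbb{E}[e^{-\beta W}] = e^{-\beta \Delta F}\), with \(w = e^{-\beta W}\) for Boltzmann-type \(\gamma_k\). The variance of \(w\) is the quantity that the approximate transports discussed in the talk are designed to reduce.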
I will describe another perspective on augmenting annealed Langevin dynamics with non-equilibrium transport, complementary to Francisco’s discussion. I will show how the Jarzynski relation can be derived through simple manipulation of the Fokker-Planck equation associated with the Langevin diffusion. I will then describe how the resulting importance weights can be extended to include a non-equilibrium drift, and how this drift can be learned by minimizing a physics-informed neural network (PINN) loss that reduces the variance of the weights. I will suggest how the approach can be extended to the case of a discrete random variable, but you should see Peter’s talk for that.
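To fix ideas, here is a minimal continuous-time sketch of these objects in one possible sign convention (my notation, not necessarily the speaker’s). For a time-dependent potential \(U_t\) with \(\rho_t \propto e^{-U_t}\), normalizing constant \(Z_t\), an added drift \(b_t\), and \(X_0 \sim \rho_0\),
\[
dX_t = \big(b_t(X_t) - \nabla U_t(X_t)\big)\,dt + \sqrt{2}\,dW_t,
\qquad
dA_t = \big(\partial_t U_t + b_t \cdot \nabla U_t - \nabla \cdot b_t\big)(X_t)\,dt, \quad A_0 = 0,
\]
gives the unbiased identities \(\mathbb{E}[f(X_t)\,e^{-A_t}] = (Z_t/Z_0)\,\mathbb{E}_{\rho_t}[f]\) and \(\mathbb{E}[e^{-A_t}] = Z_t/Z_0\). Setting \(b_t = 0\) recovers the Jarzynski relation, while a drift satisfying the continuity equation \(\partial_t \rho_t + \nabla \cdot (\rho_t b_t) = 0\) makes \(A_t\) deterministic, so the importance weights have zero variance; one way to read the PINN loss mentioned above is as penalizing departures from this condition, which in turn controls the weight variance.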
In this talk, I give some vignettes demonstrating how mixture models can be used as a fruitful abstraction for understanding various algorithmic and empirical aspects of diffusions. In the first part, I describe a new algorithm for learning Gaussian mixture models via score estimation that is exponentially faster in a relevant range of parameters than prior work based on the method of moments. In the second part, I explain how to use mixture models to understand “critical windows” in diffusions (and localization-based samplers more generally), i.e. narrow windows in the generation process during which important features of the final output are determined. Time permitting, in the third part, I describe a simple toy mixture model in which one can precisely characterize the behavior of guidance. Based on the following joint works: https://arxiv.org/abs/2404.18893, https://arxiv.org/abs/2502.00921, https://arxiv.org/abs/2409.13074.
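One reason mixtures are a convenient abstraction here (a generic identity, not taken from the papers above): for a mixture \(p = \sum_k \lambda_k\, p_k\), the score decomposes as
\[
\nabla_x \log p(x) \;=\; \sum_k \Pr[\,k \mid x\,]\,\nabla_x \log p_k(x),
\qquad
\Pr[\,k \mid x\,] = \frac{\lambda_k\, p_k(x)}{\sum_j \lambda_j\, p_j(x)},
\]
so for Gaussian components the score is a posterior-weighted average of affine functions, and this structure is preserved under the Gaussian smoothing applied along a diffusion, since a Gaussian mixture convolved with a Gaussian is again a Gaussian mixture.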
Generative models parameterize flexible families of distributions capable of fitting complex datasets such as images or text. These models can generate independent samples from intricate high-dimensional distributions at a negligible cost. In contrast, sampling exactly from a given target distribution—such as the Boltzmann distribution of a physical system—is often a major challenge due to high dimensionality, multimodality, ill-conditioning, or a combination of these factors. This raises the question: How can generative models be leveraged to assist in the sampling task? A key difficulty in this setting is the lack of an extensive dataset to learn from upfront. In this talk, I will focus in particular on sampling from multimodal distributions and present recent attempts inspired by diffusion models to sample using a denoising process. The talk is mainly based on works with Louis Grenioux, Maxence Noble and Alain Oliviero Durmus.
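As background on sampling with a denoising process (a standard identity, independent of the specific works cited): if \(Y = X + \sigma Z\) with \(X \sim p\) and \(Z \sim \mathcal{N}(0, I)\), the smoothed density \(p_\sigma\) of \(Y\) satisfies Tweedie’s formula
\[
\nabla_y \log p_\sigma(y) \;=\; \frac{\mathbb{E}[X \mid Y = y] - y}{\sigma^2},
\]
so simulating the denoising dynamics reduces to estimating conditional expectations \(\mathbb{E}[X \mid Y]\) under the target. Without a dataset, these must be approximated directly from the unnormalized density, for instance by within-step Monte Carlo or a learned surrogate, which is where the difficulties with multimodality discussed in the talk can enter.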