Fall 2024
YouTube playlist for Fall 2024
Tuesday, October 1, 2024
Speaker: Persi Diaconis (Stanford)
Title: Symmetry analysis using Markov chains
Abstract: Let X be a finite set and let G be a finite group acting on X. This splits X into orbits, and one may ask: how many orbits are there? What are their typical sizes? How can one choose an orbit at random? For example, let X be the set of rooted labeled trees on n vertices and let G be the symmetric group S_(n-1) acting on X (permutations that fix the root). Everybody knows there are n^(n-2) trees (Cayley), but the number of orbits, the unlabeled or Polya trees, has no simple formula, and enumerative questions are challenging. I'll introduce the Burnside process, a Markov chain that enables sampling (and from here, counting and understanding orbit sizes). This works in general, for example for contingency tables with fixed row and column sums, for phylogenetic trees, or for generating a random partition of n. This is joint work with Laurent Bartholdi with help from Nathan Tung, Michael Howes and Chenyang Zhong.
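To make the Burnside process concrete, here is a minimal sketch on a toy example (not the trees example from the talk): S_n acting on binary strings of length n by permuting coordinates, so the orbits are indexed by the number of ones. Each step draws a group element uniformly from the stabilizer of the current point and then a point uniformly from the fixed-point set of that group element; the orbit visited is then asymptotically uniform over the n+1 orbits.

```python
# Toy Burnside process: G = S_n acting on X = {0,1}^n by permuting coordinates.
# Orbits correspond to the number of ones, so each of the n+1 orbit labels
# should be visited with frequency roughly 1/(n+1).
import random
from collections import Counter

def burnside_step(x):
    n = len(x)
    ones = [i for i in range(n) if x[i] == 1]
    zeros = [i for i in range(n) if x[i] == 0]
    # Step 1: draw g uniformly from the stabilizer of x, i.e. a permutation
    # mapping ones-positions to ones-positions and zeros-positions to zeros-positions.
    g = [None] * n
    for block in (ones, zeros):
        images = block[:]
        random.shuffle(images)
        for src, dst in zip(block, images):
            g[src] = dst
    # Step 2: draw x' uniformly from Fix(g): vectors constant on each cycle of g.
    xp = [None] * n
    seen = [False] * n
    for i in range(n):
        if seen[i]:
            continue
        bit = random.randint(0, 1)        # one fresh bit per cycle of g
        j = i
        while not seen[j]:
            seen[j] = True
            xp[j] = bit
            j = g[j]
    return tuple(xp)

n, steps = 6, 20000
x = tuple([0] * n)
counts = Counter()
for _ in range(steps):
    x = burnside_step(x)
    counts[sum(x)] += 1                   # orbit label = number of ones
print({k: round(v / steps, 3) for k, v in sorted(counts.items())})
```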
Tuesday, October 8, 2024
Speaker: Mike Giles (Oxford)
Title: Optimisation of MLMC on FPGAs
Abstract: FPGAs (Field-Programmable Gate Arrays) are ideally suited to very efficient fixed-point arithmetic with a customised number of bits for each variable. This talk considers their use for computational finance, estimating option prices based on the approximation of SDEs, using nested Multilevel Monte Carlo to correct for the rounding errors due to low precision as well as the usual timestep discretisation errors. Another key ingredient is a highly efficient method for generating approximate Normal random variables on the FPGA. The talk covers the optimisation of the overall procedure, in preparation for a future implementation on real FPGAs.
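For readers unfamiliar with MLMC itself, a minimal sketch of the plain (non-FPGA) telescoping estimator is below, pricing a European call under geometric Brownian motion with 2^l Euler-Maruyama timesteps at level l. The talk's additional correction levels for low-precision arithmetic are not shown, and the per-level sample sizes are arbitrary rather than optimally allocated.

```python
# Plain MLMC telescoping estimator for a European call under GBM.
import numpy as np
rng = np.random.default_rng(0)

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

def coupled_payoffs(level, n_paths):
    """Payoffs on the fine grid and (for level > 0) on the coarse grid,
    driven by the same Brownian increments so their difference has small variance."""
    nf = 2 ** level
    dt = T / nf
    dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, nf))
    Sf = np.full(n_paths, S0)
    for k in range(nf):
        Sf = Sf * (1 + r * dt + sigma * dW[:, k])
    Pf = np.exp(-r * T) * np.maximum(Sf - K, 0.0)
    if level == 0:
        return Pf, np.zeros(n_paths)
    Sc = np.full(n_paths, S0)
    dWc = dW[:, 0::2] + dW[:, 1::2]            # coarse increments = sums of fine ones
    for k in range(nf // 2):
        Sc = Sc * (1 + r * 2 * dt + sigma * dWc[:, k])
    Pc = np.exp(-r * T) * np.maximum(Sc - K, 0.0)
    return Pf, Pc

L, N = 6, 20000
estimate = 0.0
for level in range(L + 1):
    Pf, Pc = coupled_payoffs(level, N)
    estimate += np.mean(Pf - Pc)               # telescoping sum of level corrections
print("MLMC estimate of the option price:", round(estimate, 3))
```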
Tuesday, October 15, 2024
Speaker: Art Owen (Stanford)
Title: Practical Quasi-Monte Carlo Integration
Abstract: Quasi-Monte Carlo (QMC) integration is a method for computing multidimensional integrals. It avoids the curse of dimensionality of classical quadrature rules in the same sense that plain Monte Carlo (MC) sampling does. When some randomization is injected into QMC, it is then superior to plain MC in several ways. It has a variance that is o(1/n) asymptotically while not being more than a given multiple of the MC variance at finite n. For favorable integrands, a better convergence rate is seen at practical sample sizes. One could wonder why QMC has not displaced MC. First, there is some modestly increased complexity to use it. Second, while it rarely underperforms plain MC, some skill may be required in order to get dramatic improvements at practical sample sizes. It helps that there are now good software implementations.
Links: Online Book Slides YouTube
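As a pointer to those implementations, the following sketch uses scrambled Sobol' points from scipy.stats.qmc on a toy integrand with known mean 1; the integrand, dimension, and sample sizes are illustrative choices only.

```python
# Plain MC vs. randomized (scrambled) Sobol' QMC on a smooth toy integrand.
import numpy as np
from scipy.stats import qmc

d, m = 5, 12                          # dimension and log2(sample size)
n = 2 ** m
rng = np.random.default_rng(1)

def f(x):
    return np.prod(x + 0.5, axis=1)   # E[f(U)] = 1 exactly over [0,1]^d

# Plain Monte Carlo estimate.
mc = f(rng.random((n, d))).mean()

# Each independent scrambling gives an unbiased estimate; replications
# give an honest error estimate.
reps = []
for seed in range(10):
    sob = qmc.Sobol(d, scramble=True, seed=seed)
    reps.append(f(sob.random_base2(m)).mean())
print("MC estimate:  ", round(mc, 5))
print("RQMC estimate:", round(np.mean(reps), 5), "+/-", round(np.std(reps), 6))
```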
Tuesday, October 22, 2024
Speaker: Nawaf Bou-Rabee (Rutgers University)
Title: The No-U-Turn Sampler: Reversibility, Mixing Time, and Local Step-Size Adaptation
Abstract: The No-U-Turn sampler (NUTS) is arguably one of the most important MCMC methods out there, but its recursive architecture has long made it difficult to fully understand. In this talk, I will present a concise proof of NUTS’ reversibility by interpreting it as an auxiliary variable method, in the sense of Section 2.1 of arxiv.org/abs/2404.15253. This novel auxiliary variable approach offers significant flexibility for constructing transition kernels that are reversible with respect to a given target distribution, and includes as special cases Metropolis-Hastings, the slice sampler, and existing locally adaptive HMC methods. I will then present the first mixing time guarantee for NUTS, based on couplings and concentration of measure, which aligns well with empirical evidence. Specifically, the mixing time of NUTS, when initialized in the concentration region of the canonical Gaussian measure, scales as d^{1/4}, up to log factors, where d is the dimension (see arxiv.org/abs/2410.06978) – a result expected to be sharp (see arxiv.org/abs/1001.4460). A key insight of our analysis is that concentration of measure leads to uniformity in NUTS’ locally adapted transitions. We formalize this uniformity by using an extension of a recent coupling framework (see arxiv.org/abs/2308.04634) to a broader class of accept/reject chains. NUTS is then interpreted as one such chain, with its accept chain showing more uniform behavior. Additionally, I’ll discuss a previously unnoticed issue that we uncovered during our analysis: NUTS’ path length adaptation can be undermined if the leapfrog step size does not meet a no-looping condition. This is joint work with Bob Carpenter (Flatiron), Milo Marsden (Stanford), and Stefan Oberdörster (Bonn).
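For orientation, the sketch below shows the two standard ingredients the abstract refers to, a leapfrog step and the usual U-turn stopping criterion, on a Gaussian target. It is not NUTS itself (the recursive doubling and state selection are omitted), and the step size is an arbitrary illustrative value.

```python
# Leapfrog integration until the standard U-turn criterion fires.
import numpy as np

def leapfrog(q, p, grad_logp, eps):
    p = p + 0.5 * eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)
    return q, p

def u_turn(q_minus, q_plus, p_minus, p_plus):
    # The trajectory has "made a U-turn" once either endpoint starts moving
    # back towards the other one.
    dq = q_plus - q_minus
    return np.dot(dq, p_minus) < 0 or np.dot(dq, p_plus) < 0

grad_logp = lambda q: -q                 # standard Gaussian target
rng = np.random.default_rng(0)
q = rng.normal(size=10)
p = rng.normal(size=10)
q_minus, p_minus = q.copy(), p.copy()
steps = 0
while not u_turn(q_minus, q, p_minus, p):
    q, p = leapfrog(q, p, grad_logp, eps=0.2)
    steps += 1
print("leapfrog steps before the U-turn criterion fired:", steps)
```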
Tuesday, October 29, 2024
Speaker: Gareth Roberts (Warwick)
Time: Oct 29 9:30 am PT = 12:30 pm ET = 4:30 pm London = 5:30 pm Paris = Oct 30 0:30 am Beijing (+1d)
Title: Diffusive behaviour of non-reversible MCMC with application to Simulated Tempering
Abstract: Non-reversible Markov chain Monte Carlo algorithms such as Piecewise Deterministic Markov Processes are becoming increasingly popular in Bayesian computation as well as Physics. One of the attractions of these methods is that they provide momentum which facilitates mixing through the state space. This talk will discuss some results which describe the extent to which the advantages of non-reversible MCMC persist in high-dimensional problems. In some cases the momentum can be homogenised in the high-dimensional limit to give diffusive behaviour. This work is applied to a comparison of reversible and non-reversible simulated (and parallel) tempering algorithms. The theory provides practical implementation guidance for these algorithms. This is joint work with Jeff Rosenthal.
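As a toy illustration of the momentum idea (not the speakers' construction), the following sketch runs a lifted, non-reversible simulated tempering chain on a one-dimensional Gaussian: the temperature index carries a direction that only flips when a temperature move is rejected, so the chain sweeps up and down the temperature ladder rather than diffusing on it. The inverse temperatures and step sizes are arbitrary choices.

```python
# Lifted (non-reversible) simulated tempering on a 1D Gaussian target.
import numpy as np
rng = np.random.default_rng(0)

betas = np.array([1.0, 0.5, 0.25, 0.125])      # inverse temperatures
logZ = 0.5 * np.log(2 * np.pi / betas)         # exact log normalizers, so the
                                               # temperature marginal is uniform
def log_f(x, k):                               # log of the k-th tempered density
    return -0.5 * betas[k] * x**2 - logZ[k]

x, k, v = 0.0, 0, +1
visits = np.zeros(len(betas))
for _ in range(50000):
    # Within-temperature random-walk Metropolis move on x.
    y = x + rng.normal(scale=1.0 / np.sqrt(betas[k]))
    if np.log(rng.random()) < log_f(y, k) - log_f(x, k):
        x = y
    # Lifted temperature move: try k + v, flip the direction only on rejection.
    kp = k + v
    if 0 <= kp < len(betas) and np.log(rng.random()) < log_f(x, kp) - log_f(x, k):
        k = kp
    else:
        v = -v
    visits[k] += 1
print("time spent in each temperature:", np.round(visits / visits.sum(), 3))
```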
Tuesday, November 5, 2024
Speaker: Arnaud Doucet (Google DeepMind)
Abstract: Mass transport problems arise in many areas of machine learning whereby one wants to compute a map transporting one distribution to another. Generative modeling techniques like Generative Adversarial Networks (GANs) and Denoising Diffusion Models (DDMs) have been successfully adapted to solve such transport problems, resulting in CycleGAN and Bridge Matching respectively. However, these methods do not approximate Optimal Transport (OT) maps, which are known to have desirable properties. Existing techniques approximating OT maps for high-dimensional data-rich problems, such as DDM-based Rectified Flow and Schrödinger Bridge procedures, require fully training a DDM-type model at each iteration, or use mini-batch techniques which can introduce significant errors. We propose a novel algorithm to compute the Schrödinger Bridge, a dynamic entropy-regularised version of OT, that eliminates the need to train multiple DDM-like models. This algorithm corresponds to a discretisation of a flow of path measures, which we call the Schrödinger Bridge Flow, whose only stationary point is the Schrödinger Bridge. We demonstrate the performance of our algorithm on a variety of unpaired data translation tasks.
Links: YouTube
Tuesday, November 12, 2024
Speaker: Diana Cai (Flatiron Institute)
Title: Batch and match: score-based approaches for black-box variational inference
Abstract: Probabilistic modeling is a cornerstone of modern data analysis, uncertainty quantification, and decision making. A key challenge of probabilistic inference is computing a target distribution of interest, which is often intractable. Variational inference (VI) is a popular method that has enabled scalable probabilistic inference across a range of applications in science and engineering. Black-box variational inference (BBVI) algorithms, in which the target needs only to be available as the log of an unnormalized distribution that is typically differentiable, are becoming a major focus of VI due to their ease of application. Moreover, thanks to advances in automatic differentiation, BBVI algorithms are now widely available in popular probabilistic programming languages, providing automated VI algorithms to data analysis practitioners. But gradient-based methods can be plagued by high-variance gradients and sensitivity to hyperparameters of the learning algorithms; these issues are further exacerbated when using richer variational families that model the correlations between different latent variables.
We present “batch and match” (BaM), an alternative approach to BBVI that uses a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization. Finally, we discuss extensions of score-based BBVI to high-dimensional settings, large data settings, and non-Gaussian variational families.
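As a rough illustration of what a score-based divergence looks like (not the BaM update itself, which uses a different weighting and closed-form proximal steps), the sketch below forms a batch Monte Carlo estimate of a Fisher-type divergence between a full-covariance Gaussian and an unnormalized target, using only the target's score.

```python
# Batch Monte Carlo estimate of a Fisher-type score divergence between
# q = N(mu, Sigma) and an unnormalized target known only through its score.
import numpy as np
rng = np.random.default_rng(0)

def target_score(z):                    # score of an (unnormalized) target:
    return -(z - 2.0)                   # here a standard Gaussian shifted to mean 2

def score_divergence(mu, Sigma, batch_size=256):
    L = np.linalg.cholesky(Sigma)
    z = mu + rng.normal(size=(batch_size, len(mu))) @ L.T     # z ~ q
    q_score = -np.linalg.solve(Sigma, (z - mu).T).T           # grad log q(z)
    diff = target_score(z) - q_score
    return np.mean(np.sum(diff**2, axis=1))

d = 3
print(score_divergence(np.zeros(d), np.eye(d)))        # far from the target: large
print(score_divergence(2.0 * np.ones(d), np.eye(d)))   # matching mean/cov: near zero
```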
Tuesday, November 19, 2024
Speaker: Liyue Shen (University of Michigan)
Title: Enhance Efficiency of Diffusion-based Generative Models for Solving Inverse Problems via Posterior Sampling
Abstract: Diffusion models have recently emerged as powerful generative priors for solving inverse problems. However, training diffusion models is data-intensive and computationally demanding, which restricts their applicability for high-dimensional and high-resolution data such as medical and scientific imaging. In this talk, I will first briefly introduce how the diffusion model is used as the prior for solving general inverse problems through posterior sampling. Then I will share our recent works on how to improve the efficiency (data, time, and memory efficiency) of diffusion-based generative models to tackle these challenges. Particularly, I will introduce two plausible solutions we propose to enable learning diffusion priors through latent diffusion and patch diffusion models. The results are demonstrated in solving various inverse problems for both natural and medical images including 3D medical image reconstruction, opening the door to leverage diffusion-based generative models in tackling complex real-world data for addressing various crucial problems in many scientific disciplines.
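As a stand-in for the diffusion-prior setting, which needs a trained score network, here is a toy sketch of posterior sampling for a linear inverse problem via unadjusted Langevin dynamics, with an analytic Gaussian prior score playing the role that a pretrained diffusion model would play; the latent- and patch-diffusion constructions from the talk are not shown.

```python
# Posterior sampling for y = A x + noise via unadjusted Langevin dynamics,
# using posterior score = prior score + likelihood score.
import numpy as np
rng = np.random.default_rng(0)

d, m, noise_std = 8, 4, 0.25
A = rng.normal(size=(m, d))
x_true = rng.normal(size=d)
y = A @ x_true + noise_std * rng.normal(size=m)

prior_score = lambda x: -x                                # N(0, I) prior score, the
                                                          # stand-in for a learned score model
lik_score = lambda x: A.T @ (y - A @ x) / noise_std**2    # Gaussian likelihood score

x, step = np.zeros(d), 5e-4
for _ in range(20000):
    score = prior_score(x) + lik_score(x)
    x = x + step * score + np.sqrt(2 * step) * rng.normal(size=d)
print("one (approximate) posterior draw:", np.round(x, 2))
print("x_true for reference:           ", np.round(x_true, 2))
```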
Tuesday, December 3, 2024
Speaker: Jun Yang (University of Copenhagen)
Title: Stereographic Markov Chain Monte Carlo
Abstract: High-dimensional distributions, especially those with heavy tails, are notoriously difficult for off-the-shelf MCMC samplers: the combination of unbounded state spaces, diminishing gradient information, and local moves results in empirically observed "stickiness" and poor theoretical mixing properties, namely a lack of geometric ergodicity. In this talk, we introduce a new class of MCMC samplers that map the original high-dimensional problem in Euclidean space onto a sphere and remedy these notorious mixing problems. In particular, we develop random-walk Metropolis type algorithms as well as versions of the Bouncy Particle Sampler that are uniformly ergodic for a large class of light and heavy-tailed distributions and also empirically exhibit rapid convergence in high dimensions. In the best scenario, the proposed samplers can enjoy the “blessings of dimensionality”, in the sense that convergence is faster in higher dimensions. Joint work with Krzysztof Łatuszyński and Gareth O. Roberts.
Links: Slides YouTube Another Talk Slides
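A minimal toy version of the stereographic random-walk idea (not the paper's implementation or tuning) is sketched below: map R^d to the unit sphere S^d, run a symmetric random walk on the sphere, and accept with the Jacobian-corrected target density, here for a heavy-tailed multivariate t target. The radius, step size, and degrees of freedom are illustrative choices.

```python
# Stereographic random-walk Metropolis on the unit sphere S^d for a
# heavy-tailed multivariate t target on R^d.
import numpy as np
rng = np.random.default_rng(0)

d, R, nu = 10, np.sqrt(10.0), 10.0           # dimension, projection radius, t degrees of freedom

def log_target(x):                           # multivariate t_nu: heavy polynomial tails
    return -0.5 * (nu + d) * np.log(1 + x @ x / nu)

def to_plane(z):                             # inverse stereographic projection S^d -> R^d
    return R * z[:d] / (1 - z[d])

def log_target_sphere(z):                    # Jacobian-corrected log density on the sphere
    x = to_plane(z)
    return log_target(x) + d * np.log(R**2 + x @ x)

z = np.zeros(d + 1)
z[d] = -1.0                                  # south pole, the image of x = 0
accept, first_coord, n_iter = 0, [], 20000
for _ in range(n_iter):
    prop = z + 0.1 * rng.normal(size=d + 1)  # symmetric proposal: jitter, then
    prop /= np.linalg.norm(prop)             # project back onto the unit sphere
    if np.log(rng.random()) < log_target_sphere(prop) - log_target_sphere(z):
        z, accept = prop, accept + 1
    first_coord.append(to_plane(z)[0])
print("acceptance rate:", accept / n_iter)
print("mean/sd of first coordinate:", round(np.mean(first_coord), 2),
      round(np.std(first_coord), 2))         # target: mean 0, sd = sqrt(nu/(nu-2)) ~ 1.12
```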
Tuesday, December 10, 2024
Speaker: Simon Mak (Duke University)
Title: Recent advances in Bayesian optimization for the physical sciences
Abstract: With advances in scientific computing, computer simulations are increasingly used for investigating complex physical phenomena. For many such applications, scientific decision-making involves optimizing the simulator output, which can be costly given the expensive nature of simulation runs. While Bayesian optimization (BO) offers a promising solution, there are key challenges that limit the use of existing BO methods in the physical sciences. The first is the presence of noise parameters, which are controllable in the simulator but uncontrollable in reality. For this, we propose a new Targeted Variance Reduction (TVR) method, for optimizing a black-box simulator given random uncertainty on noise parameters. Using a carefully specified Gaussian process surrogate, the TVR admits a closed-form acquisition function via normalizing flows, thus allowing for efficient sequential sampling. We explore the effectiveness of TVR in numerical experiments and an application for automobile brake design under operational uncertainties. The second challenge is the need for diverse optimization solutions, providing users with a basket of "good" solutions for decision-making. For this, we propose a new Diverse Expected Improvement (DEI) method, which extends the popular Expected Improvement method to encourage diversity between near-optimal solutions. The DEI similarly yields a closed-form acquisition function, which reveals a novel exploration-exploitation-diversity trade-off for diverse black-box optimization. We explore the effectiveness of the DEI in two applications, the first on rover trajectory optimization and the second for real-time control of unmanned aerial vehicles.
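For reference, the classical Expected Improvement acquisition that DEI extends has a simple closed form; the sketch below (with a made-up surrogate posterior, not the TVR or DEI acquisitions) shows it for minimization.

```python
# Classical Expected Improvement acquisition for minimization, evaluated on a
# toy 1D grid with a stand-in surrogate posterior mean and standard deviation.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for minimization; mu, sigma are arrays of posterior means/sds."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu - xi) / sigma
    return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

x = np.linspace(0, 1, 101)
mu = np.sin(6 * x)                 # pretend posterior mean
sigma = 0.2 + 0.3 * x              # pretend posterior sd (more uncertain on the right)
best = 0.1                         # best observed objective value so far
ei = expected_improvement(mu, sigma, best)
print("next evaluation point:", x[np.argmax(ei)])
```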
Tuesday, December 17, 2024
Speaker: Zhijian He (South China University of Technology) [Zoom Link]
Title: Reclaiming importance sampling in Bayesian inverse problems: the quasi-Monte Carlo way
Abstract: Importance Sampling (IS), an effective variance reduction strategy in Monte Carlo (MC) simulation, is frequently utilized for Bayesian inference and other statistical challenges. Quasi-Monte Carlo (QMC) replaces the random samples in MC with low discrepancy points and has the potential to substantially enhance error rates. In this talk, I will show how to integrate IS with a randomly shifted rank-1 lattice rule, a widely used QMC method, to approximate posterior expectations arising from Bayesian Inverse Problems (BIPs) where the posterior density tends to concentrate as the intensity of noise diminishes. Within the framework of weighted Hilbert spaces, we first establish the convergence rate of the lattice rule for a large class of unbounded integrands. This method extends to the analysis of QMC combined with IS in BIPs. Furthermore, we explore the robustness of the IS-based randomly shifted rank-1 lattice rule by determining the quadrature error rate with respect to the noise level. The effects of using Gaussian distributions and t-distributions as the proposal distributions on the error rate of QMC are investigated. We find that the error rate may deteriorate at low intensity of noise when using improper proposals, such as the prior distribution. To reclaim the effectiveness of QMC, we propose a new IS method such that the lattice rule with $N$ quadrature points achieves an optimal error rate close to $O(N^{-1})$, which is insensitive to the noise level. Numerical experiments are conducted to support the theoretical results.
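To illustrate the two ingredients being combined (not the paper's construction or its optimized generating vectors), the sketch below applies a randomly shifted rank-1 lattice rule with a simple Gaussian importance sampling proposal to a small linear-Gaussian inverse problem; the Korobov-type generating vector and the proposal are arbitrary illustrative choices.

```python
# Randomly shifted rank-1 lattice rule + importance sampling for a toy BIP.
import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(0)

# Toy Bayesian inverse problem: y = x1 + x2 + noise, prior x ~ N(0, I_2).
y_obs, noise = 1.0, 0.2
log_prior = lambda x: -0.5 * np.sum(x**2, axis=1)
log_lik = lambda x: -0.5 * ((x[:, 0] + x[:, 1] - y_obs) / noise) ** 2

# Randomly shifted rank-1 lattice points in [0,1)^2; (1, 331) is an arbitrary
# Korobov-type generating vector, not an optimized one.
N, a = 1024, 331
gen = np.array([1, a])
shift = rng.random(2)
u = np.mod(np.outer(np.arange(N), gen) / N + shift, 1.0)

# Importance sampling with a Gaussian proposal centred near the posterior:
# map the lattice points through the proposal's inverse CDF, then reweight.
mu_prop, sd_prop = np.array([0.5, 0.5]), 1.0
x = mu_prop + sd_prop * norm.ppf(u)
log_w = log_prior(x) + log_lik(x) - np.sum(norm.logpdf(x, mu_prop, sd_prop), axis=1)
w = np.exp(log_w - log_w.max())
print("QMC + IS estimate of E[x1 | y]:", round(np.sum(w * x[:, 0]) / np.sum(w), 3))
# For this toy problem the exact posterior mean of x1 is about 0.49.
```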