Richard Everitt (University of Warwick)
Title: Active subspaces and MCMC
Abstract: Constantine et al. (2016) introduced a Metropolis-Hastings (MH) approach that targets the active subspace of a posterior distribution: a linearly projected subspace that is informed by the likelihood. Schuster et al. (2017) refined this approach, introducing a pseudo-marginal Metropolis-Hastings algorithm that integrates out the inactive variables by estimating a marginal likelihood at every MH iteration. In this talk we show empirically that the effectiveness of these approaches is limited when the linearity assumption is violated, and suggest a particle marginal Metropolis-Hastings algorithm as an alternative for this situation. The high computational cost of these approaches leads us to consider alternative ways of using active subspaces in MCMC that avoid the need to estimate a marginal likelihood: we introduce Metropolis-within-Gibbs and Metropolis-within-particle Gibbs methods that make more computationally efficient use of the active subspace. (A minimal sketch of the pseudo-marginal construction follows the references below.)
This is joint work with Leonardo Ripoli (Reading).
Paper at https://arxiv.org/abs/2501.05144
Constantine, P.G., C. Kent, and T. Bui-Thanh. 2016. Accelerating Markov Chain Monte Carlo with Active Subspaces. SIAM Journal on Scientific Computing 38(5): A2779–A2805. https://doi.org/10.1137/15M1042127
Schuster, I., P.G. Constantine, and T.J. Sullivan. 2017. Exact active subspace Metropolis-Hastings, with applications to the Lorenz-96 system. https://arxiv.org/abs/1712.02749
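For intuition, here is a minimal sketch of the pseudo-marginal construction described in the abstract: the inactive variables are integrated out by a Monte Carlo estimate of the marginal likelihood, recomputed at every MH iteration. The toy likelihood, priors, dimensions, and all variable names are illustrative, not the paper's examples.

import numpy as np

rng = np.random.default_rng(1)
d, k = 2, 1                        # ambient and active dimensions (toy)
W = np.linalg.qr(rng.standard_normal((d, d)))[0]
W_a, W_b = W[:, :k], W[:, k:]      # active / inactive directions

def log_lik(theta):                # stand-in for the expensive likelihood
    return -2.0 * np.sum((theta - 1.0) ** 2)

def log_lik_hat(a, n_mc=32):       # log of an unbiased marginal-likelihood estimate
    b = rng.standard_normal((n_mc, d - k))           # inactive prior draws
    ll = np.array([log_lik(W_a @ a + W_b @ bi) for bi in b])
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))       # stable log-mean-exp

def log_prior(a):
    return -0.5 * np.sum(a ** 2)

a, lhat = np.zeros(k), log_lik_hat(np.zeros(k))
for _ in range(5000):
    a_new = a + 0.5 * rng.standard_normal(k)
    lhat_new = log_lik_hat(a_new)                    # fresh estimate each iteration
    if np.log(rng.uniform()) < lhat_new + log_prior(a_new) - lhat - log_prior(a):
        a, lhat = a_new, lhat_new                    # keep the estimate with the state

The key pseudo-marginal detail is that the estimate lhat is stored alongside the state and reused in later acceptance ratios, which is what keeps the chain exact despite the noisy likelihood.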
Sarah Heaps (Durham University)
Title: Bayesian inference of sparsity in stationary, multivariate autoregressive processes
Abstract: In many fields, advances in sensing technology have made it possible to collect large volumes of time-series data on many variables. In a diverse array of fields such as finance, genetics and neuroscience, a key question is whether such data can be used to learn directed relationships between variables. In other words, do changes in one variable consistently precede those in another? Graphical vector autoregressions are a popular tool for characterising directed relationships in multivariate systems because zeros in the autoregressive coefficient matrices have a natural graphical interpretation in terms of the implied Granger (non-)causality structure. In many applications, it is natural to assume that the underlying process is stable so that, for example, uncertainty in forecasts does not increase without bound as the forecast horizon increases. Though stationarity is commonly stated as an assumption, it is generally not enforced as a constraint, because doing so demands restricting the autoregressive coefficient matrices to a constrained space with a complex geometry, called the stationary region. This is problematic because the number of parameters grows quadratically with dimension, making it increasingly difficult to learn, with certainty, that a process is stationary. Working in the Bayesian paradigm, we use a parameter expansion approach to tackle the problem of inference for sparse and stable vector autoregressions by constructing a spike-and-slab prior with support constrained to the stationary region. Computational inference is carried out via a Metropolis-within-Gibbs scheme which uses Hamiltonian Monte Carlo to draw from the full conditional distribution of the continuous parameters. To illustrate our approach, we consider an application to electroencephalography (EEG) data, which seeks to understand brain activity patterns in individuals with epilepsy during non-seizure periods.
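The stationary region mentioned above is the set of coefficient matrices whose companion matrix has spectral radius strictly less than one. As a point of reference (not the paper's parameter-expansion machinery), a direct stability check looks like this:

import numpy as np

def is_stable(Phi):
    """Check stability of a VAR(p) with coefficient matrices Phi[0..p-1],
    each m x m, via the eigenvalues of the companion matrix."""
    p, m = len(Phi), Phi[0].shape[0]
    C = np.zeros((m * p, m * p))
    C[:m, :] = np.hstack(Phi)              # top block row: Phi_1 ... Phi_p
    C[m:, :-m] = np.eye(m * (p - 1))       # sub-diagonal identity blocks
    return np.max(np.abs(np.linalg.eigvals(C))) < 1.0

# Example: a stable VAR(2) in m = 2 dimensions
Phi = [0.5 * np.eye(2), -0.2 * np.eye(2)]
print(is_stable(Phi))   # True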
Markus Rau (Newcastle University)
Title: Markov Chain Monte Carlo for Projected Point Processes in Cosmology
Abstract: The Large-Scale Structure of the universe, spanning billions of years and sextillions of kilometers, is a crucial source of information for modern cosmology. Its structure and evolution are best described by point process models, which track the distribution of galaxies and galaxy clusters. The coming decade, powered by a new generation of telescopes and surveys, promises to be a golden age for this field. By studying the large-scale structure, we aim to address some of the most fundamental questions in modern physics, including the nature of dark energy, dark matter, and the overall growth of cosmic structure.
However, a significant challenge lies in accurately modeling the data from these surveys. The light from distant galaxies, which originates in four-dimensional spacetime, is projected onto two-dimensional CCD plates in our telescopes. This process introduces a complex selection function—a bias that determines which galaxies are actually observed and recorded. Accurately accounting for this projection is a major methodological hurdle.
Our work addresses this complex inference task by developing customized Markov Chain Monte Carlo (MCMC) techniques. These methods are specifically tailored for projected point processes, allowing us to rigorously model the observational biases and extract unbiased cosmological insights from the vast datasets of upcoming galaxy surveys.
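As a cartoon of the modelling challenge (with an entirely hypothetical selection function and flux distribution, and projection crudely represented by dropping the line-of-sight coordinate), one can simulate a thinned, projected point process:

import numpy as np

rng = np.random.default_rng(0)

# Homogeneous Poisson process of galaxies in a unit box, thinned by a
# toy flux-dependent selection function: only sufficiently bright
# galaxies make it into the observed, projected catalogue.
n = rng.poisson(10_000)
pos = rng.uniform(size=(n, 3))                    # 3D positions (toy)
flux = rng.pareto(2.5, size=n) + 1.0              # toy flux distribution
p_detect = 1.0 / (1.0 + np.exp(-(flux - 3.0)))    # toy selection function
observed = pos[rng.uniform(size=n) < p_detect][:, :2]   # "projected" 2D catalogue

The inference problem is to recover properties of the full 3D process from the observed 2D points, while accounting for the thinning.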
Dani Leonard (Newcastle University)
Title: MCMC for Cosmological Inference
Abstract: Cosmology is the study of the composition, history, and physical laws of our Universe on the largest scales. Modern cosmology, centering on the analysis of large data sets from survey telescopes, inherently involves Bayesian parameter inference in high-dimensional parameter spaces, where likelihood evaluations are computationally expensive and model mis-specification is a persistent challenge. While MCMC remains the standard tool for inference, recent advances in sampling methods are often underutilised. In this talk, I will present two complementary directions to enhance Bayesian inference in cosmology using MCMC. First, I will introduce an emulation technique for fast and flexible evaluation of cosmological likelihoods, discussing how it could serve as a surrogate likelihood within delayed-acceptance MCMC schemes to accelerate sampling. Second, I will describe a dimensionality-reduction approach designed to mitigate model mis-specification by dynamically constructing informative, lower-dimensional summaries of cosmological observables at each MCMC iteration. Together, these approaches seek to improve both the efficiency and robustness of MCMC-based inference in cosmological applications.
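The delayed-acceptance idea mentioned above can be sketched generically: a cheap emulator screens proposals, and the expensive posterior is evaluated only for proposals that survive the first stage, with the surrogate ratio divided back out so the chain still targets the exact posterior. Both densities below are stand-ins, not cosmological likelihoods.

import numpy as np

rng = np.random.default_rng(0)

def log_post(x):          # expensive exact log-posterior (stand-in)
    return -0.5 * x ** 2

def log_post_emu(x):      # cheap emulator / surrogate (stand-in)
    return -0.5 * x ** 2 + 0.01 * np.sin(x)

x, n_exact = 0.0, 0
for _ in range(10_000):
    y = x + rng.standard_normal()
    # Stage 1: screen with the cheap surrogate
    if np.log(rng.uniform()) < log_post_emu(y) - log_post_emu(x):
        # Stage 2: correct with the exact posterior; the surrogate ratio
        # is divided back out, preserving the exact target
        n_exact += 1
        if np.log(rng.uniform()) < (log_post(y) - log_post(x)
                                    - log_post_emu(y) + log_post_emu(x)):
            x = y

The counter n_exact records how many expensive evaluations were actually needed; savings come from proposals rejected at stage 1.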
Gareth Roberts (University of Warwick)
Title: Ballistic and diffusive lifted MCMC, with application to parallel tempering
Abstract: In this talk I will review the popular “lifting” mechanism for producing non-reversible Markov chain Monte Carlo methods such as non-reversible Metropolis-Hastings and piecewise-deterministic Markov processes. These methods aim to improve mixing by providing momentum that breaks down the random walk behaviour of reversible algorithms. The presentation will investigate how these methods behave in a collection of stylised high-dimensional examples, showing that the non-reversibility can often be washed out by the problem complexity, so that the algorithm behaves asymptotically in a reversible way. On the other hand, lifted algorithms still retain a small efficiency advantage over their reversible counterparts. Furthermore, we will show that some carefully constructed higher-order lifted Metropolis-Hastings algorithms can retain some aspects of ballistic behaviour, even in the high-dimensional limit setting.
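The simplest instance of the lifting mechanism is Gustafson's one-dimensional scheme: augment the state with a velocity v in {-1, +1}, always propose in the direction of v, and flip v on rejection. A minimal sketch for a toy Gaussian target:

import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):                      # toy 1D target
    return -0.5 * x ** 2

x, v = 0.0, 1                       # position and lifted velocity in {-1, +1}
for _ in range(10_000):
    y = x + v * abs(rng.standard_normal())   # always move in direction v
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        x = y                       # accept: keep moving the same way
    else:
        v = -v                      # reject: flip the velocity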
Jere Koskela (Newcastle University)
Title: Zig-Zag sampling for discrete variables in phylogenetics
Abstract: The coalescent is a gold-standard model for DNA sequence data in phylogenetics, but is often impractical because of an intractable likelihood function. Typical data augmentation strategies involve sampling from posteriors defined on spaces of ancestral trees. Natural Metropolis-Hastings algorithms scale notoriously poorly in this setting, and the space of ancestral trees is complex and ill-structured enough that it can be hard to even specify more advanced algorithms such as HMC or MALA. I will review some of the pathologies of posterior distributions in genetics, and highlight why they make for compelling and challenging test cases for sampling algorithms. I'll then show how continuous-time, nonreversible sampling methods can be implemented for posterior distributions exhibiting both discrete and continuous coordinates, with essentially no assumptions on how the discrete variables are structured. This will make it possible to apply such methods to the space of ancestral trees in genetics, where they yield efficiency gains of up to several orders of magnitude relative to a well-tuned Metropolis-Hastings sampler.
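For readers unfamiliar with these samplers, here is the continuous-coordinate building block: a one-dimensional Zig-Zag process for a standard normal target, where the switching times can be simulated exactly by inverting the integrated rate. (The talk's contribution, handling discrete coordinates and tree spaces, is not sketched here.)

import numpy as np

rng = np.random.default_rng(0)

# Zig-Zag for a 1D standard normal: the switching rate is
# lambda(x, v) = max(0, v * x), and the event time solves
# int_0^T max(0, a + s) ds = E with a = v * x, E ~ Exp(1).
x, v, T = 0.0, 1.0, 0.0
events = []
for _ in range(10_000):
    E = rng.exponential()
    a = v * x
    if a >= 0:
        tau = -a + np.sqrt(a * a + 2 * E)
    else:
        tau = -a + np.sqrt(2 * E)   # no rate until the particle crosses zero
    x, T = x + v * tau, T + tau     # deterministic linear motion between events
    v = -v                          # flip the velocity at the event
    events.append((T, x, v))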
Luke Hardcastle (University College London)
Title: Sampling diffusion piecewise exponential models using piecewise deterministic Monte Carlo
Abstract: The piecewise exponential model is a popular tool for modelling time-to-event data. In this work we introduce a novel prior framework for this model based on the discretisation of a diffusion for the evolution of the hazard function, and a Poisson point process for the locations of the knots. The latter prior results in a transdimensional posterior that can challenge standard reversible jump MCMC methods. Recent work has shown that continuous-time Piecewise Deterministic Monte Carlo methods can efficiently sample from transdimensional posteriors induced by spike-and-slab priors. In this work we extend these methods to show how they can be used to sample from more general transdimensional posteriors, in particular for sampling over the set of knots in the piecewise exponential model. Further, we show how the mixing time of the sampler depends strongly on the choice of parameterisation of the underlying diffusion.
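For concreteness, the piecewise exponential model takes the hazard to be constant between knots, so the log-likelihood for a single subject involves only the cumulative hazard up to the observed time and the hazard on the interval containing it. A minimal sketch (the knot locations and hazard values are arbitrary illustrations):

import numpy as np

def pe_loglik(t, event, knots, haz):
    """Log-likelihood of one subject under a piecewise exponential model.
    knots: increasing interval endpoints starting at 0, e.g. [0, 1, 3];
    haz:   hazard on each interval, with the last hazard extending
           beyond the final knot."""
    edges = np.append(knots, np.inf)
    exposure = np.clip(t - edges[:-1], 0, np.diff(edges))  # time in each interval
    H = np.sum(haz * exposure)                             # cumulative hazard
    j = np.searchsorted(edges, t, side="right") - 1        # interval containing t
    return event * np.log(haz[j]) - H

print(pe_loglik(t=2.0, event=1, knots=np.array([0.0, 1.0, 3.0]),
                haz=np.array([0.2, 0.5, 0.1])))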
Moritz Schauer (Chalmers University of Technology + University of Gothenburg)
Title: Creating non-reversible rejection-free samplers by rebalancing skew-balanced Markov jump processes
Abstract: Markov chain sampling methods form the backbone of modern computational statistics. However, many popular methods are prone to random walk behavior, i.e., diffusion-like exploration of the sample space, leading to slow mixing that requires intricate tuning to alleviate. Non-reversible samplers can resolve some of these issues. We introduce a device that turns jump processes that satisfy a skew-detailed balance condition for a reference measure into a process that samples a target measure that is absolutely continuous with respect to the reference measure. The resulting sampler is rejection-free, non-reversible, and continuous-time. As an example, we apply the device to Hamiltonian dynamics discretized by the leapfrog integrator, resulting in a rejection-free non-reversible continuous-time version of Hamiltonian Monte Carlo (HMC). We prove the geometric ergodicity of the resulting sampler under certain convexity conditions, and demonstrate its qualitatively different behavior to HMC through numerical examples.
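The deterministic dynamics in the HMC example above are those of the leapfrog integrator; for reference, here is that integrator in isolation (the rebalancing device itself is not sketched):

import numpy as np

def leapfrog(x, p, grad_log_pi, h, n_steps):
    """Leapfrog discretisation of Hamiltonian dynamics with unit mass."""
    p = p + 0.5 * h * grad_log_pi(x)        # initial half step in momentum
    for _ in range(n_steps - 1):
        x = x + h * p                       # full step in position
        p = p + h * grad_log_pi(x)          # full step in momentum
    x = x + h * p
    p = p + 0.5 * h * grad_log_pi(x)        # final half step in momentum
    return x, p

x, p = leapfrog(np.array([1.0]), np.array([0.0]),
                lambda x: -x, h=0.1, n_steps=20)   # standard normal target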
Sanket Agrawal (University of Warwick)
Title: Large sample scaling analysis of the Zig-Zag process
Abstract: Piecewise deterministic Markov processes provide scalable methods for sampling from posterior distributions in big data settings by admitting principled sub-sampling strategies that do not bias the output. An important example is the Zig-Zag process, where clever sub-sampling has been empirically shown to produce an essentially independent sample at a cost that does not scale with the size of the data. However, sub-sampling also leads to slower convergence and poor mixing of the process, behaviour which calls into question the promised scalability of the algorithm. In this talk, we will give rigorous results concerning the effect of sub-sampling on the Zig-Zag paths using a scaling limit analysis as the data size $n$ goes to infinity. Interplaying closely with large-sample Bayesian asymptotics, we will prove weak convergence results for the underlying PDMP for different sub-sampling versions of the Zig-Zag algorithm. Based on these results, we will argue that, using suitable control variates with sub-sampling in Zig-Zag, the algorithm costs O(1) to obtain an essentially independent sample: a computational speed-up of O(n) over the canonical version of Zig-Zag and other traditional MCMC methods.
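The control-variate construction referred to above can be sketched as follows: recentre a single-datum gradient estimate at a fixed reference point (typically the posterior mode), so that near the mode the fluctuations of the estimate, and hence the Zig-Zag thinning bound, do not grow with $n$. The Gaussian toy model below is illustrative; its gradient is linear in the parameter, so the control variate happens to be exact here.

import numpy as np

rng = np.random.default_rng(0)

# Unbiased single-sample estimate of the full-data potential gradient:
#   grad_hat(x) = grad(x_ref) + n * (grad_i(x) - grad_i(x_ref)),  i ~ Unif{1..n}
y = rng.standard_normal(10_000)                 # toy data, Gaussian location model
n = len(y)
x_ref = y.mean()                                # reference point (posterior mode)
grad_ref = np.sum(x_ref - y)                    # full gradient at the reference

def grad_hat(x):
    i = rng.integers(n)
    # the per-datum terms nearly cancel near x_ref, keeping the noise O(1) in n
    return grad_ref + n * ((x - y[i]) - (x_ref - y[i]))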
Yuga Iguchi (Lancaster University)
Title: Skew-symmetric schemes for stochastic differential equations with non-Lipschitz drift: an unadjusted Barker algorithm
Abstract: We propose a new, simple and explicit numerical scheme for time-homogeneous stochastic differential equations. The scheme is based on sampling increments at each time step from a skew-symmetric probability distribution, with the level of skewness determined by the drift and volatility of the underlying process. We show that as the step-size decreases the scheme converges weakly to the diffusion of interest. We then consider the problem of simulating from the limiting distribution of an ergodic diffusion process using the numerical scheme with a fixed step-size. We establish conditions under which the numerical scheme converges to equilibrium at a geometric rate, and quantify the bias between the equilibrium distributions of the scheme and of the true diffusion process. Notably, our results do not require a global Lipschitz assumption on the drift, in contrast to those required for the Euler--Maruyama scheme for long-time simulation at fixed step-sizes. Our weak convergence result relies on an extension of the theory of Milstein & Tretyakov to stochastic differential equations with non-Lipschitz drift, which could also be of independent interest. We support our theoretical results with numerical simulations. This is joint work with Samuel Livingstone, Nikolas Nüsken, Giorgos Vasdekis and Rui-Yang Zhang.
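In the special case of the overdamped Langevin diffusion dX = grad log pi(X) dt + sqrt(2) dW targeting pi, a Barker-type skew-symmetric step can be sketched as below. This is a hedged sketch of that special case only; the paper's scheme covers general drift and volatility, and the constants here follow the standard Barker proposal rather than the paper's notation.

import numpy as np

rng = np.random.default_rng(0)

def grad_log_pi(x):        # toy target: standard normal
    return -x

def barker_step(x, h):
    z = np.sqrt(2 * h) * rng.standard_normal()      # increment magnitude
    p = 1.0 / (1.0 + np.exp(-z * grad_log_pi(x)))   # skewing probability
    b = 1.0 if rng.uniform() < p else -1.0          # skew-symmetric sign
    return x + b * z

x = 0.0
for _ in range(10_000):
    x = barker_step(x, h=0.1)

For small h the step has mean approximately h * grad log pi(x) and variance approximately 2h, matching the Langevin dynamics, while remaining explicit and well behaved for steep drifts.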
Francesco Pozza (Bocconi University)
Title: Zero-order parallel sampling
Abstract: Finding effective ways to exploit parallel computing to speed up MCMC convergence is an important problem in Bayesian computation and related disciplines. Here we consider the zero-order (aka derivative-free) version of the problem, where we assume that (a) the gradient of the target distribution is unavailable (either for theoretical, practical or computational reasons) and (b) we can evaluate the (expensive) target distribution in parallel at K different locations and use these evaluations to speed up MCMC convergence. We make two main contributions in this respect. First, we show that any method falling within a fairly general "multiple proposal framework" can only speed up convergence by log(K) factors in high dimensions. The fundamental limitation of such a framework, which includes multiple-try MCMC as well as many other previously proposed methods, is that it restricts possible moves to the support of the K evaluation points. We state our results in terms of upper bounds on the spectral gap of the resulting scheme. Second, we discuss how stochastic gradient estimators can be used to make better use of parallel computing and achieve polynomial speedups in K. Some of the methods have similarities, but also notable differences, with classical zero-order optimization methods.
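Multiple-try Metropolis, one member of the multiple-proposal framework analysed in the talk, uses the K parallel target evaluations as follows (sketched here with symmetric Gaussian proposals and weights proportional to the target):

import numpy as np

rng = np.random.default_rng(0)

def pi(x):                          # unnormalised target (toy)
    return np.exp(-0.5 * x ** 2)

K, sigma, x = 8, 1.0, 0.0
for _ in range(5000):
    ys = x + sigma * rng.standard_normal(K)         # K parallel proposals
    w = pi(ys)                                      # K parallel target evaluations
    y = ys[rng.choice(K, p=w / w.sum())]            # select one by weight
    xs = np.append(y + sigma * rng.standard_normal(K - 1), x)  # reference set
    if rng.uniform() < w.sum() / pi(xs).sum():      # MTM acceptance ratio
        x = y

Note that every accepted move lies in the support of the K proposal points, which is exactly the restriction behind the log(K) upper bound discussed in the abstract.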
Katerina Karoni (University of Bristol)
Title: Adaptive friction and nonlinear damping for model training
Abstract: We discuss novel damping procedures for training large-scale Bayesian data models, such as deep neural networks. Drawing inspiration from the concept of a thermostat (widely used for temperature regulation in molecular dynamics), we introduce kinetic energy controls on the individual parameter velocities of the model. This approach can be likened to component-wise Nosé-Hoover-style thermostatting taken in the zero-temperature limit, and it can be directly related to the introduction of cubic damping, a vibration suppression mechanism used in structural engineering applications. While a large momentum parameter helps to overcome barriers and progress training in low-curvature regions, it should be reduced in areas with steep gradients to avoid instability; our adaptive scheme allows this adjustment to be performed automatically, on a per-parameter basis. By using these schemes, we obtain enhanced efficiency, including significant speedups and test accuracy improvements in representative deep learning tasks.
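As a toy illustration of the idea only (not the authors' algorithm: the loss, discretisation, and constants are invented for the sketch), a component-wise, zero-temperature Nosé-Hoover-style update lets the friction on each parameter grow with that parameter's own kinetic energy:

import numpy as np

rng = np.random.default_rng(0)

def grad(theta):                    # toy loss gradient (quadratic bowl)
    return theta

theta = rng.standard_normal(10)
v = np.zeros(10)                    # per-parameter velocities
xi = np.zeros(10)                   # per-parameter friction (thermostat) variables
h, Q = 0.05, 1.0                    # step size and thermostat "mass"
for _ in range(2000):
    v -= h * (grad(theta) + xi * v) # friction grows where velocities are large
    xi += (h / Q) * v ** 2          # zero-temperature kinetic-energy control
    theta += h * v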
Peter Whalley (ETH Zurich)
Title: Scalable kinetic Langevin Monte Carlo methods for Bayesian inference
Abstract: We introduce Langevin Monte Carlo methods for estimating expectations of observables under high-dimensional probability measures. We discuss discretization strategies and Metropolization methods for removing bias due to discretization error. We then present a new unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. We prove similar results using both approximate and stochastic gradients and show that our method’s computational cost scales independently of the size of the dataset. Our numerical experiments demonstrate that our unbiased algorithm outperforms the “gold-standard” randomized Hamiltonian Monte Carlo. We then discuss whether it is always necessary to remove discretization bias in high-dimensional models and applications of unadjusted MCMC algorithms in Bayesian Neural Network models.
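One widely used splitting of kinetic Langevin dynamics is the BAOAB scheme; a minimal sketch at unit mass and temperature is given below (the talk's multilevel coupling and gradient approximations are not shown):

import numpy as np

rng = np.random.default_rng(0)

def baoab_step(x, v, grad_log_pi, h, gamma):
    """One BAOAB step of kinetic Langevin dynamics (unit mass/temperature)."""
    v = v + 0.5 * h * grad_log_pi(x)                 # B: half kick
    x = x + 0.5 * h * v                              # A: half drift
    c = np.exp(-gamma * h)                           # O: exact friction/noise
    v = c * v + np.sqrt(1 - c ** 2) * rng.standard_normal(x.shape)
    x = x + 0.5 * h * v                              # A: half drift
    v = v + 0.5 * h * grad_log_pi(x)                 # B: half kick
    return x, v

x, v = np.zeros(2), np.zeros(2)
for _ in range(10_000):
    x, v = baoab_step(x, v, lambda x: -x, h=0.1, gamma=1.0)  # Gaussian target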
Cameron Bell (University of Warwick)
Title: Stereographic Bouncy Particle Sampler and Adaptive MCMC for Continuous Time Processes
Abstract: Originally proposed in Yang, Latuszynski and Roberts (2024), the Stereographic Bouncy Particle Sampler (SBPS) is a continuous-time MCMC algorithm which uses the stereographic projection to improve the mixing of traditional MCMC algorithms, particularly when the target distribution is heavy tailed. As an example of a piecewise-deterministic Markov process (PDMP), the SBPS is a non-reversible algorithm which follows deterministic dynamics before updating the latent velocity of the particle at random times. However, the parameters used in the stereographic projection heavily impact the performance of the algorithm. It is therefore of interest to create an adaptive version of the SBPS which automatically updates the parameters as the process runs. In this talk, we discuss the construction and strengths of the adaptive SBPS algorithm, and introduce a novel methodology for proving asymptotic properties of continuous-time adaptive algorithms. These topics are explored in full in Bell, Latuszynski and Roberts (2024).
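The stereographic projection at the heart of the SBPS maps R^d onto the unit sphere S^d minus the north pole, compactifying heavy tails. A minimal sketch of the two maps, with a single radius R standing in for the projection parameters the adaptive algorithm would tune:

import numpy as np

def sp_inverse(x, R=1.0):
    """Map x in R^d to the unit sphere S^d (inverse stereographic projection)."""
    s = np.sum(x ** 2)
    return np.append(2 * R * x, s - R ** 2) / (s + R ** 2)

def sp_forward(z, R=1.0):
    """Map z in S^d (minus the north pole) back to R^d."""
    return R * z[:-1] / (1 - z[-1])

x = np.array([0.3, -1.2])
print(np.allclose(sp_forward(sp_inverse(x)), x))   # True: exact round trip

Points far out in the tails of R^d land close to the north pole, so bounded moves on the sphere correspond to very large moves in the original space.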
Rocco Caprio (University of Warwick)
Title: Optimisation and sampling, EM, and related inequalities
Abstract: In this talk, we will explore algorithms for solving the maximum marginal likelihood estimation problem. These methods seek to optimise the free energy functional (i.e. the negative ELBO) using a combination of optimisation and sampling procedures. We will then introduce some functional inequalities that guarantee the fast convergence of such algorithms, and highlight their connections to fundamental inequalities in the theory of sampling.
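For concreteness, with latent variable $x$, data $y$ and parameter $\theta$, the free energy referred to above can be written as $F(q, \theta) = \int q(x) \log \{ q(x) / p_\theta(x, y) \} \, \mathrm{d}x = \mathrm{KL}(q \,\|\, p_\theta(\cdot \mid y)) - \log p_\theta(y)$, so that minimising $F$ jointly over the variational distribution $q$ and $\theta$ recovers the maximum marginal likelihood estimator, with the optimal $q$ equal to the posterior.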
Mariya Mamajiwala (University of Nottingham)
Title: Rapid calibration of atrial electrophysiology models using Gaussian process emulators in the ensemble Kalman filter
Abstract: Atrial fibrillation (AF) is a common cardiac arrhythmia characterised by disordered electrical activity in the atria. The standard treatment is catheter ablation, which is invasive and irreversible. Recent advances in computational electrophysiology offer the potential for patient-specific models, often referred to as digital twins, that can be used to guide clinical decisions. To be of practical value, we must be able to rapidly calibrate physics-based models using routine clinical measurements. We pose this calibration task as a static inverse problem, where the goal is to infer tissue-level electrophysiological parameters from the available observations. To make this tractable, we replace the expensive forward model with Gaussian process emulators (GPEs) and propose a novel adaptation of the ensemble Kalman filter (EnKF) for static non-linear inverse problems. The approach yields parameter samples that can be interpreted as coming from the best Gaussian approximation of the posterior distribution. We compare our results with those obtained using Markov chain Monte Carlo (MCMC) sampling and demonstrate the potential of the approach to enable near-real-time patient-specific calibration, a key step towards predicting outcomes of AF treatment within clinical timescales. The approach is readily applicable to a wide range of static inverse problems in science and engineering.
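A generic perturbed-observation EnKF update for a static inverse problem is sketched below; the forward map is a stand-in for the GP emulator, and the repeated analysis steps are in the style of ensemble Kalman inversion rather than the authors' adaptation.

import numpy as np

rng = np.random.default_rng(0)

def G(theta):                        # stand-in for the emulated forward model
    return np.array([theta[0] + theta[1] ** 2, theta[0] * theta[1]])

y = np.array([2.0, 1.0])             # observed data (toy)
Gamma = 0.1 * np.eye(2)              # observation noise covariance
ens = rng.standard_normal((100, 2))  # prior ensemble of parameters

for _ in range(5):                   # repeated EnKF-style analysis steps
    Gs = np.array([G(t) for t in ens])
    t_mean, g_mean = ens.mean(0), Gs.mean(0)
    C_tg = (ens - t_mean).T @ (Gs - g_mean) / (len(ens) - 1)   # cross-covariance
    C_gg = (Gs - g_mean).T @ (Gs - g_mean) / (len(ens) - 1)    # output covariance
    K = C_tg @ np.linalg.inv(C_gg + Gamma)                     # Kalman gain
    perturbed = y + rng.multivariate_normal(np.zeros(2), Gamma, len(ens))
    ens = ens + (perturbed - Gs) @ K.T          # shift ensemble towards the data

The final ensemble plays the role of samples from a Gaussian approximation of the posterior, which is what the abstract proposes to compare against MCMC output.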
Chris Oates (Newcastle University)
Title: De-randomising the (mean field) Langevin dynamics
Abstract: Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. However, this increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called "kernel gradient discrepancy" (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point several novel algorithms are proposed, including a natural generalisation of Stein variational gradient descent, which can be viewed as a (kernel-smoothed) de-randomisation of the mean field Langevin dynamics.
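The final sentence refers to Stein variational gradient descent; for reference, its standard update is sketched below for an RBF kernel and a toy Gaussian target (the KGD machinery and the talk's generalisation are not shown):

import numpy as np

def svgd_step(X, grad_log_pi, eps=0.1, bw=1.0):
    """One Stein variational gradient descent update for particles X (n x d)."""
    diff = X[:, None, :] - X[None, :, :]                 # pairwise differences
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * bw))   # RBF kernel matrix
    gK = -K[:, :, None] * diff / bw                      # grad_{x_j} k(x_j, x_i)
    phi = (K @ grad_log_pi(X) + gK.sum(axis=0)) / len(X) # driving + repulsive terms
    return X + eps * phi

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2)) + 3.0                   # initial particles
for _ in range(200):
    X = svgd_step(X, lambda X: -X)                       # standard normal target

The kernel term drives particles towards high density while the repulsive term keeps them spread out, which is the deterministic (derandomised) counterpart of the noise in Langevin dynamics.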