Abstracts

Monday, 12 April

Australian Session

Clara Grazian (University of New South Wales). Approximate Bayesian Conditional Copulas: Many proposals are now available to model complex data, in particular thanks to recent advances in computational methodologies and algorithms which make it possible to work with complicated likelihood functions in a reasonable amount of time. However, it is in general difficult to analyse data characterized by complicated forms of dependence. Copula models have been introduced as probabilistic tools to describe a multivariate random vector via the marginal distributions and a copula function which captures the dependence structure among the vector components, thanks to Sklar's theorem, which states that any d-dimensional absolutely continuous density can be uniquely represented as the product of the marginal distributions and the copula function. Major areas of application include econometrics, hydrological engineering, biomedical science, signal processing and finance. Bayesian methods to analyse copula models tend to be computationally intensive or to rely on the choice of a particular copula function, in particular because methods of model selection are not yet fully developed in this setting. We will present a general method to estimate some specific quantities of interest of a generic copula by adopting an approximate Bayesian approach based on an approximation of the likelihood function. Our approach is general, in the sense that it can accommodate both parametric and nonparametric modelling of the marginal distributions, can be generalised in the presence of covariates, and avoids the need to specify the copula function. We will apply the methods to real-data examples from material science and astrophysics.
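
For reference, the factorisation from Sklar's theorem that the abstract refers to, for an absolutely continuous d-dimensional density f with marginal cdfs F_j, marginal densities f_j and copula density c, reads

```latex
f(x_1,\dots,x_d) \;=\; c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\,\prod_{j=1}^{d} f_j(x_j).
```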


David Nott (National University of Singapore). Detecting Conflicting Summary Statistics in Likelihood-free Inference: Bayesian likelihood-free methods implement Bayesian inference using simulation of data from the model to substitute for intractable likelihood evaluations. Most likelihood-free inference methods replace the full data set with a summary statistic before performing Bayesian inference, and the choice of this statistic is often difficult. The summary statistic should be low-dimensional for computational reasons, while retaining as much information as possible about the parameter. Using a recent idea from the interpretable machine learning literature, we develop some regression-based diagnostic methods which are useful for detecting when different parts of a summary statistic vector contain conflicting information about the model parameters. Conflicts of this kind complicate summary statistic choice, and detecting them can provide insight into model deficiencies and guide model improvement. The diagnostic methods developed are based on regression approaches to likelihood-free inference, in which the regression model estimates the posterior density using summary statistics as features. Deletion and imputation of part of the summary statistic vector within the regression model can remove conflicts and approximate posterior distributions for summary statistic subsets. A larger than expected change in the estimated posterior density following deletion and imputation can indicate a conflict in which inferences of interest are affected. The usefulness of the new methods is demonstrated in a number of real examples. This is joint work with Yinan Mao, Xueou Wang and Michael Evans.
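
A minimal sketch of the deletion-and-imputation idea, using linear regressions as crude stand-ins for the paper's regression-based posterior estimators (the paper compares estimated posterior densities, not just posterior means; all names here are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def conflict_check(theta_sims, s_sims, s_obs, drop):
    """Compare the regression-based posterior mean at the observed summaries
    with the one obtained after deleting the summary components in `drop`
    and imputing them from the remaining components.

    theta_sims : (N, p) simulated parameters
    s_sims     : (N, d) corresponding simulated summaries
    s_obs      : (d,)   observed summaries
    drop       : list of indices of summary components to delete/impute
    """
    keep = np.setdiff1d(np.arange(s_sims.shape[1]), drop)

    # Regression of parameters on the full summary vector (posterior-mean surrogate).
    post_mean = LinearRegression().fit(s_sims, theta_sims)

    # Learn to impute the dropped components from the retained ones, using simulations.
    imputer = LinearRegression().fit(s_sims[:, keep], s_sims[:, drop])

    s_full = np.asarray(s_obs, dtype=float)
    s_imp = s_full.copy()
    s_imp[drop] = imputer.predict(s_full[keep].reshape(1, -1)).ravel()

    before = post_mean.predict(s_full.reshape(1, -1)).ravel()
    after = post_mean.predict(s_imp.reshape(1, -1)).ravel()
    return before, after   # a large gap flags potentially conflicting summaries
```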


Hien Nguyen (La Trobe University). Distance-based ABC Procedures: ABC procedures often rely on appropriate choices of summary statistics that can be used to measure the difference between simulated data and observed data. Using appropriate distances over probability measures, it is possible to construct ABC procedures that directly measure the difference between samples, thus avoiding the use of summary statistics. Numerous proposals have been made regarding appropriate choices of distances, including the Kullback-Leibler divergence, the maximum mean discrepancy, the Wasserstein distance, and the energy statistic, each with their own merits. We review the current state of the literature and present some examples of such distance-based methods.
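
As a rough illustration of the idea, here is one such distance (the energy statistic, for univariate data) plugged directly into rejection ABC in place of a summary-statistic discrepancy; this is a generic sketch, not any specific method from the talk:

```python
import numpy as np

def energy_statistic(x, y):
    """Squared energy distance between two univariate samples x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - y[None, :]).mean()
    b = np.abs(x[:, None] - x[None, :]).mean()
    c = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * a - b - c

def rejection_abc(y_obs, prior_sample, simulate, n_draws=5000, quantile=0.01, rng=None):
    """Rejection ABC comparing raw samples with the energy statistic.

    prior_sample(rng)   -> one parameter draw
    simulate(theta,rng) -> one simulated data set (same size as y_obs)
    """
    rng = np.random.default_rng(rng)
    thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
    dists = np.array([energy_statistic(simulate(t, rng), y_obs) for t in thetas])
    eps = np.quantile(dists, quantile)      # keep the closest fraction of draws
    return thetas[dists <= eps]
```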

Joint Session - Computational

Leah South (Queensland University of Technology). BSL: An R package for Bayesian Synthetic Likelihood and its Extensions: This mini-tutorial will present the R package "BSL" for Bayesian synthetic likelihood (BSL) and its extensions. Using several examples, I will describe how to build a BSL object with a function to simulate data, a function to calculate summary statistics and a log prior function. The motivation for extensions such as penalised covariance estimation and semi-parametric alternatives to the standard Gaussian summary statistic approximation will be described and their straightforward application in the package will be illustrated through examples.
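
The core estimator that the package wraps can be sketched as follows, in Python for concreteness rather than R, and only as an illustration of the standard Gaussian synthetic likelihood (the package's actual interface, penalised covariance estimation and semi-parametric variants are not shown):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_synthetic_loglik(theta, s_obs, simulate, summarise, n_sims=200, rng=None):
    """Estimate the Gaussian synthetic log-likelihood at theta.

    simulate(theta, rng) -> one simulated data set
    summarise(data)      -> summary statistic vector
    n_sims should comfortably exceed the summary dimension so the
    estimated covariance is well conditioned.
    """
    rng = np.random.default_rng(rng)
    sims = np.array([summarise(simulate(theta, rng)) for _ in range(n_sims)])
    mu_hat = sims.mean(axis=0)
    Sigma_hat = np.cov(sims, rowvar=False)
    return multivariate_normal.logpdf(s_obs, mean=mu_hat, cov=Sigma_hat)
```

Inside an MCMC sampler this noisy estimate, plus the log prior, replaces the intractable log-likelihood; that is the loop the BSL package automates.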


Henri Pesonen (University of Oslo). ELFI: Engine for Likelihood-free Inference: ELFI (Engine for Likelihood-Free Inference) is a Python library for implementing the likelihood-free inference workflow. In this tutorial we will use examples to demonstrate how to arrange components such as priors, simulators, summaries and distances into a network model called an ELFI graph. In addition, we will show how to carry out inference on ELFI graphs using the various built-in inference methods, describe the logic of implementing new inference methods, and discuss other library features designed to help with the analysis workflow.
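
The flavour of the workflow can be sketched roughly as follows, with a toy Gaussian model rather than the tutorial's examples (exact argument names may differ slightly between ELFI versions):

```python
import numpy as np
import scipy.stats as ss
import elfi

def simulator(mu, sigma, batch_size=1, random_state=None):
    # Draws 50 i.i.d. normal observations for each parameter in the batch.
    mu, sigma = np.atleast_1d(mu), np.atleast_1d(sigma)
    return ss.norm.rvs(mu[:, None], sigma[:, None],
                       size=(batch_size, 50), random_state=random_state)

def mean_summary(y):
    return np.mean(y, axis=1)

def var_summary(y):
    return np.var(y, axis=1)

y_obs = simulator(np.array([1.0]), np.array([0.5]))

# Build the ELFI graph: priors -> simulator -> summaries -> distance.
mu = elfi.Prior('uniform', -2, 4)        # uniform on (-2, 2)
sigma = elfi.Prior('uniform', 0.1, 2)    # uniform on (0.1, 2.1)
sim = elfi.Simulator(simulator, mu, sigma, observed=y_obs)
S1 = elfi.Summary(mean_summary, sim)
S2 = elfi.Summary(var_summary, sim)
d = elfi.Distance('euclidean', S1, S2)

# Attach a built-in inference method to the graph and run it.
rej = elfi.Rejection(d, batch_size=1000, seed=0)
result = rej.sample(1000, quantile=0.01)
print(result)
```

Other built-in methods (e.g. SMC-ABC or BOLFI) attach to the same graph node in the same way, which is what makes the graph abstraction convenient.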


Lorenzo Pacchiardi (University of Oxford). ABCpy: a Package for State-of-the-art Likelihood-Free Inference Techniques: I will present the ABCpy Python library, which includes many state-of-the-art ABC algorithms as well as other Likelihood-Free Inference (LFI) techniques. ABCpy is highly structured, easily extensible and benefits from seamless MPI parallelization, which allows it to scale to HPC infrastructures. Other features are automatic summary statistics (learned with neural networks), nested parallelization, composite perturbation kernels, convergence diagnostic tools and inference with co-occurring observations for different model outputs. I will showcase how to define a model and perform inference, highlighting some of the above features. Further, I will show how new inference approaches can be tested with a few lines of code by extending the library, using as an example our recent work on generalized Bayesian LFI with scoring rules.


Florence Forbes (Inria Grenoble). Approximate Bayesian Computation with Surrogate Posteriors: A key ingredient in approximate Bayesian computation (ABC) procedures is the choice of a discrepancy that describes how different the simulated and observed data are, often based on a set of summary statistics when the data cannot be compared directly. Unless discrepancies and summaries are available from experts or prior knowledge, which seldom occurs, they have to be chosen, and this can affect the approximations. Their choice is an active research topic, which to date has mainly considered data discrepancies requiring samples of observations, or distances between summary statistics. In this work, we introduce a preliminary learning step in which surrogate posteriors are built from finite Gaussian mixtures using an inverse regression approach. These surrogate posteriors are then used in place of summary statistics and compared using metrics between distributions in place of data discrepancies. Two such metrics are investigated, a standard L₂ distance and an optimal transport-based distance. The whole procedure can be seen as an extension of the semi-automatic ABC framework to functional summary statistics. The resulting ABC quasi-posterior distribution is shown to converge to the true one, under standard conditions. Performance is illustrated on both synthetic and real data sets, where it is shown that our approach is particularly useful when the posterior is multimodal.
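
One of the two metrics, the L₂ distance between Gaussian mixture surrogates, has a closed form that can be sketched in a few lines (shown here for univariate mixtures only; the paper works with multivariate mixtures fitted by inverse regression, and also considers an optimal transport distance not shown here):

```python
import numpy as np
from scipy.stats import norm

def gmm_cross_term(w1, m1, s1, w2, m2, s2):
    """Closed-form integral of the product of two univariate Gaussian mixture densities."""
    var = s1[:, None] ** 2 + s2[None, :] ** 2
    dens = norm.pdf(m1[:, None] - m2[None, :], loc=0.0, scale=np.sqrt(var))
    return np.sum(w1[:, None] * w2[None, :] * dens)

def gmm_l2_distance(w1, m1, s1, w2, m2, s2):
    """L2 distance between two univariate Gaussian mixtures with weights w, means m, sds s."""
    l2_sq = (gmm_cross_term(w1, m1, s1, w1, m1, s1)
             + gmm_cross_term(w2, m2, s2, w2, m2, s2)
             - 2 * gmm_cross_term(w1, m1, s1, w2, m2, s2))
    return np.sqrt(max(l2_sq, 0.0))
```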

Methodology I

Louis Raynal (Harvard University). Scalable Approximate Bayesian Computation for Growing Network Models via Extrapolated and Sampled Summaries: Parameter estimation with approximate Bayesian computation (ABC) requires the ability to forward simulate datasets from a candidate model, but because the sizes of the observed and simulated datasets usually need to match, this can be computationally expensive. Additionally, since ABC inference is based on comparisons of summary statistics computed on the observed and simulated data, using computationally expensive summary statistics can lead to further losses in efficiency. ABC has recently been applied to the family of mechanistic network models, an area that has traditionally lacked tools for inference and model choice. Mechanistic models of network growth repeatedly add nodes to a network until it reaches the size of the observed network, which may be of the order of millions of nodes. With ABC, this process can quickly become computationally prohibitive due to the resource-intensive nature of network simulations and evaluation of summary statistics. We propose two methodological developments to enable the use of ABC for inference in models for large growing networks. First, to save time needed for forward simulating model realizations, we propose a procedure to extrapolate summary statistics from small to large networks. Second, to reduce computation time for evaluating summary statistics, we use sample-based rather than census-based summary statistics. We show that the ABC posterior obtained through this approach, which adds two additional layers of approximation to the standard ABC, is similar to a classic ABC posterior. Although we deal with growing network models, both extrapolated summaries and sampled summaries are expected to be relevant in other ABC settings where the data are generated incrementally.
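
The extrapolation step can be pictured with a very small sketch: simulate cheap small networks, fit a simple trend of each summary against network size, and read the trend off at the observed (large) size. The linear-in-log-size trend below is an assumption made purely for illustration, not the authors' exact extrapolation model:

```python
import numpy as np

def extrapolate_summary(simulate_network, summarise, theta, small_sizes, target_size, rng=None):
    """Extrapolate a summary statistic from small to large networks.

    simulate_network(theta, n, rng) -> a simulated network with n nodes
    summarise(network)              -> summary statistic (scalar or vector)
    """
    rng = np.random.default_rng(rng)
    sizes = np.asarray(small_sizes, dtype=float)
    stats = np.array([summarise(simulate_network(theta, int(n), rng)) for n in sizes])
    # Least-squares fit of the trend: summary ≈ a + b * log(size).
    X = np.column_stack([np.ones_like(sizes), np.log(sizes)])
    coef, *_ = np.linalg.lstsq(X, stats, rcond=None)
    return np.array([1.0, np.log(target_size)]) @ coef
```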


Edwin Fong (University of Oxford). Martingale Posteriors: Bayesian Uncertainty via Imputation: The prior distribution on parameters of a likelihood is the usual starting point for Bayesian uncertainty quantification. In this paper, we present a different perspective. Given a finite data sample of size n from an infinite population, we focus on the missing remainder of the population as the source of statistical uncertainty, with the parameter of interest being known precisely given the population. We argue that the foundation of Bayesian inference is to assign a predictive distribution on the missing data conditional on the sample, which then induces a distribution on the parameter of interest. In a classical application of martingales, Doob showed that choosing the Bayesian predictive distribution returns the conventional posterior as the distribution of the parameter. Taking this as our cue, we relax the predictive machinery, avoiding the need for the predictive to be derived solely from the usual prior-to-posterior-to-predictive density formula. We introduce the martingale posterior distribution, which returns Bayesian uncertainty directly on any statistic of interest without the need for the likelihood and prior, and this distribution can be sampled through a computational scheme we name predictive resampling. To that end, we introduce new predictive methodologies for multivariate density estimation, regression and classification that build upon recent work on bivariate copulas.
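
A toy sketch of predictive resampling, using the simplest possible predictive (a Pólya-urn-style rule that redraws from the current pool of observations) rather than the copula-based predictives of the paper:

```python
import numpy as np

def predictive_resample(x, statistic, n_forward=2000, n_draws=500, rng=None):
    """Uncertainty for a statistic via predictive resampling (illustrative only).

    For each draw, the "missing remainder" of the population is imputed by
    forward-sampling n_forward new points from the current pool, and the
    statistic of the completed population is recorded.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        pool = list(x)
        for _ in range(n_forward):                 # impute the missing remainder
            pool.append(pool[rng.integers(len(pool))])
        draws[b] = statistic(np.asarray(pool))     # parameter as a functional of the population
    return draws
```

For example, `predictive_resample(data, np.mean)` returns draws whose spread quantifies uncertainty about the population mean without ever writing down a likelihood or prior.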


Justin Alsing (Stockholm University). Bayesian decision making under intractable likelihoods: I’ll give an overview of conditional density-estimation approaches to likelihood-free inference. Then, I'll present some fun experimental ideas on using tools from density-estimation LFI and ABC to make optimal Bayesian decisions under models with intractable likelihoods. Traditionally, Bayesian decision problems proceed in two steps: (1) perform posterior inference, (2) find the action that optimizes the expected utility under the inferred posterior predictive distribution for future outcomes. I’ll show that, similar to likelihood-free inference, Bayesian decision problems can be cast as conditional density estimation tasks which can be tackled directly, i.e., by-passing the intermediate inference step completely, and without the need for tractable likelihoods. Decision problems cast this way are in many cases simpler (lower-dimensional) than the corresponding inference task, so substantial gains may be achievable by skipping the inference step.

Methodology II

Cecilia Viscardi (University of Florence). A large deviation approach to approximate Bayesian computation: We consider the problem of sample degeneracy in approximate Bayesian computation (ABC). This problem arises when proposed values of the parameters, once given as input to the generative model, rarely lead to simulations resembling the observed data and are hence discarded. Such "poor" parameter proposals do not contribute to the representation of the parameter's posterior distribution. This leads to a large number of required simulations, wasted computational resources, and distortions in the computed posterior distribution. To mitigate this problem, we propose two Large Deviation Approximate Bayesian Computation (LD-ABC) algorithms, in which the evaluation of the probability of rare events allows the rejection step to be avoided altogether. We adopt the information-theoretic "Method of Types" formulation of Large Deviations, thus restricting attention to models for i.i.d. discrete random variables and for finite-state Markov chains. Finally, we experimentally evaluate our methodology through a proof-of-concept implementation.
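
The large-deviation ingredient is, roughly, Sanov's theorem in its method-of-types form: for n i.i.d. draws from P_θ on a finite alphabet, the empirical type \hat{P}_n satisfies, to first order in the exponent,

```latex
\Pr_\theta\bigl(\hat{P}_n \in A\bigr) \;\doteq\; \exp\Bigl\{-n \min_{Q \in A} D\bigl(Q \,\|\, P_\theta\bigr)\Bigr\},
```

so the probability that a simulated data set would fall within a given tolerance of the observed one can be approximated analytically and used as a weight, rather than estimated by accept/reject.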


Michael Gutmann (University of Edinburgh). Robust Optimisation Monte Carlo: Approximate Bayesian Computation (ABC) is a framework to perform Bayesian inference when the likelihood function is intractable but simulating from the model is possible. While basic ABC algorithms are widely applicable, they are notoriously slow and much research has focused on increasing their efficiency. Optimisation Monte Carlo (OMC) has recently been proposed as an efficient and embarrassingly parallel method that leverages optimisation to accelerate the inference. We first demonstrate an important, previously unrecognised failure mode of OMC: it generates strongly overconfident approximations by collapsing regions of similar or near-constant posterior density into a single point. We then propose an efficient, robust generalisation of OMC that corrects this. It makes fewer assumptions, retains the main benefits of OMC, and can be performed either as part of OMC or entirely as post-processing.


Ritabrata Dutta (University of Warwick). Score Matched Conditional Exponential Families for Likelihood-Free Inference: To perform Bayesian inference for stochastic simulator models for which the likelihood is not accessible, Likelihood-Free Inference (LFI) relies on simulations from the model. Standard LFI methods can be split according to how these simulations are used: to build an explicit surrogate likelihood, or to accept/reject parameter values according to a measure of distance from the observations (Approximate Bayesian Computation, ABC). In both cases, simulations are adaptively tailored to the value of the observation. Here, we generate parameter-simulation pairs from the model independently of the observation, and use them to learn a conditional exponential family likelihood approximation; to parametrize it, we use Neural Networks whose weights are tuned with Score Matching. With our likelihood approximation, we can employ MCMC for doubly intractable distributions to draw samples from the posterior for any number of observations without additional model simulations, with performance competitive to comparable approaches. Further, the sufficient statistics of the exponential family can be used as summaries in ABC, outperforming the state-of-the-art method in five different models with known likelihood. Finally, we apply our method to a challenging model from meteorology.
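
Schematically (notation mine, and omitting details such as the network architectures and any sliced variants of the objective), the likelihood approximation and the score matching training criterion are

```latex
p_w(x \mid \theta) \;\propto\; \exp\bigl\{\eta_w(\theta)^{\top} t_w(x)\bigr\},
\qquad
J(w) \;=\; \mathbb{E}_{(\theta,x)}\Bigl[\tfrac{1}{2}\,\bigl\|\nabla_x \log p_w(x \mid \theta)\bigr\|^{2} \;+\; \Delta_x \log p_w(x \mid \theta)\Bigr],
```

with η_w and t_w neural networks. The objective involves only derivatives of the unnormalised log-density, so the normalising constant never needs to be computed during training; it remains unknown, which is why MCMC for doubly intractable distributions is used at inference time.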

Tuesday, 13 April

Australian Session

Anthony Ebert (Università della Svizzera Italiana). Combined parameter and state inference for automatically calibrated ABC: In state space models, the problem of making combined inferences about fixed parameters and time-varying parameters (states), based on some time-indexed observations, has been the subject of much recent literature. Applying combined parameter and state inference techniques to state space models with intractable likelihoods requires extensive manual calibration of a time-indexed tuning parameter, the ABC distance threshold. We construct an algorithm, which performs this inference, that automatically calibrates the threshold as it progresses through the observations. There are no other time-indexed tuning parameters. We demonstrate this algorithm with three examples: a simulated example of skewed normal distributions, an inhomogeneous Hawkes process, and an econometric volatility model.


Jacob Priddle (Queensland University of Technology). Efficient Bayesian Synthetic Likelihood with Whitening Transformations: Likelihood-free methods are an established approach for performing approximate Bayesian inference for models with intractable likelihood functions. However, they can be computationally demanding. Bayesian synthetic likelihood (BSL) is a popular such method that approximates the likelihood function of the summary statistic with a known, tractable distribution, typically Gaussian, and then performs statistical inference using standard likelihood-based techniques. However, as the number of summary statistics grows, the number of model simulations required to accurately estimate the covariance matrix for this likelihood rapidly increases. This poses a significant challenge for the application of BSL, especially in cases where model simulation is expensive. In this article we propose whitening BSL (wBSL), an efficient BSL method that uses approximate whitening transformations to decorrelate the summary statistics at each algorithm iteration. We show empirically that this can reduce the number of model simulations required to implement BSL by more than an order of magnitude, without much loss of accuracy. We explore a range of whitening procedures and demonstrate the performance of wBSL on a range of simulated and real modelling scenarios from ecology and biology.
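
A minimal sketch of the whitening idea (not the wBSL algorithm verbatim, which combines whitening with shrinkage covariance estimation and several whitening variants): estimate a whitening matrix from pilot simulations, then evaluate a Gaussian synthetic likelihood with a cheap diagonal covariance on the decorrelated summaries.

```python
import numpy as np

def whitening_matrix(pilot_summaries):
    """ZCA-style whitening matrix W from pilot simulations, so W @ s has roughly
    identity covariance (the pilot sample should exceed the summary dimension)."""
    Sigma = np.cov(pilot_summaries, rowvar=False)
    eigval, eigvec = np.linalg.eigh(Sigma)
    return eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T

def whitened_synthetic_loglik(theta, W, s_obs, simulate_summaries, n_sims=50):
    """Gaussian synthetic log-likelihood on whitened summaries with a diagonal
    covariance estimate; far fewer simulations are needed than for a full covariance.

    simulate_summaries(theta, n) -> (n, d) array of simulated summary vectors
    """
    sims = W @ simulate_summaries(theta, n_sims).T       # (d, n_sims) whitened summaries
    mu = sims.mean(axis=1)
    var = sims.var(axis=1, ddof=1)
    z = W @ np.asarray(s_obs, dtype=float)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)
```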


Gael Martin (Monash University). Loss-based Variational Bayes Prediction: We propose a new method for Bayesian prediction that caters for models with a large number of parameters and is robust to model misspecification. Given a class of high-dimensional (but parametric) predictive models, this new approach constructs a posterior predictive using a variational approximation to a loss-based, or Gibbs, posterior that is directly focused on predictive accuracy. The theoretical behavior of the new prediction approach is analyzed and a form of optimality demonstrated. Applications to both simulated and empirical data using high-dimensional Bayesian neural networks and autoregressive mixture models demonstrate that the approach provides more accurate results than various alternatives, including misspecified likelihood-based predictions.
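
In schematic form (notation mine): with ℓ a loss measuring predictive accuracy and w a learning rate, the loss-based (Gibbs) posterior, its variational approximation over a family 𝒬, and the resulting predictive are

```latex
\pi_w(\theta \mid y_{1:n}) \;\propto\; \exp\Bigl\{-w \sum_{t=1}^{n} \ell\bigl(y_t;\theta\bigr)\Bigr\}\,\pi(\theta),
\qquad
q^{\ast} \;=\; \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\bigl(q(\theta)\,\big\|\,\pi_w(\theta \mid y_{1:n})\bigr),
\qquad
p(y_{n+1} \mid y_{1:n}) \;\approx\; \int p(y_{n+1} \mid \theta, y_{1:n})\, q^{\ast}(\theta)\, \mathrm{d}\theta .
```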


Matias Quiroz (University of Technology Sydney). Spectral Subsampling MCMC for Stationary Multivariate Time Series: Spectral subsampling MCMC was recently proposed to speed up Markov chain Monte Carlo (MCMC) for long stationary univariate time series by subsampling periodogram observations in the frequency domain. We extend the approach to stationary multivariate time series. We also propose a multivariate generalisation of the autoregressive tempered fractionally differentiated moving average model (ARTFIMA) and establish some of its properties. This novel model is shown to provide a better fit compared to multivariate autoregressive moving average models for three real-world examples. We demonstrate that spectral subsampling may provide up to two orders of magnitude faster estimation, while retaining MCMC sampling efficiency and accuracy, compared to spectral methods using the full dataset.
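
The frequency-domain likelihood being subsampled is (up to constants) the multivariate Whittle log-likelihood,

```latex
\ell_W(\theta) \;=\; -\sum_{k} \Bigl[\,\log \det f_\theta(\omega_k) \;+\; \operatorname{tr}\bigl\{ f_\theta(\omega_k)^{-1} I(\omega_k) \bigr\}\Bigr],
```

where f_θ is the spectral density matrix and I(ω_k) the periodogram matrix at Fourier frequency ω_k; the sum over frequencies is then estimated from a random subsample of the ω_k within the MCMC, which is what yields the speed-up.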

Joint Session - Computational

David Frazier (Monash University). Synthetic Likelihood in Misspecified Models: Consequences and Corrections: We analyse the behaviour of the synthetic likelihood (SL) method when the model generating the simulated data differs from the actual data-generating process. One of the most common methods to obtain SL-based inferences is via the Bayesian posterior distribution, with this method often referred to as Bayesian synthetic likelihood (BSL). We demonstrate that when the model is misspecified, the BSL posterior can be poorly behaved, placing significant posterior mass on values of the model parameters that do not represent the true features observed in the data. Theoretical results demonstrate that in misspecified models the BSL posterior can display a wide range of behaviours depending on the level of model misspecification, including being asymptotically non-Gaussian. Our theoretical results suggest that a recently proposed robust BSL approach can ameliorate this behavior and deliver reasonable posterior inference under model misspecification. We document all theoretical results using a simple running example.


Chris Drovandi (Queensland University of Technology). To Summarise or Not to Summarise in Likelihood-Free Inference: The general consensus in likelihood-free inference has been that it is most efficient to compare datasets on the basis of a low-dimensional informative summary statistic, incurring information loss in favour of reduced dimensionality. More recently, researchers have explored various approaches for efficiently comparing empirical distributions in the likelihood-free context in an effort to avoid data summarisation. Here we perform the first comprehensive comparison of such methods, both qualitatively and empirically. We also conduct a substantive empirical comparison with summary-statistic-based likelihood-free methods. Whilst we find the best approach to be problem-dependent, we also find that the full data distance based approaches are promising and warrant further development. This is joint work with David Frazier.


Olivier Zahm (Inria Grenoble). Data-free Dimension Reduction for Bayesian Inverse Problems: A high dimensional Bayesian inverse problem has a low effective dimension when the data are informative only on a low-dimensional subspace. In this talk, we show how to use the Fisher information matrix to detect such a subspace before the data are observed. The proposed approach allows us to control the approximation error (in expectation over the data) of the posterior distribution. We also present sampling strategies which exploit the informed subspace to draw samples efficiently from the posterior distribution.
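
One way to picture the construction, schematically and with notation of my own (the exact normalisation and error bound are in the paper): average the Fisher information of the likelihood over the prior π₀ and compare it with the prior precision through a generalised eigenvalue problem,

```latex
H \;=\; \int \mathcal{I}(x)\,\pi_0(\mathrm{d}x),
\qquad
\mathcal{I}(x) \;=\; \mathbb{E}_{Y \mid x}\bigl[\nabla_x \log p(Y \mid x)\,\nabla_x \log p(Y \mid x)^{\top}\bigr],
\qquad
H v_i \;=\; \lambda_i\, \Sigma_{\mathrm{pr}}^{-1} v_i .
```

Directions with large λᵢ are those the data can inform, and the remaining directions can be left at their prior; none of this requires the observed data, hence "data-free".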


Grégoire Clarté (Université Paris Dauphine - PSL). Component-wise Approximate Bayesian Computation: Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
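
A rough sketch of the ABC-within-Gibbs pattern (illustrative only, not the paper's exact algorithm; in particular the inner "best-of-n" acceptance stands in for a proper epsilon-tolerance step, and all function names are mine):

```python
import numpy as np

def abc_gibbs(theta_init, s_obs_blocks, simulate, summaries, prior_samplers,
              n_iter=1000, n_prop=100, rng=None):
    """Each sweep updates one parameter component at a time with a small ABC step
    that only compares a low-dimensional summary aimed at that component.

    simulate(theta, rng)       -> simulated data set
    summaries[j](data)         -> low-dimensional summary for component j
    s_obs_blocks[j]            -> corresponding observed summary
    prior_samplers[j](m, rng)  -> m proposal draws for component j
    """
    rng = np.random.default_rng(rng)
    theta = np.array(theta_init, dtype=float)
    chain = np.empty((n_iter, theta.size))
    for it in range(n_iter):
        for j in range(theta.size):
            cand = prior_samplers[j](n_prop, rng)
            dists = np.empty(n_prop)
            for k in range(n_prop):
                prop = theta.copy()
                prop[j] = cand[k]
                dists[k] = np.linalg.norm(
                    np.atleast_1d(summaries[j](simulate(prop, rng)))
                    - np.atleast_1d(s_obs_blocks[j]))
            theta[j] = cand[np.argmin(dists)]   # keep the closest candidate for component j
        chain[it] = theta
    return chain
```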

Methodology III

Riccardo Corradin (University of Milano-Bicocca). Approximate estimation of latent random partitions: Latent random partitions are common objects in mixture models and model-based clustering frameworks. Despite their flexibility, they can be computationally or mathematically hard to handle. We propose a general approximate approach to perform inference, from a Bayesian perspective, on latent random partitions. Along with the definition of an ABC-MCMC sampling strategy, we introduce an adaptive scheme to simplify the specification of the algorithm and its tuning parameters. We further investigate the performance of the proposed method in a simulation study, comparing the ABC-MCMC scheme with possible competitors known in the literature in terms of computational time and accuracy of the estimated partition.


Marko Järvenpää (University of Oslo). Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC: We present an efficient approach for doing approximate Bayesian inference when only a limited number of noisy likelihood evaluations can be obtained due to computational constraints, which is becoming increasingly common for applications of complex models. Our main methodological innovation is to model the log-likelihood function using a Gaussian process (GP) in a local fashion and apply this model to emulate the progression that an exact Metropolis-Hastings (MH) algorithm would take if it was applicable. New log-likelihood evaluation locations are selected using sequential experimental design strategies such that each MH accept/reject decision is done within a pre-specified error tolerance. The resulting approach is conceptually simple and sample-efficient as it takes full advantage of the GP model. It is also more robust to violations of GP modelling assumptions and better suited for the typical situation where the posterior is substantially more concentrated than the prior, compared with various existing inference methods based on global GP surrogate modelling. We briefly discuss the probabilistic interpretations and some theoretical aspects of our approach, and we then demonstrate the benefits of the resulting algorithm in the context of likelihood-free inference.


Pedro Rodrigues (Inria Saclay). Leveraging Global Parameters for Flow-based Neural Posterior Estimation: Inferring the parameters of a stochastic model based on experimental observations is central to the scientific method. A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations. This arises in many practical situations, such as when inferring the distance and power of a radio source (is the source close and weak or far and strong?) or estimating the amplifier gain and underlying brain activity of an electrophysiological experiment. In this work, we present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters. Our method extends recent developments in simulation-based inference (SBI) based on normalizing flows to Bayesian hierarchical models. Paper available at https://arxiv.org/abs/2102.06477