Marko Järvenpää, Michael U. Gutmann, Arijus Pleska, Aki Vehtari, Pekka Marttinen
Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood function is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, and running the simulation model can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next, but standard BO strategies used in previous work are designed for optimisation and not specifically for ABC inference. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies. Reference: https://arxiv.org/abs/1704.00520
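A minimal sketch of the central quantity in this approach, under assumed interfaces (user-supplied GP predictive functions gp_mean and gp_var fitted to discrepancies, a prior density prior_pdf, and a threshold eps; none of these names come from the paper): the ABC acceptance probability implied by the GP, and a crude measure of how uncertain that acceptance decision still is.

```python
import numpy as np
from scipy.stats import norm

def abc_posterior_estimate(theta_grid, prior_pdf, gp_mean, gp_var, eps, noise_var=0.0):
    """Unnormalised model-based ABC posterior: prior(theta) * P(discrepancy < eps)."""
    mu, var = gp_mean(theta_grid), gp_var(theta_grid)
    accept_prob = norm.cdf((eps - mu) / np.sqrt(var + noise_var))
    return prior_pdf(theta_grid) * accept_prob

def acceptance_uncertainty(theta_grid, gp_mean, gp_var, eps, n_draws=200, seed=None):
    """Spread of the acceptance indicator under GP uncertainty: a simple proxy for
    the kind of loss that is minimised when choosing the next evaluation location."""
    rng = np.random.default_rng(seed)
    mu, sd = gp_mean(theta_grid), np.sqrt(gp_var(theta_grid))
    draws = rng.normal(mu, sd, size=(n_draws, len(theta_grid)))  # latent discrepancy draws
    return np.var(draws < eps, axis=0)  # large where the GP is unsure about acceptance
```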
Marko Järvenpää, Michael U. Gutmann, Aki Vehtari, Pekka Marttinen
Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins. Reference: https://arxiv.org/abs/1610.06462
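A minimal sketch of one plausible form of such a classification utility, assuming held-out discrepancies d_true and GP predictive means and standard deviations at the corresponding parameters (all names are illustrative, not the paper's notation):

```python
import numpy as np
from scipy.stats import norm

def classification_utility(d_true, gp_mean, gp_sd, eps):
    """Average probability that the GP assigns to the correct side of the ABC
    threshold eps for each held-out discrepancy (higher is better)."""
    p_below = norm.cdf((eps - gp_mean) / gp_sd)   # predictive P(discrepancy < eps)
    below = d_true < eps
    return np.mean(np.where(below, p_below, 1.0 - p_below))
```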
Emille Ishida
The Cosmostatistics Initiative (COIN) was founded in 2014 under the auspices of the International Astrostatistics Association (IAA), with the goal of overcoming the cultural barriers preventing the daily collaboration between researchers from different fields. Most of its activities are developed within the COIN Residence Program (CRP), a non-structured meeting which utilizes a management model similar to technological start-ups. Our approach enables rapid innovation in data-science methodologies driven by scientific questions. Since its conception, COIN has grown to more than 60 researchers from six continents, from fields as diverse as astrophysics, statistics, computer science, epidemiology, biostatistics, and medical sciences. Moreover, the CRPs have proven to be one of the most productive events of its kind -- producing nine refereed scientific papers, three software packages, four value-added galaxy catalogues in its first 3 years. One of these products were the first ABC package for astronomers, developed during the first CRP, in 2014. The poster aims to describe the challenges we face in interdisciplinary science development and discuss how experiences like COIN can provide insights about the future of our current academic model.
Ivis Kerama, Vicky Boult, Richard Everitt, Richard Sibly
The use of approximate Bayesian computation as a tool for inference has increased significantly over the last few years, primarily due to its simple yet powerful character in a range of problems, many of them biological. Various modifications have been proposed that incorporate ABC into other algorithms, such as MCMC, SMC and PMC, with the added benefits thereof. In this work we apply the adaptive ABC-SMC method (Del Moral et al. 2012) to an individual-based model of elephants in which the number of parameters is relatively large and where simple rejection ABC, while marginally adequate for predicting the time series produced by the biological model, fails to narrow down the priors enough for any further analysis. The adaptive character of ABC-SMC, augmented with various other algorithmic and methodological improvements, produces significantly better results for most parameters of interest, while allowing added insight into the model itself and paving the way for incorporating model error and calibration in a substantially more rigorous fashion.
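A simplified sketch of an adaptive ABC-SMC loop in the spirit of Del Moral et al. (2012), with the tolerance for each round set as a quantile of the previous round's distances; the simulator, distance, prior and kernel scale are user-supplied placeholders, and the elephant model itself is not reproduced here.

```python
import numpy as np

def abc_smc(simulate, distance, prior_sample, prior_logpdf, y_obs,
            n_particles=500, n_rounds=5, quantile=0.5, sigma=0.1, seed=None):
    """Adaptive ABC-SMC sketch: shrink the tolerance each round, then resample
    and perturb the surviving particles with an ABC-MCMC-style move."""
    rng = np.random.default_rng(seed)
    theta = np.array([prior_sample(rng) for _ in range(n_particles)])
    dist = np.array([distance(simulate(t, rng), y_obs) for t in theta])
    for _ in range(n_rounds):
        eps = np.quantile(dist, quantile)                 # adaptive threshold
        keep = dist <= eps
        theta, dist = theta[keep], dist[keep]
        idx = rng.integers(0, len(theta), size=n_particles)   # resample survivors
        new_theta, new_dist = [], []
        for i in idx:
            prop = theta[i] + rng.normal(0.0, sigma, size=np.shape(theta[i]))
            if not np.isfinite(prior_logpdf(prop)):       # outside prior support
                prop = theta[i]
            d = distance(simulate(prop, rng), y_obs)
            ok = d <= eps
            new_theta.append(prop if ok else theta[i])
            new_dist.append(d if ok else dist[i])
        theta, dist = np.array(new_theta), np.array(new_dist)
    return theta, dist
```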
Guillaume Kon Kam King, Matteo Ruggiero, Antonio Canale
Functional time series naturally appear in contexts where phenomena are measured regularly. Examples include the income distribution over time, the evolution of molecular size distribution during polymerisation, or daily demand/offer curves in an exchange market. Trends are common in these series: higher incomes might tend to increase while lower incomes stagnate or decrease, polymerisation increases molecule sizes globally, and prices commonly show rising or falling trends. The functional nature of the data raises a challenge for inference and, indeed, the likelihood can be intractable in the case of fully observed functions. We present a likelihood-free approach for forecasting functional data with a trend phenomenon. We develop a Bayesian nonparametric model based on a dependent process. It builds on particle system models, which originate from population genetics. This construction provides a means to flexibly specify the correlation of the dependent process. We take advantage of the expressiveness of interacting particle models to embed a local and transient trend mechanism. To this end, we draw inspiration from interaction potentials between physical particle systems in molecular dynamics. We perform the likelihood-free inference by means of Approximate Bayesian Computation (ABC). We discuss the elicitation of informative summary statistics for stochastic processes, building on the idea of semi-automatic summaries. Coupled with a population ABC, this results in a very versatile inference method. We show the increased robustness of the trended model and comment on the generality of our approach for building functional forecast models.
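The semi-automatic summaries mentioned above can be sketched as follows (following the general Fearnhead-Prangle recipe rather than the authors' specific construction): regress the simulated parameters on a set of candidate statistics and use the fitted linear predictors as summary statistics.

```python
import numpy as np

def fit_semiautomatic_summaries(theta_sims, raw_stats):
    """theta_sims: (n, p) parameters used in pilot simulations;
    raw_stats: (n, k) candidate statistics of the corresponding simulated data.
    Returns a function mapping candidate statistics to p regression-based summaries."""
    X = np.column_stack([np.ones(len(raw_stats)), raw_stats])   # add intercept
    coefs, *_ = np.linalg.lstsq(X, theta_sims, rcond=None)
    def summarise(stats):
        stats = np.atleast_2d(stats)
        return np.column_stack([np.ones(len(stats)), stats]) @ coefs
    return summarise
```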
Dennis Prangle, Tom Ryder, Andrew Golightly, Stephen McGough
Parameter inference for stochastic differential equations is challenging due to the presence of a latent diffusion process. Working with an Euler-Maruyama discretisation for the diffusion, we use variational inference to jointly learn the parameters and the diffusion paths. We use a standard mean-field variational approximation of the parameter posterior, and introduce a recurrent neural network to approximate the posterior for the diffusion paths conditional on the parameters. This neural network learns how to provide Gaussian state transitions which bridge between observations in a very similar way to the conditioned diffusion process. The resulting black-box inference method can be applied to any SDE system with light tuning requirements. We illustrate the method on a Lotka-Volterra system and an epidemic model, producing accurate parameter estimates in a few hours.
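A minimal sketch of the Euler-Maruyama discretisation that underlies the method, with an illustrative Lotka-Volterra-style drift and a simplified diagonal diffusion (the rate constants and noise scale below are made up, not those of the paper):

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=None):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x.copy())
    return np.array(path)

drift = lambda x: np.array([0.5 * x[0] - 0.0025 * x[0] * x[1],
                            0.0025 * x[0] * x[1] - 0.3 * x[1]])
diffusion = lambda x: 0.1 * np.sqrt(np.maximum(x, 0.0))   # simplified diagonal noise
path = euler_maruyama(drift, diffusion, x0=[100.0, 100.0], dt=0.1, n_steps=500)
```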
Owen Jones
ABC-MCMC uses a proposal chain q and has two rejection opportunities at each step: the first using a simulated data point, and the second based on the gradient of q. The first check is typically computationally expensive, so in practice we check using the gradient of q first, and only simulate if that check is successful. Here we consider two ways of dynamically choosing q so that the simulation check is more likely to succeed: using simple i.i.d. proposals and using a Langevin-type chain. Using estimation of the Ricker map as an example, we see empirically that both approaches improve efficiency, as measured by the effective sample size divided by the number of simulations performed. Our baselines in each case are vanilla ABC (i.i.d. proposals from the prior) and random-walk ABC-MCMC (where q describes a random walk).
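A sketch of the ordering described above, with the cheap Metropolis-Hastings check performed before the expensive simulation; the proposal, prior, simulator and distance are user-supplied placeholders, and log_q(a, b) denotes the log density of proposing a from b.

```python
import numpy as np

def abc_mcmc(theta0, n_iter, propose, log_q, log_prior, simulate, distance, y_obs, eps, seed=None):
    """ABC-MCMC with two rejection opportunities: check the prior/proposal ratio
    first, and only run the simulator if that cheap check succeeds."""
    rng = np.random.default_rng(seed)
    theta, chain, n_sims = np.asarray(theta0, dtype=float), [], 0
    for _ in range(n_iter):
        prop = propose(theta, rng)
        log_alpha = (log_prior(prop) - log_prior(theta)
                     + log_q(theta, prop) - log_q(prop, theta))   # cheap check
        if np.log(rng.uniform()) < log_alpha:
            n_sims += 1
            if distance(simulate(prop, rng), y_obs) <= eps:       # expensive check
                theta = prop
        chain.append(np.copy(theta))
    return np.array(chain), n_sims
```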
Owen Thomas, Ritabrata Dutta, Jukka Corander, Samuel Kaski, Michael U. Gutmann
We consider the problem of parametric statistical inference when likelihood computations are prohibitively expensive but sampling from the model is possible. Several likelihood-free methods have been developed to perform inference in the absence of a likelihood function. The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution. In another popular approach called approximate Bayesian computation, the inference is performed by identifying parameter values for which the summary statistics of the simulated data are close to those of the observed data. We here present an alternative inference approach that is as easy to use as synthetic likelihood but not as restricted in its assumptions, and that enables automatic selection of relevant summary statistics from a large set of candidates. The basic idea is to frame the problem of estimating the posterior as a problem of estimating the ratio between the data generating distribution and the marginal distribution. This problem can be solved by logistic regression, and including regularising penalty terms enables automatic selection of the summary statistics relevant to the inference task. We illustrate the general theory on toy problems and use it to perform inference for stochastic nonlinear dynamical systems. Reference: https://arxiv.org/abs/1611.10242
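A minimal sketch of the ratio-estimation idea, assuming summary statistics simulated at a fixed parameter value and from the marginal distribution, in equal numbers; the scikit-learn L1 penalty shown is one way of obtaining the kind of regularisation the abstract refers to.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def log_ratio_estimator(stats_theta, stats_marginal, C=1.0):
    """Logistic regression between summaries simulated at theta (label 1) and from
    the marginal (label 0); with equal sample sizes the fitted logit estimates
    log p(x|theta)/p(x), and the L1 penalty switches off irrelevant summaries."""
    X = np.vstack([stats_theta, stats_marginal])
    y = np.concatenate([np.ones(len(stats_theta)), np.zeros(len(stats_marginal))])
    clf = LogisticRegression(penalty='l1', solver='liblinear', C=C).fit(X, y)
    def log_ratio(x):
        return np.atleast_2d(x) @ clf.coef_.ravel() + clf.intercept_[0]
    return log_ratio
```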
Wentao Li, Min-ge Xie, Suzanne Thornton
Approximate Bayesian computing (ABC) is a powerful likelihood-free method that has grown increasingly popular since early applications in population genetics. However, complications arise in the theoretical justification for Bayesian inference when using ABC with a non-sufficient summary statistic. In this paper, we seek to re-frame ABC within a frequentist context and justify its performance by the frequency coverage rate. In doing so, we develop a new computational technique called approximate confidence distribution computing (ACC), which yields theoretical support for the use of non-sufficient summary statistics in likelihood-free methods. Furthermore, we demonstrate that ACC extends the scope of ABC to include data-dependent priors without damaging the inferential integrity. This data-dependent prior can be viewed as an initial 'distribution estimate' of the target parameter which is updated with the results of the ACC method. A general strategy for constructing an appropriate data-dependent prior is also discussed and is shown to often increase the computing speed while maintaining statistical guarantees. We supplement the theory with simulation studies illustrating the benefits of the ACC method, namely the potential for broader applications than ABC and the increased computing speed compared to ABC.
Samuel Wiqvist, Umberto Picchini, Julie Lyng Forman
Delayed-acceptance Markov chain Monte Carlo (DA-MCMC) samples from a probability distribution via a two-stage version of the Metropolis-Hastings algorithm, combining the target distribution with a "surrogate" (i.e. an approximate and computationally cheaper version) of said distribution. DA-MCMC accelerates MCMC sampling in complex applications, while still targeting the exact distribution. We design a computationally faster DA-MCMC algorithm, which samples from an approximation of the target distribution. As a case study, we also introduce a novel stochastic differential equation model for protein folding data. We consider parameter inference in a Bayesian setting where a surrogate likelihood function is introduced in the delayed-acceptance scheme. In our applications we employ a Gaussian process as a surrogate likelihood, but other options are possible. In our accelerated algorithm, the calculations in the "second stage" of the delayed-acceptance scheme are reordered in such a way that we obtain a significant speed-up in MCMC sampling when the evaluation of the likelihood function is computationally intensive. We consider both simulation studies and the analysis of real protein folding data. Simulation studies for the stochastic Ricker model and the novel stochastic differential equation model for protein-folding data show that the speed-up is highly problem dependent: the more computationally intensive the evaluation of the likelihood function is, or, for intractable-likelihood problems, the more expensive its estimation, the greater the acceleration our algorithm yields. Inference results for the standard delayed-acceptance algorithm and our approximated version are similar, indicating that our approximated algorithm can return reliable Bayesian inference.
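A sketch of a single delayed-acceptance step in its generic Christen-Fox form, assuming a symmetric proposal; the GP surrogate enters only through the user-supplied log_surrogate function, and the reordering introduced in the abstract is not reproduced here.

```python
import numpy as np

def da_mh_step(theta, propose, log_prior, log_surrogate, log_lik, rng):
    """One delayed-acceptance Metropolis-Hastings step: screen the proposal with a
    cheap surrogate log-likelihood, then correct with the expensive log-likelihood."""
    prop = propose(theta, rng)
    log_a1 = (log_prior(prop) + log_surrogate(prop)) - (log_prior(theta) + log_surrogate(theta))
    if np.log(rng.uniform()) >= log_a1:          # stage 1: cheap rejection
        return theta
    log_a2 = (log_lik(prop) - log_lik(theta)) - (log_surrogate(prop) - log_surrogate(theta))
    if np.log(rng.uniform()) < log_a2:           # stage 2: expensive correction
        return prop
    return theta
```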
Matt Graham, Amos J. Storkey
Many generative models can be expressed as a differentiable function applied to input variables sampled from a known probability distribution. This framework includes both procedurally defined simulator models involving only differentiable operations, such as those based on numerical integration of ordinary and stochastic differential equation systems, and the generative component of learned parametric models currently popular in the machine learning literature, such as variational autoencoders and generative adversarial networks. Though the distribution on the input variables to such models is known, often the distribution on the output variables is only implicitly defined. We present a method for performing efficient Markov chain Monte Carlo inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where approximate Bayesian computation might otherwise be employed. We use the intuition that computing conditional expectations is equivalent to integrating over a density defined on the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to move between inputs exactly consistent with observations. We validate the method by performing inference experiments in a diverse set of models. Reference: https://projecteuclid.org/euclid.ejs/1513306869
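A toy illustration of the framing used here (not of the constrained HMC sampler itself): the simulator is written as a deterministic, differentiable function of its parameters and of standardised input noise, and conditioning on an observation defines a constraint on those inputs. All names and the scalar SDE below are illustrative placeholders.

```python
import numpy as np

def generator(theta, u, dt=0.1):
    """Euler-Maruyama simulation of a scalar mean-reverting SDE, written as a
    deterministic function of theta = (rate, mean, noise scale) and input noise u."""
    x = 0.0
    for u_i in u:
        x = x + theta[0] * (theta[1] - x) * dt + theta[2] * np.sqrt(dt) * u_i
    return x

def constraint(q, y_obs):
    """c(q) = 0 defines the manifold of inputs q = (theta, u) exactly consistent with
    the observed output; the constrained sampler moves along this manifold."""
    theta, u = q[:3], q[3:]
    return generator(theta, u) - y_obs
```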
Anthony Ebert
We show how metrics on probability measures such as maximum mean discrepancy and the Wasserstein distance can be used with functional datasets arising from a dynamic queueing network (DQN) to estimate parameters using a simulated annealing approximate Bayesian computation sampler. A key challenge confronting DQN parameter estimation is simulation speed. We use a new queueing simulation method called queue departure computation to make simulation-based inference on large DQNs feasible. This is the first example of likelihood-free parameter inference for a DQN. We demonstrate the approach using real data from an international airport passenger terminal. Other possible applications include hospitals, web-servers and call-centres.
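A minimal sketch of the kind of distributional discrepancy described above, using the one-dimensional Wasserstein distance from SciPy between simulated and observed waiting-time samples (the samples below are made up for illustration):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def queue_discrepancy(sim_times, obs_times):
    """Compare two sets of waiting times as empirical distributions rather than
    through a handful of hand-picked summary statistics."""
    return wasserstein_distance(sim_times, obs_times)

rng = np.random.default_rng(0)
d = queue_discrepancy(rng.exponential(5.0, 1000), rng.exponential(5.5, 1000))
```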
TJ McKinley
Complex epidemic models are being increasingly used to inform policy decisions regarding the control of infectious diseases, and adequately capturing key sources of uncertainty is important in order to produce robust predictions. Approximate Bayesian Computation (ABC) and other simulation-based inference methods are becoming increasingly used for inference in complex systems, due to their relative ease-of-implementation compared to alternative approaches, such as those employing data augmentation. However, despite their utility, scaling simulation-based methods to fit large-scale systems introduces a series of additional challenges that hamper robust inference. Here we use a real-world model of HIV transmission—that has been used to explore the impacts of potential control policies in Uganda—to illustrate some of these key challenges when applying ABC methods to high dimensional, computationally intensive models. We then discuss an alternative approach—history matching—that aims to address some of these issues, and conclude with a comparison between these different methodologies.
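For readers unfamiliar with history matching, its core quantity is the implausibility measure, sketched below in its standard form (the emulator mean and variance would come from, e.g., a Gaussian process fitted to model runs; argument names are illustrative):

```python
import numpy as np

def implausibility(z_obs, emulator_mean, emulator_var, obs_var=0.0, discrepancy_var=0.0):
    """Standardised distance between the observation and the emulator prediction;
    parameter values with implausibility above a cut-off (commonly 3) are ruled out."""
    return np.abs(z_obs - emulator_mean) / np.sqrt(emulator_var + obs_var + discrepancy_var)
```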
Jingxiong Xu, Wei Xu, Laurent Briollais
The discovery of rare genetic variants through Next Generation Sequencing (NGS) is becoming a very challenging issue in human genetics. We propose here a novel region-based statistical test, based on an Approximate Bayesian Computation (ABC) approach, to assess evidence of association between a set of rare variants located in a given region and a disease outcome. The marginal likelihood is computed under the null and alternative hypotheses using a Laplace approximation, and the Bayes Factor (BF) for the gene- or region-based association is derived. We assume a binomial distribution for the rare-variant count in the region/gene, with either a beta distribution or a mixture of a Dirac and a beta distribution as the prior for the rare-variant probability. The hyper-parameters are determined to ensure that the null distribution of the BF is asymptotically chi-square with 1 degree of freedom, which facilitates genome-wide inference. We introduce a Bayesian False Discovery Rate (BFDR) control procedure, inspired by the work of Efron (2005) and Wen (2017), to perform this genome-wide inference. The properties of this new ABC approach were assessed by simulations, including the asymptotic null distribution of the BF as well as the type I error, power and BFDR of the association test. Our ABC approach has been applied to a study of lung cancer from Toronto including 262 cases and 261 controls with whole-exome sequencing data. In conclusion, the use of our novel ABC approach, along with a Bayesian control of the FDR, offers a comprehensive and efficient computational framework for genome-wide statistical inference that is scalable to large-scale NGS data.
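A generic illustration of a Laplace-approximated marginal likelihood and Bayes factor for a binomial rare-variant count with a beta prior; this is a sketch of the type of computation described, not the authors' exact construction, and all numerical values are made up.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import beta, binom

def log_marginal_laplace(y, n, a, b):
    """Laplace approximation to log of the integral of Binomial(y | n, p) Beta(p | a, b) over p,
    computed on the logit scale so the Gaussian approximation is unconstrained."""
    def neg_log_post(eta):
        p = 1.0 / (1.0 + np.exp(-eta))
        # includes the Jacobian p(1 - p) of the logit transform
        return -(binom.logpmf(y, n, p) + beta.logpdf(p, a, b) + np.log(p * (1.0 - p)))
    opt = minimize_scalar(neg_log_post)
    h = 1e-3  # numerical curvature at the mode
    curv = (neg_log_post(opt.x + h) - 2 * opt.fun + neg_log_post(opt.x - h)) / h**2
    return -opt.fun + 0.5 * np.log(2 * np.pi / curv)

# Illustrative Bayes factor of a beta-prior alternative against a point null p = p0:
log_bf = log_marginal_laplace(y=4, n=500, a=1.0, b=50.0) - binom.logpmf(4, 500, 0.005)
```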
Marcel Nonnenmacher, Kaan Öcal, Jakob H. Macke
Many applications in science and engineering call for Bayesian statistical inference on models which are specified through simulators, and which thus do not have explicit likelihoods. A powerful approach to statistical inference on such models originates from Regression ABC [1]: one first simulates data from the model and then uses flexible conditional density estimators (e.g. ones based on neural networks [2,3]) to approximate the posterior distribution on these simulated data.
However, this approach attempts to learn a 'global' model for the posterior over parameters given data x for a wide range of simulated data x, rather than focusing on the empirically observed data (x_o). Learning such global estimators can be challenging, in particular for complex simulators with high-dimensional data and parameters.
One approach to focus the estimator on x_o is to introduce 'calibration kernels' K(x, x_o) giving additional weight to simulations which yield values x closer to x_o and/or rejecting all simulations which are too far from x_o. However, this method requires choosing a kernel and a bandwidth, as in Rejection ABC. An alternative is to sample parameters not from the prior, but rather from a proposal prior which is chosen such that the resulting samples x have a high probability of being similar to the empirically observed data x_o. This latter approach requires corrections for the proposal used, e.g. by introducing importance weights into the cost function [3]. The importance weights, however, tend to be particularly large for simulated data which are far away from x_o, leading to increased variance of the cost function and decreased stability of the algorithms in practice.
Here we propose an approach for learning the shape of calibration kernels to yield posterior density estimation algorithms with better stability properties: We propose objective functions for learning kernel parameters with the goal of yielding a marginal distribution over simulated data x which is concentrated near x_o.
We adaptively learn the parameters of the calibration kernel K(x, x_o) in a step between sampling the synthetic dataset from the proposal and simulator and fitting the conditional density. As a training loss for learning the kernel parameters we suggest an approximation to the Kullback-Leibler divergence between the sampled distribution of data x under the proposal and the kernel- and importance-weighted distribution of simulated data x. Small values of this loss hence imply that the kernel has learned to balance out extreme importance weights and to prevent a shift of importance away from the sampled x.
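A minimal sketch of the per-simulation training weights implied by this construction, combining a Gaussian calibration kernel with the proposal-correction importance weights; the bandwidth stands in for the kernel parameters that would be tuned against the objective described above, and all names are illustrative.

```python
import numpy as np

def calibration_weights(x_sim, x_obs, log_prior, log_proposal, theta_sim, bandwidth):
    """Normalised weights proportional to K(x_i, x_o) * prior(theta_i) / proposal(theta_i),
    used when fitting the conditional density estimator."""
    sq_dist = np.sum((np.atleast_2d(x_sim) - x_obs) ** 2, axis=1)
    log_k = -0.5 * sq_dist / bandwidth ** 2                    # calibration kernel
    log_iw = log_prior(theta_sim) - log_proposal(theta_sim)    # proposal correction
    log_w = log_k + log_iw
    w = np.exp(log_w - np.max(log_w))                          # numerically stabilised
    return w / np.sum(w)
```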
The resulting approach focuses the learning of the posterior density estimator to the empirically observed data, and also reduces the variance brought in by the use of importance weights.
In addition we show how defining the calibration kernel in the feature-space of the neural network for posterior density estimation leads to improved performance in high-dimensional estimation problems. We evaluate the performance and limitations of the proposed approach on a variety of estimation problems.
[1] Beaumont et al. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025-2035.
[2] Papamakarios & Murray (2016). Fast ε-free inference of simulation models with Bayesian conditional density estimation. In Advances in Neural Information Processing Systems (pp. 1028-1036).
[3] Lueckmann, Goncalves, Bassetto, Öcal, Nonnenmacher, & Macke (2017). Flexible statistical inference for mechanistic models of neural dynamics. In Advances in Neural Information Processing Systems (pp. 1289-1299).
Rebecca O'Leary, Samantha Low-Choy, Daniela Vasco, Matthew Falk
A new R package, informBCT, implements Bayesian classification trees with non-informative or informative priors, and will be made available through the Comprehensive R Archive Network (CRAN). The package has many features for data analysis, including: variable selection, informative priors, a Bayesian decision-theoretic layer to evaluate posterior distributions of tree performance from posteriors of tree parameters, as well as plotting the best trees and saving trees. Informative priors can be placed on the size of the tree, variable selection and/or the splitting rule. Moreover, informBCT can handle multi-category predictor variables, a difficulty for other tree packages such as BART, bartMachine and tgp (Bayesian treed Gaussian process models), which all code categorical variables as dummy variables. We illustrate the use of informBCT for two moderate-sized problems: predicting acute lymphoblastic leukaemia with 50 records on 55,000 genes; and confirming absence of Cryptosporidium with 1,131 absences out of 1,332 records on 5 variables.
Whilst not implemented with ABC, this problem has some relevance to the careful use of summary statistics in ABC: here we carefully choose summary statistics whose posterior distributions assist in evaluating the algorithm.
Daniela Vasco, Samantha Low-Choy, Rebecca O'Leary
Here we propose new graphical diagnostics for Bayesian Classification Trees, which will enable visualization of multiple trees from the posterior distribution. Classification Trees are a popular method for classification, especially because a single model is easy to interpret. However, in order to improve their predictive performance and make them more robust across different data sets, many models are often combined using techniques such as bagging and boosting. This improvement in predictive performance comes at a cost: losing interpretability of the model. Bayesian Classification Trees, by contrast, provide interpretability whilst allowing fine-tuning of predictive performance. However, similar to data mining algorithms for Classification Trees and Random Forests, there are very few diagnostics available to consider multiple possible trees and to evaluate or compare all models, aside from predictive performance. We consider ways of making the morphology of single trees visible en masse, namely the shape and location of splits relevant to variable selection and prediction. This approach will be illustrated for a published case study using R, and will show why we have called them raindrop plots.
Whilst we illustrate their application to output from an MCMC rather than an ABC algorithm, there is no reason why raindrop plots cannot be applied to ABC.
Jarno Lintusaari, Henri Vuollekoski, Antti Kangasrääsiö, Kusti Skytén, Marko Järvenpää, Pekka Marttinen, Michael Gutmann, Aki Vehtari, Jukka Corander, Samuel Kaski
Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging the components of LFI, such as priors, simulators, summaries and distances, into a network called an ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference by up to several orders of magnitude by surrogate-modelling the distance. ELFI also has built-in support for storing output data for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality. This makes adding new inference methods to ELFI straightforward, and they are automatically compatible with the built-in features. Reference: https://elfi.readthedocs.io
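A minimal usage sketch following the ELFI documentation; the toy model (a Gaussian mean with made-up observed data) is purely illustrative, and exact argument names may differ between ELFI versions.

```python
import elfi
import numpy as np
import scipy.stats as ss

y_obs = np.random.normal(1.0, 1.0, size=(1, 30))       # made-up "observed" data

def simulator(mu, batch_size=1, random_state=None):
    mu = np.atleast_1d(mu)
    return ss.norm.rvs(mu[:, None], 1.0, size=(batch_size, 30), random_state=random_state)

# Components of the ELFI graph: prior, simulator, summary and distance.
mu = elfi.Prior('uniform', -2, 4)                       # mu ~ Uniform(-2, 2)
sim = elfi.Simulator(simulator, mu, observed=y_obs)
S = elfi.Summary(lambda x: np.mean(x, axis=1), sim)
d = elfi.Distance('euclidean', S)

# The same graph can be used with different inference methods, e.g. rejection ABC...
rej = elfi.Rejection(d, batch_size=1000)
res = rej.sample(1000, quantile=0.01)

# ...or BOLFI, which surrogate-models the (log) distance with a Gaussian process.
log_d = elfi.Operation(np.log, d)
bolfi = elfi.BOLFI(log_d, initial_evidence=20, update_interval=10, bounds={'mu': (-2, 2)})
post = bolfi.fit(n_evidence=100)
```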