# All About that Bayes

# A monthly seminar

All About that Bayes (formerly Bayes in Paris) is a monthly seminar (every second Tuesday of the month at 2 p.m.) on Bayesian statistics organised by Pierre Gloaguen (AgroParisTech), Sylvain Le Corff (LPSM, Sorbonne Université) and Julien Stoehr (Université Paris-Dauphine). It focuses on the most recent Bayesian solutions to challenging learning problems arising in a wide range of applications such as ecology, population genetics, signal processing and energy optimisation, to name a few.

If you want details on upcoming talks, sign up for the newsletter here!

# 2023

### May 09, 2023. 14:00, INRAE - BioSP (Avignon)

Meïli Baragatti (ENSAE) - A stroll through Bayesian statistics: an elicitation method, a likelihood-free method, and a simple example in the setting of an epidemiological disease-transmission model

Webpage: http://www.meilibaragatti.fr

### March 21, 2023. 14:00, AgroParisTech (22 place de l'Agronomie, 91123 Palaiseau), Amphi. A.0.04 (ground floor of the reception building)

Francesca Crucinio (ENSAE) - Optimal Scaling Results for a Wide Class of Proximal MALA Algorithms

We consider a recently proposed class of MCMC methods which uses proximity maps instead of gradients to build proposal mechanisms which can be employed for both differentiable and non-differentiable targets. These methods have been shown to be stable for a wide class of targets, making them a valuable alternative to Metropolis-adjusted Langevin algorithms (MALA), and have found wide application in imaging contexts. The wider stability properties are obtained by building the Moreau-Yosida envelope for the target of interest, which depends on a parameter $\lambda$. In this work, we investigate the optimal scaling problem for this class of algorithms, which encompasses MALA, and provide practical guidelines for the implementation of these methods.

Joint work with Alain Durmus, Pablo Jiménez, Gareth O. Roberts.
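To make the idea of a proximity-map proposal concrete, here is a minimal sketch of a proximal MALA step on a toy problem. Everything specific in it is an assumption chosen for illustration and is not taken from the talk: the target is a one-dimensional Laplace density $\pi(x) \propto e^{-|x|}$, whose proximity map is soft-thresholding, and the names `prox_abs`, `grad_envelope` and `proximal_mala` are hypothetical. The proposal is a Langevin step on the Moreau-Yosida envelope with parameter $\lambda$, followed by a Metropolis-Hastings correction against the original target.

```python
import numpy as np

def prox_abs(x, lam):
    # Proximity map of lam * |x| (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def grad_envelope(x, lam):
    # Gradient of the Moreau-Yosida envelope of |x| with parameter lam.
    return (x - prox_abs(x, lam)) / lam

def log_target(x):
    # Toy target (assumption): Laplace density, log pi(x) = -|x| + const.
    return -np.abs(x)

def log_q(y, x, delta, lam):
    # Log density (up to a constant) of proposing y from x.
    mean = x - 0.5 * delta * grad_envelope(x, lam)
    return -((y - mean) ** 2) / (2.0 * delta)

def proximal_mala(n_iter, delta, lam, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        # Langevin step on the smoothed (envelope) potential.
        y = x - 0.5 * delta * grad_envelope(x, lam) + np.sqrt(delta) * rng.standard_normal()
        # Metropolis-Hastings correction with respect to the original target.
        log_alpha = (log_target(y) - log_target(x)
                     + log_q(x, y, delta, lam) - log_q(y, x, delta, lam))
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter

samples, rate = proximal_mala(20000, delta=1.0, lam=0.5)
print(f"acceptance rate: {rate:.2f}, sample variance: {samples.var():.2f}")
```

With $\lambda = \delta/2$, as in the call above, the proposal mean reduces to the proximity map itself. How the step size $\delta$ and $\lambda$ should scale with the dimension is precisely the optimal scaling question the talk addresses; the values used here are arbitrary.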

### February 14, 2023. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15.16-309

Adrian Raftery (University of Washington) - Very Long-Term Bayesian Global Population and Migration Projections for Assessing the Social Cost of Carbon

Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, to assess changes, and to make decisions involving risks. In a major breakthrough, since 2015 the United Nations has issued probabilistic population forecasts for all countries using a Bayesian methodology. Assessment of the social cost of carbon relies on long-term forecasts of carbon emissions, which in turn rely on even longer-range population and economic forecasts, to 2300. We extend the UN method to very long-range population forecasts, by combining the statistical approach with expert review and elicitation. We find that, while world population is projected to grow for most of the rest of this century, it is likely to stabilize in the 22nd century, and to decline in the 23rd century.

### January 10, 2023. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15.16-309

Daniele Durante (Bocconi University) - Detective Bayes: Bayesian nonparametric stochastic block modeling of criminal networks

Europol recently defined criminal networks as a modern version of the Hydra mythological creature, with covert structure and multifaceted evolutions. Indeed, relationship data among criminals are subject to measurement errors and structured missingness patterns, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may encode key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of community detection algorithms routinely used in criminology, thereby leading to overly simplified and possibly biased reconstructions of organized crime topologies. In this seminar, I will present a number of model-based solutions which aim at covering these gaps via a combination of stochastic block models and priors for random partitions arising from Bayesian nonparametrics. These include Gibbs-type priors, and random partition priors driven by the urn scheme of a hierarchical normalized completely random measure. Product-partition models to incorporate criminals' attributes, and zero-inflated Poisson representations accounting for weighted edges and secrecy strategies, will also be discussed. Collapsed Gibbs samplers for posterior computation are presented, and refined strategies for estimation, prediction, uncertainty quantification and model selection will be outlined. Results are illustrated in an application to an Italian Mafia network, where the proposed models unveil a structure of the criminal organization mostly hidden to state-of-the-art alternatives routinely used in criminology. I will conclude the seminar with ideas on how to learn the evolutionary history of the criminal organization from the relationship data among its criminals via a novel combination of latent space models for network data and phylogenetic trees.

# 2022

### December 13, 2022. 14:30, Campus Pierre et Marie Curie (Sorbonne Université), Room 16.26-113

Marylou Gabrié (Ecole Polytechnique) - Opportunities and Challenges in Enhancing Sampling with Learning

Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. Virtually, these models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as a Bayesian posterior, is typically challenging because of dimensionality, multi-modality, ill-conditioning or a combination of these. In this talk, I will review recent works trying to enhance traditional inference and sampling algorithms with learning. I will present in particular flowMC, an adaptive MCMC with Normalizing Flow, along with first applications and remaining challenges.

Webpage: https://marylou-gabrie.github.io/

### November 8, 2022. 14:00, INRIA Grenoble (Mirror Session).

Filippo Ascolani (Bocconi University) - Clustering consistency with Dirichlet process mixtures

Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.

Joint work with Antonio Lijoi, Giovanni Rebaudo, and Giacomo Zanella.

Reference: https://arxiv.org/abs/2205.12924 (Biometrika, forthcoming)

Webpage: https://filippoascolani.github.io/

### October 11, 2022. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15-16-201.

Andrew Gelman (Columbia University) - Prior distribution for causal inference

In Bayesian inference, we must specify a model for the data (a likelihood) and a model for parameters (a prior). Consider two questions:

1. Why is it more complicated to specify the likelihood than the prior?
2. In order to specify the prior, how can we switch between the theoretical literature (invariance, normality assumption, ...) and the applied literature (expert elicitation, robustness, ...)?

I will discuss these questions in the domain of causal inference: prior distributions for causal effects, regression coefficients and the other parameters in causal models.

### March 15, 2022. 16:00, AgroParisTech, Amphitheatre Dumont

Alexandre Bouchard-Côté (University of British Columbia) - Approximation of intractable integrals using non-reversibility and non-linear distribution paths

In the first part of the talk, I will present an adaptive, non-reversible Parallel Tempering (PT) allowing MCMC exploration of challenging problems such as single cell phylogenetic trees. A sharp divide emerges in the behaviour and performance of reversible versus non-reversible PT schemes: the performance of the former eventually collapses as the number of parallel cores used increases, whereas non-reversible PT benefits from arbitrarily many available parallel cores. These theoretical results are exploited to develop an adaptive scheme to efficiently optimize over annealing schedules.

In the second half, I will talk about the global communication barrier, a fundamental limit shared by both reversible and non-reversible PT methods, and about our recent work that leverages non-linear annealing paths to provably and practically break that barrier.

My group is also interested in making these advanced non-reversible Monte Carlo methods easily available to data scientists. To do so, we have designed a Bayesian modelling language to perform inference over arbitrary data types using non-reversible, highly parallel algorithms.

References:

Non-Reversible Parallel Tempering: a Scalable Highly Parallel MCMC Scheme (2021). S. Syed, A. Bouchard-Côté, G. Deligiannidis, A. Doucet. Journal of the Royal Statistical Society, Series B. https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12464

Parallel Tempering on Optimized Paths (2021). S. Syed, V. Romaniello, T. Campbell, A. Bouchard-Côté. International Conference on Machine Learning (ICML). http://proceedings.mlr.press/v139/syed21a/syed21a.pdf

Software: Blang: Probabilistic Programming for Combinatorial Spaces. A. Bouchard-Côté, K. Chern, D. Cubranic, S. Hosseini, J. Hume, M. Lepur, Z. Ouyang, G. Sgarbi. Journal of Statistical Software (Accepted). https://arxiv.org/abs/1912.10396, https://www.stat.ubc.ca/~bouchard/blang/

Webpage: https://www.stat.ubc.ca/~bouchard/index.html

### February 22, 2022. 16:00, AgroParisTech, Amphitheatre Coléou

Arnaud Guyader (LPSM, Sorbonne Université) - On the Asymptotic Normality of Adaptive Multilevel Splitting

Adaptive Multilevel Splitting (AMS) is a Sequential Monte Carlo method for Markov processes that simulates rare events and estimates associated probabilities. Despite its practical efficiency, there are almost no theoretical results on the convergence of this algorithm. The purpose of this talk is to prove both consistency and asymptotic normality results in a general setting. This is done by associating to the original Markov process a level-indexed process, also called a stochastic wave, and by showing that AMS can then be seen as a Fleming-Viot type particle system. This is joint work with Frédéric Cérou, Bernard Delyon, and Mathias Rousset.

Webpage: https://www.lpsm.paris/pageperso/guyader/index.html

# 2021

### March 16th, 2021. 16:00, Building D'Alembert, room Condorcet, ENS Paris-Saclay

Estelle Kuhn (INRAE, Unité MaIAGE) - Properties of the stochastic approximation EM algorithm with mini-batch sampling

To deal with very large datasets, a mini-batch version of the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization algorithm for general latent variable models is proposed. For exponential models the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.

Reference: Journal version, ArXiv version

Webpage: http://genome.jouy.inra.fr/~ekuhn/

# 2020

### March 13th, 2020. 13:30, Building D'Alembert, room Condorcet, ENS Paris-Saclay

Julyan Arbel (INRIA Grenoble) - Understanding Priors in Bayesian Neural Networks at the Unit Level

We investigate deep Bayesian neural networks with Gaussian weight priors and a class of ReLU-like nonlinearities. Bayesian neural networks with Gaussian priors are well known to induce an L2, “weight decay”, regularization. Our results characterize a more intricate regularization effect at the level of the unit activations. Our main result establishes that the induced prior distribution on the units before and after activation becomes increasingly heavy-tailed with the depth of the layer. We show that first layer units are Gaussian, second layer units are sub-exponential, and units in deeper layers are characterized by sub-Weibull distributions. Our results provide new theoretical insight on deep Bayesian neural networks, which we corroborate with simulation experiments.

Webpage: https://www.julyanarbel.com/

### February 26th, 2020. 16:00, room 32

Pierre E. Jacob (Harvard University) - Unbiased MCMC with couplings

MCMC methods yield estimators that converge to integrals of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal: first, it stands at odds with current trends in computing hardware, with increasingly parallel architectures; second, the choice of "burn-in" or "warm-up" is arduous. This talk will describe recently proposed estimators that are unbiased for the expectations of interest while having a finite computing cost and a finite variance. They can thus be generated independently in parallel and averaged over. The method also provides practical upper bounds on the distance (e.g. total variation) between the marginal distribution of the chain at a finite step and its invariant distribution. The key idea is to generate "faithful" couplings of Markov chains, whereby pairs of chains coalesce after a random number of iterations. This talk will provide an overview of this line of research.

Reference: https://arxiv.org/abs/1708.03625. Code in R available at: https://github.com/pierrejacob/unbiasedmcmc.
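The coupling construction described in the abstract can be sketched on a toy problem. The following is an illustrative Python sketch, not the R package linked above: the standard normal target, the $N(3, 1)$ initial distribution, the random-walk Metropolis kernel and all function names are assumptions made for the example. Two chains are run with a lag of one step, their proposals are drawn from a maximal coupling so that they can meet exactly, and the estimator adds bias-correction terms until the meeting time.

```python
import numpy as np

def log_target(x):
    # Toy target (assumption): standard normal, up to a constant.
    return -0.5 * x ** 2

def max_coupling(mu1, mu2, s, rng):
    # Draw (X, Y) with X ~ N(mu1, s^2) and Y ~ N(mu2, s^2), maximising P(X == Y).
    logp = lambda z: -0.5 * ((z - mu1) / s) ** 2
    logq = lambda z: -0.5 * ((z - mu2) / s) ** 2
    x = rng.normal(mu1, s)
    if np.log(rng.uniform()) + logp(x) <= logq(x):
        return x, x
    while True:
        y = rng.normal(mu2, s)
        if np.log(rng.uniform()) + logq(y) > logp(y):
            return x, y

def mh_step(x, s, rng):
    # One random-walk Metropolis step for a single chain.
    xp = rng.normal(x, s)
    return xp if np.log(rng.uniform()) < log_target(xp) - log_target(x) else x

def coupled_mh_step(x, y, s, rng):
    # Coupled proposals and a common uniform: once equal, the chains stay equal.
    xp, yp = max_coupling(x, y, s, rng)
    log_u = np.log(rng.uniform())
    if log_u < log_target(xp) - log_target(x):
        x = xp
    if log_u < log_target(yp) - log_target(y):
        y = yp
    return x, y

def unbiased_estimate(h, k, rng, s=1.0):
    # Returns h(X_k) plus the sum of h(X_t) - h(Y_{t-1}) for k < t < meeting time.
    x, y = rng.normal(3.0, 1.0), rng.normal(3.0, 1.0)  # X_0 and Y_0
    est = h(x) if k == 0 else 0.0
    x = mh_step(x, s, rng)                              # X_1
    t = 1                                               # we hold (X_t, Y_{t-1})
    while True:
        if t == k:
            est += h(x)
        elif t > k and x != y:
            est += h(x) - h(y)
        if t >= k and x == y:
            return est
        x, y = coupled_mh_step(x, y, s, rng)
        t += 1

rng = np.random.default_rng(0)
estimates = [unbiased_estimate(lambda z: z, k=10, rng=rng) for _ in range(2000)]
print("average of independent unbiased estimates:", np.mean(estimates))
```

Each call to `unbiased_estimate` is independent of the others, so in practice the calls could be distributed across processors and averaged, which is the parallelisation argument made in the abstract.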

### January 21st, 2020. 15:00, room 42

Scott Sisson (UNSW) - Approximate posteriors and data for Bayesian inference

For various reasons, including large datasets and complex models, approximate inference is becoming increasingly common. In this talk I'll provide three vignettes of recent work. These cover:

- approximate Bayesian computation for Gaussian process density estimation;
- likelihood-free Gibbs sampling;
- MCMC for approximate (rounded) data.

# 2019

### November 12th, 2019. 15:00, room 39C (Aile Arbalete, 2nd floor)

François Portier (Télécom Paris) - On adaptive importance sampling: theory and methods

Adaptive importance sampling (AIS) uses past samples to update the sampling policy $q_t$ at each stage $t$. Each stage $t$ consists of two steps:

1. explore the space with $n_t$ points drawn according to $q_t$;
2. exploit the current amount of information to update the sampling policy.

In this talk, I will present different AIS methods and show that they are optimal in some sense.
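The explore/exploit loop of AIS can be sketched in a few lines. This is a minimal illustration under stated assumptions, not one of the methods presented in the talk: the sampling policy is restricted to a Gaussian family, the target is an unnormalised $N(3, 0.5^2)$ density, the policy is updated by weighted moment matching over all past samples, and the function name is hypothetical.

```python
import numpy as np

def log_target(x):
    # Toy unnormalised target (assumption): N(3, 0.5^2).
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def adaptive_importance_sampling(n_stages=10, n_per_stage=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 3.0            # initial sampling policy q_0 = N(0, 3^2)
    xs, log_ws = [], []
    for t in range(n_stages):
        # Explore: draw n_t points from the current policy q_t.
        x = rng.normal(mu, sigma, size=n_per_stage)
        log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
        xs.append(x)
        log_ws.append(log_target(x) - log_q)   # importance weights under q_t
        # Exploit: refit the policy to all weighted samples collected so far.
        all_x = np.concatenate(xs)
        all_lw = np.concatenate(log_ws)
        w = np.exp(all_lw - all_lw.max())
        w /= w.sum()
        mu = np.sum(w * all_x)
        sigma = max(np.sqrt(np.sum(w * (all_x - mu) ** 2)), 1e-3)
    return mu, sigma

mu, sigma = adaptive_importance_sampling()
print(f"final policy: mean {mu:.2f}, standard deviation {sigma:.2f}")
```

Each sample keeps the weight computed under the policy that generated it, so samples from early, poorly adapted stages are reused rather than discarded.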

### October 15, 2019. 15:30, room 39C (Aile Arbalete, 2nd floor)

Grégoire Clarté - Component-wise approximate Bayesian computation via Gibbs-like steps

Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are, however, sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution, and some hierarchical versions of the proposed mechanism enjoy a closed-form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.

Reference: https://arxiv.org/abs/1905.13599
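A component-wise ABC scheme of the kind described in the abstract can be sketched on a toy hierarchical model. The model, the summary statistics, the "keep the closest of a few draws" rule and the function name below are all assumptions made for illustration, not a transcription of the algorithm in the reference: $\alpha \sim U(-4, 4)$, $\mu_j \sim N(\alpha, 1)$ and each group of observations $x_{j\cdot} \sim N(\mu_j, \sigma^2)$. Each update only needs a low-dimensional summary: the mean of the $\mu_j$ for $\alpha$, and the group mean for each $\mu_j$.

```python
import numpy as np

def abc_gibbs(x_obs, sigma=1.0, n_iter=500, n_draws=30, seed=0):
    # x_obs has shape (J, n): row j holds the n observations of group j.
    rng = np.random.default_rng(seed)
    J, n = x_obs.shape
    xbar = x_obs.mean(axis=1)               # per-group summary statistics
    alpha, mu = 0.0, np.zeros(J)
    alphas = np.empty(n_iter)
    for it in range(n_iter):
        # ABC step for alpha given mu: simulate mu from each prior draw of alpha
        # and keep the draw whose simulated mean is closest to the current one.
        a_prop = rng.uniform(-4.0, 4.0, size=n_draws)
        mu_sim = rng.normal(a_prop[:, None], 1.0, size=(n_draws, J))
        alpha = a_prop[np.argmin(np.abs(mu_sim.mean(axis=1) - mu.mean()))]
        # ABC step for each mu_j given alpha: simulate the group mean directly
        # (it is N(mu_j, sigma^2 / n)) and keep the closest draw per group.
        m_prop = rng.normal(alpha, 1.0, size=(n_draws, J))
        xbar_sim = rng.normal(m_prop, sigma / np.sqrt(n))
        best = np.argmin(np.abs(xbar_sim - xbar), axis=0)
        mu = m_prop[best, np.arange(J)]
        alphas[it] = alpha
    return alphas

# Synthetic data (assumption): 10 groups of 20 observations, true alpha = 1.5.
rng = np.random.default_rng(1)
true_mu = rng.normal(1.5, 1.0, size=10)
x_obs = rng.normal(true_mu[:, None], 1.0, size=(10, 20))
alphas = abc_gibbs(x_obs, n_iter=300)
print("approximate posterior mean of alpha:", alphas[100:].mean())
```

Because every step compares one-dimensional summaries, the number of prior draws needed per update stays small even as the number of groups grows, which is the efficiency argument made in the abstract.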