# All About that Bayes

# A monthly seminar

All About that Bayes (formerly Bayes in Paris) is a monthly seminar on Bayesian statistics organised by Pierre Gloaguen (AgroParisTech), Sylvain Le Corff (LPSM, Sorbonne Université) and Julien Stoehr (Université Paris-Dauphine) for the specialised group at SFdS. It focuses on the most recent Bayesian solutions to challenging learning problems arising in a wide range of applications such as ecology, population genetics, signal processing and energy optimisation, to name a few.

If you want details on upcoming talks, sign up for the newsletter here!

Access to SCAI: https://iscd.sorbonne-universite.fr/about/contact/

# 2024

### April 24, 2024. 16:00, PariSanté Campus, Room 8

Guanyang Wang (Rutgers University) - MCMC when you do not want to evaluate the target distribution

Abstract: In sampling tasks, it is common for target distributions to be known up to a normalizing constant. However, in many situations, evaluating even the unnormalized distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for large datasets and sampling from 'doubly intractable' distributions. We provide a way to unify various MCMC algorithms, including several minibatch MCMC algorithms and the exchange algorithm. This framework not only simplifies the theoretical analysis of existing algorithms but also creates new algorithms. Similar frameworks exist in the literature, but they concentrate on different objectives.

### March 27, 2024. 14:00, Université Paris Dauphine, Salle B bis

François Caron (University of Oxford) - Deep Neural Networks with Dependent Weights: Gaussian Process Mixture Limit, Heavy Tails, Sparsity and Compressibility

Abstract: This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and some nodes have asymptotically non-negligible contributions, therefore representing important hidden features. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.

### February 13, 2024. 16:00, Campus Pierre et Marie Curie (Sorbonne Université), SCAI

Elisabeth Gassiat (Université Paris-Saclay) - A stroll through hidden Markov models

Abstract: Hidden Markov models are latent variables models producing dependent sequences. I will survey recent results providing guarantees for their use in various fields such as clustering, multiple testing, nonlinear ICA or variational autoencoders.

### January 23, 2024. 16:00, PariSanté Campus

Ritabrata Dutta (University of Warwick) - Bayesian Model Averaging with exact inference of likelihood-free Scoring Rule Posteriors

Abstract: A novel application of Bayesian Model Averaging to generative models parameterized with neural networks (GNN) characterized by intractable likelihoods is presented. We leverage a likelihood-free generalized Bayesian inference approach with Scoring Rules. To tackle the challenge of model selection in neural networks, we adopt a continuous shrinkage prior, specifically the horseshoe prior. We introduce an innovative blocked sampling scheme, offering compatibility with both the Boomerang Sampler (a type of piecewise deterministic Markov process sampler) for exact but slower inference and with Stochastic Gradient Langevin Dynamics (SGLD) for faster yet biased posterior inference. This approach serves as a versatile tool bridging the gap between intractable likelihoods and robust Bayesian model selection within the generative modelling framework.

# 2023

### December 12, 2023. 16:00, Campus Pierre et Marie Curie (Sorbonne Université), salle 15-16 201

Sylvain Le Corff (Sorbonne Université) - Monte Carlo guided Diffusion for Bayesian linear inverse problems

Joint work with G. Cardoso, Y. Janati, E. Moulines.

Abstract: Ill-posed linear inverse problems that combine knowledge of the forward measurement model with prior models arise frequently in various applications, from computational photography to medical imaging. Recent research has focused on solving these problems with score-based generative models (SGMs) that produce perceptually plausible images, especially in inpainting problems. In this study, we exploit the particular structure of the prior defined in the SGM to formulate recovery in a Bayesian framework as a Feynman-Kac model adapted from the forward diffusion model used to construct score-based diffusion. To solve this Feynman-Kac problem, we propose the use of Sequential Monte Carlo methods. The proposed algorithm, MCGdiff, is shown to be theoretically grounded and we provide numerical simulations showing that it outperforms competing baselines when dealing with ill-posed inverse problems.

### October 10, 2023. 16:00, Campus Pierre et Marie Curie (Sorbonne Université), SCAI

Kaniav Kamary (Centrale Supélec) - Bayesian principal component analysis

The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this talk, I’ll first present the probabilistic reformulation that is the basis for a Bayesian treatment of PCA. Then, my focus will be on showing that the effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure.

### May 09, 2023. 14:00, INRAE - BioSP (Avignon)

Meïli Baragatti (ENSAE) - A stroll through Bayesian statistics: an elicitation method, a likelihood-free method and a simple example in the setting of an epidemiological disease-transmission model

Webpage: http://www.meilibaragatti.fr

### March 21, 2023. 14:00, AgroParisTech (22 place de l'Agronomie, 91123 Palaiseau), Amphi. A.0.04 (ground floor of the reception building)

Francesca Crucinio (ENSAE) - Optimal Scaling Results for a Wide Class of Proximal MALA Algorithms

We consider a recently proposed class of MCMC methods which uses proximity maps instead of gradients to build proposal mechanisms which can be employed for both differentiable and non-differentiable targets. These methods have been shown to be stable for a wide class of targets, making them a valuable alternative to Metropolis-adjusted Langevin algorithms (MALA); and have found wide application in imaging contexts. The wider stability properties are obtained by building the Moreau-Yoshida envelope for the target of interest, which depends on a parameter $\lambda$. In this work, we investigate the optimal scaling problem for this class of algorithms, which encompasses MALA, and provide practical guidelines for the implementation of these methods.

Joint work with Alain Durmus, Pablo Jiménez, Gareth O. Roberts.
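To make the idea of a proximal proposal concrete, here is a minimal sketch in Python. It is not the speakers' implementation: the one-dimensional Laplace target, the parameter values `lam` and `delta`, and all function names are assumptions chosen for illustration. The gradient of the target is replaced by the gradient of the Moreau-Yosida envelope, which for $g(x) = |x|$ is available in closed form through soft-thresholding, and a Metropolis-Hastings correction keeps the original, non-differentiable target invariant.

```python
import math
import random


def soft_threshold(x, lam):
    # Proximity map of g(x) = |x| with parameter lam.
    return math.copysign(max(abs(x) - lam, 0.0), x)


def log_target(x):
    # Unnormalised log-density of a Laplace target (assumed toy example).
    return -abs(x)


def proposal_mean(x, lam, delta):
    # Gradient of the Moreau-Yosida envelope of |x|: (x - prox(x)) / lam.
    grad_envelope = (x - soft_threshold(x, lam)) / lam
    return x - 0.5 * delta * grad_envelope


def log_q(y, x, lam, delta):
    # Log-density (up to a constant) of proposing y when the chain is at x.
    return -((y - proposal_mean(x, lam, delta)) ** 2) / (2.0 * delta)


def proximal_mala(n_iter=50000, lam=0.5, delta=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_iter):
        y = proposal_mean(x, lam, delta) + math.sqrt(delta) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings ratio computed with the true target.
        log_ratio = (log_target(y) + log_q(x, y, lam, delta)
                     - log_target(x) - log_q(y, x, lam, delta))
        if math.log(1.0 - rng.random()) < log_ratio:
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


samples, acceptance_rate = proximal_mala()
mean_hat = sum(samples) / len(samples)
abs_mean_hat = sum(abs(s) for s in samples) / len(samples)
print(mean_hat, abs_mean_hat, acceptance_rate)  # Laplace(0, 1): mean 0, E|x| = 1
```

With `lam = delta / 2`, as in the defaults above, the proposal mean reduces to the proximity map itself; other choices of `lam` relative to `delta` give other members of the family, which is the kind of tuning question the optimal scaling analysis addresses.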

### February 14, 2023. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15.16-309

Adrian Raftery (University of Washington) - Very Long-Term Bayesian Global Population and Migration Projections for Assessing the Social Cost of Carbon

Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, to assess changes, and to make decisions involving risks. In a major breakthrough, since 2015 the United Nations has issued probabilistic population forecasts for all countries using a Bayesian methodology. Assessment of the social cost of carbon relies on long-term forecasts of carbon emissions, which in turn rely on even longer-range population and economic forecasts, to 2300. We extend the UN method to very long-range population forecasts by combining the statistical approach with expert review and elicitation. We find that, while world population is projected to grow for most of the rest of this century, it is likely to stabilize in the 22nd century, and to decline in the 23rd century.

### January 10, 2023. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15.16-309

Daniele Durante (Bocconi University) - Detective Bayes: Bayesian nonparametric stochastic block modeling of criminal networks

Europol recently defined criminal networks as a modern version of the Hydra mythological creature, with covert structure and multifaceted evolutions. Indeed, relationship data among criminals are subject to measurement errors, structured missingness patterns, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may encode key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of community detection algorithms routinely used in criminology, thereby leading to overly simplified and possibly biased reconstructions of organized crime topologies. In this seminar, I will present a number of model-based solutions which aim at covering these gaps via a combination of stochastic block models and priors for random partitions arising from Bayesian nonparametrics. These include Gibbs-type priors, and random partition priors driven by the urn scheme of a hierarchical normalized completely random measure. Product-partition models to incorporate criminals' attributes, and zero-inflated Poisson representations accounting for weighted edges and secrecy strategies, will also be discussed. Collapsed Gibbs samplers for posterior computation are presented, and refined strategies for estimation, prediction, uncertainty quantification and model selection will be outlined. Results are illustrated in an application to an Italian Mafia network, where the proposed models unveil a structure of the criminal organization mostly hidden to state-of-the-art alternatives routinely used in criminology. I will conclude the seminar with ideas on how to learn the evolutionary history of the criminal organization from the relationship data among its criminals via a novel combination of latent space models for network data and phylogenetic trees.

# 2022

### December 13, 2022. 14:30, Campus Pierre et Marie Curie (Sorbonne Université), Room 16.26-113

Marylou Gabrié (Ecole Polytechnique) - Opportunities and Challenges in Enhancing Sampling with Learning

Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. In effect, these models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as a Bayesian posterior, is typically challenging because of dimensionality, multi-modality, ill-conditioning or a combination of these. In this talk, I will review recent works trying to enhance traditional inference and sampling algorithms with learning. I will present in particular flowMC, an adaptive MCMC with Normalizing Flow, along with first applications and remaining challenges.

Webpage: https://marylou-gabrie.github.io/

### November 8, 2022. 14:00, INRIA Grenoble (Mirror Session).

Filippo Ascolani (Bocconi University) - Clustering consistency with Dirichlet process mixtures

Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.

Joint work with Antonio Lijoi, Giovanni Rebaudo, and Giacomo Zanella.

Reference: https://arxiv.org/abs/2205.12924 (Biometrika, forthcoming)

Webpage: https://filippoascolani.github.io/

### October 11, 2022. 14:00, Campus Pierre et Marie Curie (Sorbonne Université), Room 15-16-201.

Andrew Gelman (Columbia University) - Prior distribution for causal inference

In Bayesian inference, we must specify a model for the data (a likelihood) and a model for parameters (a prior). Consider two questions:

- Why is it more complicated to specify the likelihood than the prior?
- In order to specify the prior, how can we switch between the theoretical literature (invariance, normality assumption, ...) and the applied literature (expert elicitation, robustness, ...)?

I will discuss these questions in the domain of causal inference: prior distributions for causal effects, regression coefficients and the other parameters in causal models.

### March 15, 2022. 16:00, AgroParisTech, Amphitheatre Dumont

Alexandre Bouchard-Côté (University of British Columbia) - Approximation of intractable integrals using non-reversibility and non-linear distribution paths

In the first part of the talk, I will present an adaptive, non-reversible Parallel Tempering (PT) scheme allowing MCMC exploration of challenging problems such as single cell phylogenetic trees. A sharp divide emerges in the behaviour and performance of reversible versus non-reversible PT schemes: the performance of the former eventually collapses as the number of parallel cores used increases, whereas the latter benefits from arbitrarily many available parallel cores. These theoretical results are exploited to develop an adaptive scheme to efficiently optimize over annealing schedules.

In the second half, I will talk about the global communication barrier, a fundamental limit shared by both reversible and non-reversible PT methods, and about our recent work that leverages non-linear annealing paths to provably and practically break that barrier.

My group is also interested in making these advanced non-reversible Monte Carlo methods easily available to data scientists. To do so, we have designed a Bayesian modelling language to perform inference over arbitrary data types using non-reversible, highly parallel algorithms.

References:

Non-Reversible Parallel Tempering: a Scalable Highly Parallel MCMC Scheme (2021). S. Syed, A. Bouchard-Côté, G. Deligiannidis, A. Doucet. Journal of the Royal Statistical Society, Series B. https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12464

Parallel Tempering on Optimized Paths (2021). S. Syed, V. Romaniello, T. Campbell, A. Bouchard-Côté. International Conference on Machine Learning (ICML). http://proceedings.mlr.press/v139/syed21a/syed21a.pdf

Software: Blang: Probabilistic Programming for Combinatorial Spaces. A. Bouchard-Côté, K. Chern, D. Cubranic, S. Hosseini, J. Hume, M. Lepur, Z. Ouyang, G. Sgarbi. Journal of Statistical Software (Accepted). https://arxiv.org/abs/1912.10396, https://www.stat.ubc.ca/~bouchard/blang/

Webpage: https://www.stat.ubc.ca/~bouchard/index.html

### February 22, 2022. 16:00, AgroParisTech, Amphitheatre Coléou

Arnaud Guyader (LPSM, Sorbonne Université) - On the Asymptotic Normality of Adaptive Multilevel Splitting

Adaptive Multilevel Splitting (AMS) is a Sequential Monte Carlo method for Markov processes that simulates rare events and estimates associated probabilities. Despite its practical efficiency, there are almost no theoretical results on the convergence of this algorithm. The purpose of this talk is to prove both consistency and asymptotic normality results in a general setting. This is done by associating to the original Markov process a level-indexed process, also called a stochastic wave, and by showing that AMS can then be seen as a Fleming-Viot type particle system. This is a joint work with Frédéric Cérou, Bernard Delyon, and Mathias Rousset.

Webpage: https://www.lpsm.paris/pageperso/guyader/index.html

# 2021

### March 16th, 2021. 16:00, Building D'Alembert, room Condorcet, ENS Paris-Saclay

Estelle Kuhn (INRAE, Unité MaIAGE) - Properties of the stochastic approximation EM algorithm with mini-batch sampling

To deal with very large datasets, a mini-batch version of the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization algorithm for general latent variable models is proposed. For exponential models the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.

Reference: Journal version, ArXiv version

Webpage: http://genome.jouy.inra.fr/~ekuhn/

# 2020

### March 13th, 2020. 13:30, Building D'Alembert, room Condorcet, ENS Paris-Saclay

Julyan Arbel (INRIA Grenoble) - Understanding Priors in Bayesian Neural Networks at the Unit Level

We investigate deep Bayesian neural networks with Gaussian weight priors and a class of ReLU-like nonlinearities. Bayesian neural networks with Gaussian priors are well known to induce an L2, “weight decay”, regularization. Our results characterize a more intricate regularization effect at the level of the unit activations. Our main result establishes that the induced prior distribution on the units before and after activation becomes increasingly heavy-tailed with the depth of the layer. We show that first layer units are Gaussian, second layer units are sub-exponential, and units in deeper layers are characterized by sub-Weibull distributions. Our results provide new theoretical insight on deep Bayesian neural networks, which we corroborate with simulation experiments.

Webpage: https://www.julyanarbel.com/

### February 26th, 2020. 16:00, room 32

Pierre E. Jacob (Harvard University) - Unbiased MCMC with couplings

MCMC methods yield estimators that converge to integrals of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal; first, it stands at odds with current trends in computing hardware, with increasingly parallel architectures; secondly, the choice of "burn-in" or "warm-up" is arduous. This talk will describe recently proposed estimators that are unbiased for the expectations of interest while having a finite computing cost and a finite variance. They can thus be generated independently in parallel and averaged over. The method also provides practical upper bounds on the distance (e.g. total variation) between the marginal distribution of the chain at a finite step and its invariant distribution. The key idea is to generate "faithful" couplings of Markov chains, whereby pairs of chains coalesce after a random number of iterations. This talk will provide an overview of this line of research.

Reference: https://arxiv.org/abs/1708.03625. Code in R available at: https://github.com/pierrejacob/unbiasedmcmc.
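The coupling construction can be sketched on a toy problem. The Python code below is an illustrative assumption, not the code from the repository above: it targets a one-dimensional Gaussian, uses random-walk Metropolis-Hastings with maximally coupled proposals and a shared acceptance uniform, and runs the first chain one step ahead of the second. All names and tuning values (`TARGET_MEAN`, `STEP`, `k`) are hypothetical. Each call returns one estimate of the form $h(X_k) + \sum_{t=k+1}^{\tau-1} \{h(X_t) - h(Y_{t-1})\}$, where $\tau$ is the meeting time, and independent estimates are then averaged.

```python
import math
import random

TARGET_MEAN = 2.0   # toy target N(2, 1), assumed for illustration
STEP = 1.0          # random-walk proposal standard deviation (assumed)


def log_target(x):
    return -0.5 * (x - TARGET_MEAN) ** 2


def log_uniform(rng):
    # Log of a Uniform(0, 1] draw, which avoids log(0).
    return math.log(1.0 - rng.random())


def maximal_coupling(rng, mu1, mu2, s):
    # Returns (x, y) with x ~ N(mu1, s^2) and y ~ N(mu2, s^2),
    # constructed so that x == y with the largest possible probability.
    def logpdf(v, mu):
        return -0.5 * ((v - mu) / s) ** 2
    x = rng.gauss(mu1, s)
    if log_uniform(rng) + logpdf(x, mu1) <= logpdf(x, mu2):
        return x, x
    while True:
        y = rng.gauss(mu2, s)
        if log_uniform(rng) + logpdf(y, mu2) > logpdf(y, mu1):
            return x, y


def mh_step(rng, x):
    proposal = rng.gauss(x, STEP)
    if log_uniform(rng) < log_target(proposal) - log_target(x):
        return proposal
    return x


def coupled_mh_step(rng, x, y):
    # Coupled proposals and a common uniform: once x == y, they stay equal.
    xp, yp = maximal_coupling(rng, x, y, STEP)
    log_u = log_uniform(rng)
    if log_u < log_target(xp) - log_target(x):
        x = xp
    if log_u < log_target(yp) - log_target(y):
        y = yp
    return x, y


def unbiased_estimate(rng, h, k=10):
    x = rng.gauss(0.0, 1.0)          # X_0
    y = rng.gauss(0.0, 1.0)          # Y_0
    estimate = h(x) if k == 0 else 0.0
    x = mh_step(rng, x)              # X_1; the pair held is (X_t, Y_{t-1})
    t = 1
    while True:
        met = x == y
        if t == k:
            estimate += h(x)
        elif t > k and not met:
            estimate += h(x) - h(y)  # bias-correction term
        if met and t >= k:
            return estimate
        x, y = coupled_mh_step(rng, x, y)
        t += 1


rng = random.Random(1)
estimates = [unbiased_estimate(rng, lambda v: v) for _ in range(2000)]
mean_hat = sum(estimates) / len(estimates)
print(mean_hat)  # average of independent estimates of the target mean
```

Because the estimates are independent and each has a finite cost, the list comprehension could be distributed across processors without changing the estimator.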

### January 21st, 2020. 15:00, room 42

Scott Sisson (UNSW) - Approximate posteriors and data for Bayesian inference

For various reasons, including large datasets and complex models, approximate inference is becoming increasingly common. In this talk I'll provide three vignettes of recent work. These cover

- approximate Bayesian computation for Gaussian process density estimation;
- likelihood-free Gibbs sampling;
- MCMC for approximate (rounded) data.

# 2019

### November 12th, 2019. 15:00, room 39C (Aile Arbalete, 2nd floor)

François Portier (Télécom Paris) - On adaptive importance sampling: theory and methods

Adaptive importance sampling (AIS) uses past samples to update the sampling policy $q_t$ at each stage $t$. Each stage $t$ consists of two steps:

- explore the space with $n_t$ points drawn according to $q_t$;
- exploit the information gathered so far to update the sampling policy.

In this talk, I will present different AIS methods and show that they are optimal in some sense.
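The explore/exploit loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not one of the methods from the talk: the target is a toy unnormalised Gaussian, the policy $q_t$ is restricted to Gaussians, and the update rule is weighted moment matching over all samples collected so far. Function names and default values are hypothetical.

```python
import math
import random


def log_target(x):
    # Unnormalised log-density of N(3, 1): a toy target assumed for illustration.
    return -0.5 * (x - 3.0) ** 2


def log_gaussian(x, mu, sigma):
    # Log-density of the Gaussian sampling policy q_t = N(mu, sigma^2).
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def adaptive_importance_sampling(n_stages=20, n_per_stage=500, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 5.0      # initial policy q_0, deliberately poor
    xs, ws = [], []           # all samples and importance weights so far
    for _stage in range(n_stages):
        # Explore: draw n_t points from q_t and weight them by target / q_t.
        for _ in range(n_per_stage):
            x = rng.gauss(mu, sigma)
            xs.append(x)
            ws.append(math.exp(log_target(x) - log_gaussian(x, mu, sigma)))
        # Exploit: refit the policy by weighted moment matching on all samples.
        total = sum(ws)
        mu = sum(w * x for w, x in zip(ws, xs)) / total
        var = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / total
        sigma = max(math.sqrt(var), 1e-3)
    z_hat = sum(ws) / len(ws)  # estimate of the target's normalising constant
    return mu, sigma, z_hat


mu_hat, sigma_hat, z_hat = adaptive_importance_sampling()
print(mu_hat, sigma_hat, z_hat)  # true values: 3, 1 and sqrt(2 * pi)
```

Reusing every past sample with its own weight is one design choice among several; discarding early stages or reweighting them against a mixture of past policies are alternatives.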

### October 15, 2019. 15:30, room 39C (Aile Arbalete, 2nd floor)

Grégoire Clarté - Component-wise approximate Bayesian computation via Gibbs-like steps

Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.

Reference: https://arxiv.org/abs/1905.13599