Abstracts

Morning, September 29th


  • 9.00-9.45 Ismaël Castillo Introduction, BASICS project and Bayesian multiple testing


I will give a brief overview of results obtained in the BASICS project on the following topics: Bayes for high-dimensional models and multiple testing, mathematical guarantees for some popular learning algorithms (e.g. those using trees and forests), and uncertainty quantification and multiscale results. I will particularly focus on Bayesian multiple testing and review some open problems in the area.



  • 9.45-10.30 Kweku Abraham Sharp multiple testing boundaries for sparse sequences


This work investigates multiple testing from the point of view of minimax separation rates in the sparse sequence model, when the testing risk is measured as the sum FDR+FNR (False Discovery Rate plus False Negative Rate). First, using the popular beta-min separation condition, with all nonzero signals separated from 0 by at least some amount, we determine the sharp asymptotic minimax testing risk and thereby explicitly describe the transition from "achievable multiple testing with vanishing risk" to "impossible multiple testing". Adaptive multiple testing procedures achieving the corresponding optimal boundary are provided: the Benjamini--Hochberg procedure with a properly tuned parameter, and an empirical Bayes ℓ-value ('local FDR') procedure. We prove that the FDR and FNR make non-symmetric contributions to the testing risk for most procedures, the FNR part being dominant at the boundary. The optimal multiple testing boundary is then investigated for classes of arbitrary sparse signals.
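
For illustration, here is a minimal sketch of the Benjamini-Hochberg step and of the one-realization analogue of the FDR+FNR risk (with the FNR normalized by the number of signals, one common convention); the p-values and the tuning of the BH parameter analyzed in the talk are taken as given, and all names are illustrative:

    import numpy as np

    def benjamini_hochberg(pvals, alpha):
        """Benjamini-Hochberg step-up rule: returns a boolean rejection mask."""
        pvals = np.asarray(pvals)
        n = len(pvals)
        order = np.argsort(pvals)
        # largest k such that the k-th smallest p-value is <= alpha * k / n
        passed = np.nonzero(pvals[order] <= alpha * np.arange(1, n + 1) / n)[0]
        reject = np.zeros(n, dtype=bool)
        if passed.size:
            reject[order[: passed.max() + 1]] = True
        return reject

    def fdp_plus_fnp(reject, signal):
        """Empirical FDP + FNP for one realization; the FDR+FNR risk is the
        expectation of this quantity over replications."""
        fdp = np.sum(reject & ~signal) / max(np.sum(reject), 1)
        fnp = np.sum(~reject & signal) / max(np.sum(signal), 1)
        return fdp + fnp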


  • 10.50-11.20 Ariane Marandon False clustering rate control in mixture models


The clustering task consists in delivering labels to the members of a sample. For most data sets, some individuals are ambiguous and intrinsically difficult to attribute to one cluster or another. However, in practical applications, misclassifying individuals is potentially disastrous. To overcome this difficulty, the idea followed here is to classify only part of the sample in order to obtain a small misclassification rate. This approach is well known in the supervised setting and referred to as classification with an abstention option. The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework. The problem is formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α while maximizing the number of classified items. New procedures are introduced and their behavior is shown to be close to optimal by establishing theoretical results and conducting numerical experiments.
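
A plausible plug-in sketch of the abstention idea, assuming a Gaussian mixture fit (illustrative, not the exact procedure of the talk): label items in increasing order of estimated misclassification probability for as long as the running average, an estimate of the FCR among labeled items, stays below α.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def cluster_with_abstention(X, n_clusters, alpha):
        gm = GaussianMixture(n_components=n_clusters).fit(X)
        post = gm.predict_proba(X)            # estimated membership probabilities
        labels = post.argmax(axis=1)
        err = 1.0 - post.max(axis=1)          # estimated misclassification probability
        order = np.argsort(err)               # most confident items first
        avg = np.cumsum(err[order]) / np.arange(1, len(err) + 1)
        ok = np.nonzero(avg <= alpha)[0]      # largest prefix with estimated FCR <= alpha
        keep = np.zeros(len(X), dtype=bool)
        if ok.size:
            keep[order[: ok.max() + 1]] = True
        return np.where(keep, labels, -1)     # -1 marks abstention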



  • 11.20-12.05 Etienne Roquain Machine learning meets false discovery rate


Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees but often lack flexibility. On the other hand, recent machine learning classification algorithms, such as those based on random forests (RF) or neural networks (NN), have great practical performance but lack interpretation and theoretical guarantees. In this paper, we make these two meet by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends the scope of recent works in the multiple testing literature to the high-dimensional setting, notably that of Yang et al. (2021). AdaDetect is shown both to control the FDR strongly and to have a power that mimics that of the oracle in a specific sense. The interest and validity of our approach are demonstrated with theoretical results, numerical experiments on several benchmark datasets, and an application to astrophysical data. In particular, we show that, while AdaDetect can be used in combination with any classifier, it is especially efficient when combined with RF and NN methods. Overall, this work is at the crossroads of multiple testing, conformal prediction and machine learning.
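
A hedged sketch of the kind of classifier-based pipeline described above (simplified; the actual AdaDetect construction may differ in details): train a classifier to separate part of the null sample from the mixed bag of remaining nulls and test points, turn the scores into conformal-type p-values, and apply Benjamini-Hochberg.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def adadetect_sketch(null_sample, test_sample, alpha, fit_frac=0.5, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(null_sample))
        k = int(fit_frac * len(null_sample))
        train_null = null_sample[idx[:k]]
        calib_null = null_sample[idx[k:]]
        # class 0: nulls kept for training; class 1: calibration nulls + test points
        mixed = np.vstack([calib_null, test_sample])
        X = np.vstack([train_null, mixed])
        y = np.r_[np.zeros(len(train_null)), np.ones(len(mixed))]
        clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
        s_calib = clf.predict_proba(calib_null)[:, 1]
        s_test = clf.predict_proba(test_sample)[:, 1]
        # conformal-type p-values: rank of each test score among calibration scores
        pvals = (1 + (s_calib[None, :] >= s_test[:, None]).sum(axis=1)) / (len(s_calib) + 1)
        # Benjamini-Hochberg at level alpha
        m = len(pvals)
        order = np.argsort(pvals)
        passed = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
        reject = np.zeros(m, dtype=bool)
        if passed.size:
            reject[order[: passed.max() + 1]] = True
        return reject                          # True marks detected novelties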




Afternoon, September 29th


  • 13.30-14.15 Kolyan Ray Bayesian inference for multi-dimensional diffusions


We consider the problem of estimating the drift of a multidimensional diffusion, a model used for instance in molecular dynamics. We study theoretical properties of Bayesian nonparametric procedures based on Gaussian process priors, including convergence rates for the posterior distribution and maximum a posteriori (MAP) estimates.
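
For concreteness, the model can be written in the standard form (the exact assumptions of the talk may differ):

    dX_t = b(X_t) dt + dW_t,    t ∈ [0, T],

where b : R^d → R^d is the unknown drift, W is a d-dimensional Brownian motion, and the Bayesian approach places a Gaussian process prior on b.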


  • 14.35-15.05 Tabea Rebafka Powerful multiple testing of paired null hypotheses using a latent graph model


We consider the multiple testing problem of paired null hypotheses, for which the data are collected on pairs of entities and tests have to be performed for each pair. Typically, for each pair (i, j), we observe some interaction/association score between i and j, and the aim is to detect the pairs with a significant score. In this setting, it is natural to assume that the true/false null constellation is structured according to an unobserved graph, where present edges correspond to a significant association score. That is, the latent graph structures the dependencies among null hypotheses. In line with the seminal work of Sun and Cai, we propose a multiple testing procedure that learns the graph structure by using a stochastic block model. Under appropriate assumptions, the new procedure is shown to control the false discovery rate, up to remainder terms. The procedure is also shown to be nearly optimal in the sense that it is close to the procedure maximizing the true discovery rate. Numerical experiments reveal that our method outperforms state-of-the-art methods and is robust to model misspecification.


This is joint work with Etienne Roquain and Fanny Villers.
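
To fix ideas, the final thresholding step of a Sun-and-Cai-type procedure can be sketched as follows, taking as input estimated local FDR values for each pair; the substantive part, estimating these values from a fitted stochastic block model, is not reproduced here.

    import numpy as np

    def lfdr_procedure(lfdr, alpha):
        """Reject the largest set of pairs whose average local FDR is <= alpha."""
        lv = np.sort(lfdr.ravel())
        avg = np.cumsum(lv) / np.arange(1, lv.size + 1)
        ok = np.nonzero(avg <= alpha)[0]
        thresh = lv[ok.max()] if ok.size else -np.inf
        return lfdr <= thresh            # boolean mask over pairs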



  • 15.05-15.35 Bo Ning Bayesian multiscale analysis of the Cox model


We study the joint Bernstein-von Mises phenomenon for the Cox model under classes of piecewise constant priors. We first derive contraction rate results for the unknown hazard function. Then, using recently developed multiscale techniques, we derive functional limiting results for the conditional cumulative hazard and survival functions. Frequentist coverage properties of Bayesian credible sets are investigated: we prove that certain easily computable credible bands for the survival function are optimal frequentist confidence bands. Simulation studies confirm these theoretical predictions and show excellent behavior in finite samples.
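
One common recipe for such easily computable bands, given posterior draws of the survival function on a time grid (a sketch; the talk's specific construction may differ): take a fixed-width band around the posterior mean whose radius is the empirical quantile of the sup-norm deviations.

    import numpy as np

    def supnorm_credible_band(draws, level=0.95):
        """draws: array of shape (n_draws, n_times) of posterior survival curves."""
        center = draws.mean(axis=0)
        sup_dev = np.abs(draws - center).max(axis=1)   # sup-norm deviation per draw
        radius = np.quantile(sup_dev, level)           # smallest radius covering `level` of draws
        return center - radius, center + radius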




Morning, September 30th


  • 09.15-10.00 Aad van der Vaart Some results on the Pitman-Yor prior


The Pitman-Yor prior is a generalization of the Dirichlet process prior, parametrized by an extra "type" parameter, which can give "more discrete partitions" of the data than the Dirichlet prior. We investigate the quality of the corresponding posterior distribution for recovering the distribution of a sample and we investigate the estimation of the type parameter.


Joint work with Stefan Franssen
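
Recall the Pitman-Yor predictive rule, with concentration θ and type (discount) parameter σ ∈ [0, 1): given n observations featuring k distinct values x*_1, ..., x*_k with multiplicities n_1, ..., n_k,

    P(X_{n+1} = x*_j | X_1, ..., X_n) = (n_j - σ) / (θ + n),
    P(X_{n+1} is a new draw from the base measure | X_1, ..., X_n) = (θ + σk) / (θ + n).

Setting σ = 0 recovers the Dirichlet process; larger σ makes new values, and hence "more discrete partitions", more likely.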



  • 10.00-10.45 Judith Rousseau On multivariate deconvolution

In this work I will discuss some results on posterior concentration rates for the deconvolution problem.

The deconvolution problem consists in observations Y_i = X_i + E_i, 1 ≤ i ≤ n, where E_i is the noise, with known distribution F, and X_i is the signal, whose distribution F_X is unknown. We study Bayesian nonparametric methods for such models and provide simple conditions under which the posterior distribution concentrates, in terms of the Wasserstein distance, around the unknown distribution of the signal F_X. To do so, we first derive an inversion inequality, which allows one to compare certain distances in the direct problem with the Wasserstein distance in the indirect problem, and second we provide posterior concentration rates in the direct problem based on location mixtures of Gaussian distributions.

This inversion inequality is valid both in the univariate and in the multivariate case.


Joint work with Catia Scricciolo (Univ. Verona)
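
A toy instance of the observation scheme above, for orientation (the talk concerns posterior contraction for nonparametric priors on F_X, e.g. location mixtures of Gaussians, which is not implemented here; the mixture below is purely illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    # unknown signal distribution F_X: here a two-component Gaussian mixture
    z = rng.random(n) < 0.3
    x = np.where(z, rng.normal(-2.0, 0.5, n), rng.normal(1.0, 1.0, n))
    e = rng.normal(0.0, 1.0, n)   # noise with known distribution F
    y = x + e                     # only y = x + e is observed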



  • 11.15-12.00 Veronika Rockova Adversarial Bayesian Simulation


In the absence of explicit or tractable likelihoods, Bayesians often resort to approximate Bayesian computation (ABC) for inference. Our work bridges ABC with deep neural implicit samplers based on generative adversarial networks (GANs) and adversarial variational Bayes. Both ABC and GANs compare aspects of observed and fake data to simulate from posteriors and likelihoods, respectively. We develop a Bayesian GAN (B-GAN) sampler that directly targets the posterior by solving an adversarial optimization problem. B-GAN is driven by a deterministic mapping learned on the ABC reference table by conditional GANs. Once the mapping has been trained, i.i.d. posterior samples are obtained by filtering noise at negligible additional cost. We propose two post-processing local refinements using (1) data-driven proposals with importance reweighting, and (2) variational Bayes. We support our findings with frequentist-Bayesian results, showing that the typical total variation distance between the true and approximate posteriors converges to zero for certain neural network generators and discriminators. On simulated data, our method shows highly competitive performance relative to some of the most recent likelihood-free posterior simulators.


Joint work with Yuexi Wang
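
A hedged PyTorch sketch of the adversarial scheme described above; architectures and names are illustrative, and prior_sample(batch) / simulate(theta) (assumed to return tensors) stand in for the user's prior and simulator.

    import torch
    import torch.nn as nn

    def mlp(in_dim, out_dim, hidden=64):
        return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, hidden), nn.ReLU(),
                             nn.Linear(hidden, out_dim))

    def train_bgan(prior_sample, simulate, theta_dim, x_dim, z_dim=8,
                   steps=2000, batch=128):
        # generator maps (noise, data) to a parameter draw;
        # discriminator judges (theta, data) pairs
        G = mlp(z_dim + x_dim, theta_dim)
        D = mlp(theta_dim + x_dim, 1)
        opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
        opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
        bce = nn.BCEWithLogitsLoss()
        for _ in range(steps):
            theta = prior_sample(batch)      # ABC reference table: theta ~ prior ...
            x = simulate(theta)              # ... and x ~ model given theta
            z = torch.randn(batch, z_dim)
            fake = G(torch.cat([z, x], dim=1))
            # discriminator step: real (theta, x) vs generated (fake, x)
            d_loss = (bce(D(torch.cat([theta, x], dim=1)), torch.ones(batch, 1))
                      + bce(D(torch.cat([fake.detach(), x], dim=1)), torch.zeros(batch, 1)))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # generator step: make generated pairs look real
            g_loss = bce(D(torch.cat([fake, x], dim=1)), torch.ones(batch, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return G

    # Posterior sampling at the observed x_obs is then mere noise filtering:
    #   z = torch.randn(10000, z_dim)
    #   post = G(torch.cat([z, x_obs.repeat(10000, 1)], dim=1))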


  • 12.00-12.45 Johannes Schmidt-Hieber On the inability of Gaussian process regression to optimally learn compositional functions

We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate, by a factor polynomial in the sample size n.

Joint work with Matteo Giordano and Kolyan Ray
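
A generalized additive function is the simplest instance of the compositional structures in question:

    f(x_1, ..., x_d) = g( f_1(x_1) + ... + f_d(x_d) ),

with smooth univariate components f_j and link function g. Although such an f is built from low-dimensional smooth pieces and therefore admits a fast minimax estimation rate, the result says that no mean-zero Gaussian process posterior can attain that rate.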



Afternoon, September 30th


  • 14.00-14.45 Sylvain Le Corff Disentangling Identifiable Features with Nonlinear ICA and Structured VAEs

We introduce a new general identifiable framework for principled disentanglement, referred to as Structured Nonlinear Independent Component Analysis (SNICA). Our contribution is to extend the identifiability theory of deep generative models to a very broad class of structured models. While previous works have shown identifiability for specific classes of time-series models, our theorems extend this to more general temporal structures as well as to models with more complex structures such as spatial dependencies. We introduce the first nonlinear ICA model for time series that combines the following very useful properties: it accounts for both nonstationarity and autocorrelation in a fully unsupervised setting; it performs dimensionality reduction; and it models hidden states. We perform learning and inference using Structured VAEs (Johnson et al., 2016), the current state of the art in variational inference for structured models, which enables principled estimation and inference by variational maximum likelihood.

https://proceedings.neurips.cc/paper/2021/file/0cdbb4e65815fbaf79689b15482e7575-Paper.pdf
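
Schematically (the notation here is illustrative, not the paper's), the models covered take the form of independent latent processes observed through a nonlinear mixing:

    x_t = f(s_t) + ε_t,    s_t = (s_{1,t}, ..., s_{k,t}),

where the component processes s_{i,·} are mutually independent with their own temporal (or spatial) dependence, f is an injective nonlinear mixing, and k may be smaller than dim(x_t), which yields dimensionality reduction.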


  • 15.05-15.50 Badr-Eddine Chérief-Abdellatif PAC-Bayes guarantees for VAEs

Despite its wide use and empirical success, theoretical understanding of the behaviour and performance of the variational autoencoder (VAE) has only emerged in the past few years. We contribute to this recent line of work by analysing the VAE's reconstruction ability on unseen test data, leveraging arguments from PAC-Bayes theory. We provide generalisation bounds on the theoretical reconstruction error and give insights into the regularisation effect of VAE objectives. We illustrate our theoretical results with supporting experiments on classical benchmark datasets.
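
For orientation, PAC-Bayes generalisation bounds typically take the following classical form (shown for a generic loss bounded in [0, 1]; this is the standard McAllester/Maurer shape, not the paper's exact statement): with probability at least 1 - δ over the n-sample, simultaneously for all "posteriors" ρ,

    E_{θ~ρ}[R(θ)] ≤ E_{θ~ρ}[R̂_n(θ)] + sqrt( (KL(ρ || π) + ln(2 sqrt(n)/δ)) / (2n) ),

where π is a data-free prior over models, R is the risk (here, reconstruction error on unseen data) and R̂_n its empirical counterpart; the KL term plays the role of the regulariser.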