Location: IHP, Yvonne Choquet-Bruhat lecture hall (second floor of the Perrin building)
14.00: Marie Perrot-Dockès (MAP5 - Université de Paris)
Title: Easily Computed Marginal Likelihoods from Posterior Simulation Using the THAMES Estimator
Abstract: We propose an easily computed estimator of the marginal likelihood from posterior simulation output, via reciprocal importance sampling, combining earlier proposals of DiCiccio et al. (1997) and Robert and Wraith (2009). The estimator uses only the unnormalized posterior densities evaluated at the sampled parameter values and, provided the parameter space is unconstrained, requires neither additional simulations beyond the main posterior simulation nor complicated additional calculations. If the parameter space is constrained, the estimator is easily adjusted by a simple Monte Carlo approximation. It is unbiased for the reciprocal of the marginal likelihood, consistent, has finite variance, and is asymptotically normal. It involves a single user-specified control parameter, and we derive an optimal way of specifying it. We illustrate the estimator with several numerical examples.
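For readers unfamiliar with reciprocal importance sampling, the sketch below illustrates the underlying identity on a toy conjugate Gaussian model: for posterior draws theta_i and an instrumental density h supported inside the high-posterior region, the average of h(theta_i) divided by the unnormalized posterior density is unbiased for the reciprocal of the marginal likelihood. The ellipsoidal choice of h, the radius c, and the toy model are illustrative assumptions, not the exact THAMES construction or its optimal tuning.

```python
# Minimal sketch of reciprocal importance sampling for the marginal likelihood.
# Assumptions (not from the talk): h is uniform on an ellipsoid built from the
# posterior sample, with a user-chosen radius c; the test model is conjugate Gaussian.
import numpy as np
from scipy.special import gammaln

def reciprocal_is_log_evidence(samples, log_unnorm_post, c=1.0):
    """Estimate log p(y) from posterior draws `samples` (n x d) and a function
    `log_unnorm_post` returning the log of the unnormalized posterior density."""
    n, d = samples.shape
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False).reshape(d, d)
    cov_inv = np.linalg.inv(cov)
    # h = uniform density on the ellipsoid (theta - mu)^T cov^{-1} (theta - mu) <= c^2
    log_vol = (d * np.log(c) + 0.5 * np.linalg.slogdet(cov)[1]
               + 0.5 * d * np.log(np.pi) - gammaln(d / 2 + 1))
    diffs = samples - mu
    inside = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs) <= c ** 2
    # (1/n) * sum_i h(theta_i) / q(theta_i) is unbiased for 1 / p(y)
    log_terms = np.where(inside, -log_vol - log_unnorm_post(samples), -np.inf)
    log_recip = -np.log(n) + np.logaddexp.reduce(log_terms)
    return -log_recip  # estimate of the log marginal likelihood

# Toy check: N(0,1) prior, N(theta,1) likelihood, one observation y = 0.5,
# for which the exact marginal likelihood is the N(0,2) density at y.
y, n_draws = 0.5, 10_000
rng = np.random.default_rng(1)
post = rng.normal(y / 2, np.sqrt(0.5), size=(n_draws, 1))  # exact posterior draws

def log_q(theta):
    # log prior + log likelihood; its integral over theta is exactly p(y)
    return -0.5 * (theta[:, 0] ** 2 + (y - theta[:, 0]) ** 2) - np.log(2 * np.pi)

print(reciprocal_is_log_evidence(post, log_q, c=1.5))
print(-0.5 * np.log(2 * np.pi * 2) - y ** 2 / 4)  # exact log p(y) for comparison
```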
15.00: Eddie Aamari (DMA, ENS Paris)
Title: A theory of stratification learning
Abstract: Given an i.i.d. sample from a stratified mixture of immersed manifolds of different dimensions, we study the minimax estimation of the underlying stratified structure. We provide a constructive algorithm that adaptively estimates each mixture component at its optimal dimension-specific rate. The method is based on an ascending hierarchical co-detection of points belonging to different layers; it also identifies the number of layers and their dimensions, assigns each data point to a layer accurately, and estimates tangent spaces optimally. These results hold without any ambient assumption on the manifolds or on their intersection configurations. They open the way to a broad clustering framework in which each mixture component models a cluster arising from a specific nonlinear correlation phenomenon.
16.00: Gloria Buriticá (AgroParisTech, Université Paris-Saclay)
Title: Progression: an extrapolation principle for regression
Abstract: Non-parametric and machine learning regression methods are popular because they can fit complex data during training; however, they are only reliable if the test points lie within the range of the training data. The problem of regression extrapolation, or out-of-distribution generalization, arises when predictions are required at test points outside that range. In such cases, the non-parametric guarantees for regression methods from both statistics and machine learning typically fail. Based on the theory of tail dependence, we propose a novel statistical extrapolation principle. After a suitable, data-adaptive marginal transformation, our principle assumes that the relationship between predictors and the response simplifies at the boundary of the training predictor samples. This assumption holds for a wide range of models, including additive noise models with a broad family of non-parametric regression functions. Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range. We demonstrate how this principle can be effectively integrated with existing approaches, such as random forests and additive models, to improve extrapolation performance on out-of-distribution samples (joint work with Sebastian Engelke, https://arxiv.org/pdf/2410.23246).
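As a small illustration of the out-of-range failure mode that motivates this work, the sketch below uses synthetic data and scikit-learn's random forest (an illustrative assumption, not the authors' method) to show that a forest fitted on predictors in [0, 1] produces essentially constant predictions beyond that range, while the true signal keeps growing.

```python
# Illustrative sketch: random forests plateau outside the training predictor range.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=(500, 1))                        # predictors in [0, 1]
y_train = 3.0 * x_train[:, 0] + rng.normal(scale=0.1, size=500)   # linear signal + noise

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(x_train, y_train)

x_test = np.array([[0.5], [1.0], [1.5], [2.0]])   # last two points are out of range
print(forest.predict(x_test))                     # predictions stay near 3.0 for x > 1
print(3.0 * x_test[:, 0])                         # true signal: 1.5, 3.0, 4.5, 6.0
```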