Session of 15 January 2018

Session organized by Erwan Le Pennec and Joseph Salmon.

Venue: IHP, Amphi Hermite

14.00: Julie Josse (Ecole Polytechnique)

Title: Distributed Multi-Level Matrix Completion for Medical Databases

Abstract: Gathering the information contained in the databases of several hospitals is a step toward personalized medical care, as it increases the chances of finding similar patient profiles and therefore of providing them with better treatment. However, there are technical barriers (computation and storage issues) and social barriers (privacy concerns) to the aggregation of medical data. Both obstacles can be overcome by turning to distributed computations, so that hospitals share only some intermediate results instead of the raw data. As is often the case, medical databases are incomplete. One aim of the project is to impute the data of one hospital using the data of the other hospitals. This could also be an incentive for hospitals to participate in the project. In this talk, we will describe a single imputation method for multi-level (hierarchical) data that can be used to impute quantitative, categorical, and mixed data. This method is based on multi-level simultaneous component analysis (MLSCA), which decomposes the variability into a between-hospital part and a within-hospital part and performs an SVD on each part. The imputation method can be seen as an extension of matrix completion methods. The methods and their distributed versions are implemented in an R package.
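As a rough illustration of the between/within decomposition mentioned in the abstract, the following Python sketch splits a complete data matrix into a grand mean, a between-group part, and a within-group part, then takes a truncated SVD of each part. It is not the speaker's R package: the function name `between_within_svd`, the rank arguments, and the use of unweighted group means are assumptions made for illustration, and the imputation loop and the distributed computation are left out.

```python
import numpy as np

def between_within_svd(X, groups, rank_between=1, rank_within=1):
    """Hypothetical helper: between/within split followed by truncated SVDs.

    X      : (n, p) complete numeric matrix (no missing values assumed here)
    groups : length-n array of group labels (e.g. one label per hospital)
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)

    # Between part: each row carries its group's mean offset from the grand mean.
    between = np.zeros_like(X)
    for g in np.unique(groups):
        mask = groups == g
        between[mask] = X[mask].mean(axis=0) - grand_mean

    # Within part: what remains after removing the grand mean and group offsets.
    within = X - grand_mean - between

    def truncate(M, rank):
        # Best rank-`rank` approximation of M in Frobenius norm.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U[:, :rank] * s[:rank]) @ Vt[:rank]

    low_between = truncate(between, rank_between)
    low_within = truncate(within, rank_within)
    return grand_mean, between, within, low_between, low_within
```

By construction, the grand mean plus the between and within parts reproduces the input exactly; only the two truncated SVDs introduce an approximation.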

15.00: Umut Şimşekli (Télécom ParisTech)

Title: Fractional Langevin Monte Carlo: Exploring Lévy Driven Stochastic Differential Equations for MCMC

Abstract: Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques based on Langevin diffusions have started receiving increasing attention. These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this talk, I will present an extension of classical LMC and develop a novel Fractional LMC (FLMC) framework based on a family of heavy-tailed distributions called alpha-stable Lévy distributions. As opposed to classical approaches, the proposed approach can make large jumps while targeting the correct distribution, which is beneficial for efficient exploration of the state space. I will also present novel computational methods that scale to large problems and provide a formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multi-modal settings, improved convergence rates, and robustness to algorithm parameters. arXiv: https://arxiv.org/abs/1706.03649
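For readers unfamiliar with the classical LMC baseline that the abstract starts from, here is a minimal Python sketch of the unadjusted Langevin update, in which each step adds a gradient drift and Gaussian noise. It illustrates only the Brownian-driven scheme, not the FLMC method of the talk, which replaces the Gaussian noise with alpha-stable noise and modifies the drift. The function name `ula_sample` and its parameters are assumptions made for illustration.

```python
import numpy as np

def ula_sample(grad_log_density, x0, step, n_steps, rng):
    """Hypothetical helper: unadjusted Langevin iterations.

    Update rule: x <- x + step * grad log pi(x) + sqrt(2 * step) * N(0, I).
    No Metropolis correction is applied, so the chain is biased for step > 0.
    """
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        noise = rng.standard_normal(x.shape)  # light-tailed Gaussian increment
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Usage: target a standard normal, whose log-density gradient is -x.
rng = np.random.default_rng(0)
draws = ula_sample(lambda x: -x, x0=np.zeros(1), step=0.05, n_steps=20000, rng=rng)
```

Because every increment is Gaussian, large moves between well-separated modes are rare, which is the limitation that motivates heavy-tailed driving noise.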

16.00: Arnak Dalalyan (ENSAE - CREST)

Title: Convex programming approach to robust estimation of a multivariate Gaussian model

Abstract: The multivariate Gaussian is often used as a first approximation to the distribution of high-dimensional data. Determining the parameters of this distribution under various constraints is a widely studied problem in statistics, and is often considered a prototype for testing new algorithms or theoretical frameworks. In this paper, we develop a nonasymptotic approach to the problem of estimating the parameters of a multivariate Gaussian distribution when data are corrupted by outliers. We propose an estimator, efficiently computable by solving a convex program, that robustly estimates the population mean and the population covariance matrix even when the sample contains a significant proportion of outliers. Our estimator of the corruption matrix is provably rate optimal simultaneously for the entry-wise ℓ1-norm, the Frobenius norm and the mixed ℓ2/ℓ1 norm. Furthermore, this optimality is achieved by a penalized square-root-of-least-squares method with a universal tuning parameter (calibrating the strength of the penalization). These results are partly extended to the case where p is potentially larger than n, under the additional condition that the inverse covariance matrix is sparse. Joint work with Samuel Balmand.