Session of 15 April 2019

Session organized by Cécile Durot and Etienne Roquain

Venue: IHP, Amphi Darboux

14.00: Arnak Dalalyan (ENSAE - CREST)

Title: Robust Sparse Regression by Convex Programming

Abstract: Motivated by the construction of tractable robust estimators via convex relaxations, we present conditions on the sample size which guarantee an augmented restricted eigenvalue condition for Gaussian designs. Such a condition is suitable for high-dimensional inference in a linear model and in a multivariate Gaussian model when samples are corrupted by outliers (either in the response variable or in the design matrix). Our analysis leads to a significantly sharper restricted eigenvalue constant, valid under weaker assumptions than those available in the literature. In particular, we require no condition relating the sparsity of the unknown parameter to the number of outliers. Elaborating on these results, we establish new risk bounds for the l1-penalized Huber M-estimator for robust linear regression. These bounds show that the rate of estimation of s-sparse p-dimensional parameter vectors is of order (s/n) + (o/n)^2, up to logarithmic factors, where o is the number of outliers and n is the sample size.

(Joint work with Philip Thompson)
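For readers unfamiliar with the l1-penalized Huber M-estimator mentioned in the abstract above, here is a minimal pure-Python sketch that minimizes (1/n) sum_i rho_c(y_i - x_i'b) + lam ||b||_1 by proximal gradient descent (ISTA). The solver, the Huber threshold c = 1.345, the penalty level lam, the simulated data and the helper names (soft_threshold, huber_psi, l1_huber_regression) are illustrative assumptions, not the speaker's choices; the talk itself is about the statistical guarantees of the estimator, not its computation.

```python
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def huber_psi(r, c):
    """Derivative of the Huber loss rho_c: the residual clipped to [-c, c]."""
    return max(-c, min(c, r))


def l1_huber_regression(X, y, lam, c=1.345, n_iter=500):
    """Minimize (1/n) sum_i rho_c(y_i - x_i.b) + lam * ||b||_1 by ISTA (hypothetical helper)."""
    n, p = len(X), len(X[0])
    # The gradient of the Huber term is Lipschitz with constant ||X||_2^2 / n;
    # the squared Frobenius norm / n upper-bounds it, so 1 / bound is a safe step.
    step = n / sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        psi = [huber_psi(r, c) for r in resid]
        # Gradient of the smooth part: -(1/n) sum_i psi(r_i) x_i.
        grad = [-sum(psi[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        # Gradient step on the Huber term, then the l1 proximal step.
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b


# Illustrative data: sparse beta, small Gaussian noise, o = 5 gross outliers in y.
rng = random.Random(0)
beta = [2.0, 0.0, 0.0, -1.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [sum(xj * bj for xj, bj in zip(row, beta)) + rng.gauss(0, 0.1) for row in X]
for i in range(5):
    y[i] += 20.0
print([round(v, 2) for v in l1_huber_regression(X, y, lam=0.01)])
```

Because the Huber score is bounded by c, each corrupted response can pull the fit only by a bounded amount, which is the intuition behind the robustness of this estimator to outliers in the response.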

15.00: Emilie Lebarbier (AgroParisTech - INRA)

Title: Segmentation of time-series with dependence

Abstract: The objective of segmentation methods is to detect abrupt changes, called breakpoints, in the distribution of a signal. Such segmentation problems arise in many areas, such as biology, climatology and geodesy. Inference in segmentation models requires searching over the space of all possible segmentations, which is computationally prohibitive when performed naively. Dynamic Programming (DP) is the only strategy that retrieves the exact solution quickly, but it applies only when the contrast to be optimized (e.g. the log-likelihood) is additive with respect to the segments. This is no longer the case in the presence of dependence. We consider two cases:

(i) When dealing with time-series, it is likely that time-dependence exists.

(ii) When dealing with multiple series, it is likely that some dependence between series exists (such as spatial correlation).

Our goal is to propose an efficient maximum likelihood inference procedure. In both cases, our strategy consists in removing the dependence so that DP can be applied during the inference procedure.

(Joint work with S. Chakar, X. Collilieux, C. Lévy-Leduc and S. Robin)
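To illustrate why additivity of the contrast matters, here is a minimal sketch of exact DP segmentation for the simplest additive contrast: least squares for a piecewise-constant mean with independent observations and a fixed number of segments K. The function names are hypothetical, and the decorrelation step described in the abstract is not shown. Because the contrast is a sum of per-segment costs, the best K-segmentation of a prefix y[:j] is built from the best (K-1)-segmentations of shorter prefixes, which gives O(K n^2) time instead of an exhaustive search over all segmentations.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Least-squares cost of fitting a constant mean to y[i:j], from prefix sums."""
    m = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / m


def dp_segmentation(y, K):
    """Exact best split of y into K segments; returns (breakpoints, total cost)."""
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    INF = float("inf")
    # cost[k][j]: best contrast for y[:j] split into k segments;
    # arg[k][j]: start index of the last of those k segments.
    cost = [[INF] * (n + 1) for _ in range(K + 1)]
    arg = [[0] * (n + 1) for _ in range(K + 1)]
    cost[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < cost[k][j]:
                    cost[k][j], arg[k][j] = c, i
    # Backtrack the segment start indices; the first one is always 0.
    starts, j = [], n
    for k in range(K, 0, -1):
        j = arg[k][j]
        starts.append(j)
    starts.reverse()
    return starts[1:], cost[K][n]


y = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(dp_segmentation(y, 3))  # ([10, 20], 0.0)
```

With dependent observations the contrast no longer splits into per-segment terms, so this recursion is no longer exact; that is the obstacle the talk addresses by first removing the dependence.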

16.00: Mélanie Zetlaoui (Nanterre)

Title: Nonnegative Matrix Factorization: a (Semi-)Parametric Statistical View

Abstract: The purpose of this talk is to investigate the Nonnegative Matrix Factorization (NMF) task from a statistical perspective. Stated in geometrical terms, the goal of NMF is to find a convex cone in the positive orthant, of dimension lower than that of the ambient space, that "accurately represents" a cloud of multivariate data. Whereas most of the literature dedicated to NMF has focused on algorithmic issues related to computing representations that maximize some goodness-of-fit criterion, statistical grounds for such M-estimation techniques have not yet been established. Here, we formulate NMF as a latent variable model. In a semi-parametric context, we compute the different tangent spaces as well as the efficient score function, and propose a Z-estimator with estimated nuisance parameters based on the efficient score. Under appropriate assumptions, this Z-estimator yields asymptotically normal estimates of the latent cone. In a parametric context, model selection issues related to the dimension of the underlying cone are also considered through the AIC and BIC approaches.

(Joint work with Patrice Bertail and Stéphan Clémençon)
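The abstract contrasts the talk's semi-parametric Z-estimator with the algorithmic, goodness-of-fit driven approach that dominates the NMF literature. As an illustration of the latter only (not of the speakers' estimator), here is a minimal pure-Python sketch of the classical Lee and Seung multiplicative updates, which decrease the Frobenius error ||V - WH||_F. Each column of V is approximated by a nonnegative combination of the r columns of W, i.e. by a point of the convex cone they generate. The helper names, the random initialization and the iteration count are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, r, n_iter=500, eps=1e-12, seed=0):
    """Approximate nonnegative V (m x n) by W (m x r) @ H (r x n), W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive start: multiplicative updates keep entries nonnegative.
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(r)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)] for i in range(m)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw)) ** 0.5


# Illustrative 4 x 3 nonnegative matrix of rank 2.
V = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 5.0, 3.0]]
W, H = nmf(V, r=2)
print(round(frobenius_error(V, W, H), 3))
```

Such an algorithm returns a point estimate of the cone but no measure of its uncertainty; asymptotic normality results like those announced in the talk are what would supply one.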