High-Dimensional Statistics

Over the last two decades we have witnessed an explosion of data, resulting from technological advances that allow the collection and processing of very large data sets. In particular, high-dimensional data are nowadays commonly encountered in modern applications across the sciences and engineering. The field of high-dimensional statistics provides a theoretical framework for understanding the possibilities and limits of statistical inference when the number of unknown parameters is larger than the sample size. In this short course we will give an overview of the challenges, successes and limitations of this young but very rich area of research. A tentative list of the topics that will be discussed is given below.


Day 1 (Introduction to high-dimensional statistics) Slides 1

  • Motivation: the challenge of high-dimensional data and the bet on sparsity
  • Review of some useful probability tools
  • Linear regression with orthogonal design (a standard closed form is displayed after this list)
  • Covariance matrix estimation
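
As a small preview of the orthogonal-design item above, here is the standard closed form for the l1-penalized least-squares estimator; the notation is chosen here for illustration and need not match the slides. When the design satisfies \(X^\top X = n I_p\), the problem decouples coordinate-wise and the solution is a soft-thresholding of the ordinary least-squares estimate:

\[
\hat{\beta}^{\text{lasso}} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\,\lVert y - X\beta\rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\hat{\beta}^{\text{lasso}}_j = \operatorname{sign}\!\big(\hat{\beta}^{\text{OLS}}_j\big)\,\big(\lvert \hat{\beta}^{\text{OLS}}_j \rvert - \lambda\big)_+ .
\]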

Day 2 (A general theory for sparse M-estimation) Slides 2

  • Sparsity-inducing regularization (a generic penalized objective is displayed after this list)
  • A general theory
  • Examples: Linear regression, GLM, matrix regression
  • Robust estimation in high dimensions
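
One generic way to write the estimators discussed in this list (symbols chosen here for illustration, not taken from the slides) is as a penalized M-estimator,

\[
\hat{\theta} \in \arg\min_{\theta \in \Omega}\ \Big\{ \mathcal{L}_n(\theta) + \lambda_n\,\Phi(\theta) \Big\},
\]

where \(\mathcal{L}_n\) is an empirical loss (least squares for linear regression, a negative log-likelihood for a GLM), \(\Phi\) is a sparsity-inducing norm (the \(\ell_1\) norm, a group norm, or the nuclear norm in matrix regression), and \(\lambda_n \ge 0\) is a tuning parameter.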

Day 3 (Practical considerations, statistical inference and further extensions) Slides 3

  • Computation of lasso-type estimators: optimization and tuning parameters (a minimal code sketch follows this list)
  • Statistical inference for low-dimensional parameters
  • Further extensions: computational-statistical trade-offs, new asymptotic paradigms, differential privacy
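
As a minimal illustration of the first item in the list above, the Python sketch below fits a lasso-type estimator on simulated sparse data and selects the tuning parameter by cross-validation with scikit-learn's LassoCV; the simulation settings are arbitrary and not part of the course materials.

    # Illustrative sketch (not from the course): lasso fit with a
    # cross-validated choice of the tuning parameter.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p, s = 100, 500, 5                        # high-dimensional regime: p >> n
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0                               # s-sparse true coefficient vector
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # LassoCV solves the l1-penalized least-squares problem over a grid of
    # penalty levels and keeps the level minimizing cross-validated error.
    fit = LassoCV(cv=5).fit(X, y)
    print("selected penalty level:", fit.alpha_)
    print("number of nonzero coefficients:", np.flatnonzero(fit.coef_).size)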

Main references

T. Hastie, R. Tibshirani and M. Wainwright (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.

M. Wainwright (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.

P. Bühlmann and S. van de Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer.