High-Dimensional Statistics

Over the last two decades we have witnessed an explosion of data, resulting from technological advances that allow the collection and processing of very large data sets. In particular, high-dimensional data are nowadays commonly encountered in modern applications across the sciences and engineering. The field of high-dimensional statistics provides a theoretical framework for understanding the possibilities and limits of statistical inference when the number of unknown parameters is larger than the sample size. In this short course we will give an overview of the challenges, successes and limitations of this young but very rich area of research. A tentative list of the topics that will be discussed is given below.


Day 1 (Introduction to high-dimensional statistics) Slides 1

  • Motivation: the challenge of high-dimensional data and the bet on sparsity
  • Review of some useful probability tools
  • Linear regression with orthogonal design (a standard closed form is displayed after this list)
  • Covariance matrix estimation
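
As a small preview of the orthogonal-design item above, here is the standard closed form for the l1-penalized least-squares estimator; the notation is chosen here for illustration and need not match the slides. When the design satisfies \(X^\top X = n I_p\), the problem decouples coordinate-wise and the solution is a soft-thresholding of the ordinary least-squares estimate:

\[
\hat{\beta}^{\text{lasso}} \in \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n}\,\lVert y - X\beta\rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\hat{\beta}^{\text{lasso}}_j = \operatorname{sign}\!\big(\hat{\beta}^{\text{OLS}}_j\big)\,\big(\lvert \hat{\beta}^{\text{OLS}}_j \rvert - \lambda\big)_+ .
\]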

Day 2 (A general theory for sparse M-estimation) Slides 2

  • Sparsity-inducing regularization (a generic penalized objective is displayed after this list)
  • A general theory
  • Examples: Linear regression, GLM, matrix regression
  • Robust estimation in high dimensions
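
One generic way to write the estimators discussed in this list (symbols chosen here for illustration, not taken from the slides) is as a penalized M-estimator,

\[
\hat{\theta} \in \arg\min_{\theta \in \Omega}\ \Big\{ \mathcal{L}_n(\theta) + \lambda_n\,\Phi(\theta) \Big\},
\]

where \(\mathcal{L}_n\) is an empirical loss (least squares for linear regression, a negative log-likelihood for a GLM), \(\Phi\) is a sparsity-inducing norm (the \(\ell_1\) norm, a group norm, or the nuclear norm in matrix regression), and \(\lambda_n \ge 0\) is a tuning parameter.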

Day 3 (Practical considerations, statistical inference and further extensions) Slides 3

  • Computation of lasso-type estimators: optimization and tuning parameters (a minimal code sketch follows this list)
  • Statistical inference for low-dimensional parameters
  • Further extensions: computational-statistical trade-offs, new asymptotic paradigms, differential privacy
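
As a minimal illustration of the first item in the list above, the Python sketch below fits a lasso-type estimator on simulated sparse data and selects the tuning parameter by cross-validation with scikit-learn's LassoCV; the simulation settings are arbitrary and not part of the course materials.

    # Illustrative sketch (not from the course): lasso fit with a
    # cross-validated choice of the tuning parameter.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p, s = 100, 500, 5                        # high-dimensional regime: p >> n
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 1.0                               # s-sparse true coefficient vector
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # LassoCV solves the l1-penalized least-squares problem over a grid of
    # penalty levels and keeps the level minimizing cross-validated error.
    fit = LassoCV(cv=5).fit(X, y)
    print("selected penalty level:", fit.alpha_)
    print("number of nonzero coefficients:", np.flatnonzero(fit.coef_).size)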

Main references

T. Hastie, R. Tibshirani and M. Wainwright (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.

M. Wainwright (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.

P. Bühlmann and S. van de Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer.