NIPS2014 - Out of the Box: Robustness in High Dimension


This NIPS workshop reviews various trends in robustness for high-dimensional statistics, optimization, and machine learning. In addition to confirmed keynote speakers, there is an open call for papers, some of which will be selected for short talks or posters.

Invited speakers

  • Emmanuel Candès "Solving quadratic equations via Wirtinger flows."
  • Alessandro Chiuso "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"
  • Noureddine El Karoui "On high-dimensional robust regression and inefficiency of maximum likelihood methods"
  • Donald Goldfarb "Low-rank Matrix and Tensor Recovery: Theory and Algorithms"
  • Pradeep Ravikumar "Dirty Statistical Models"
  • Po-Ling Loh "Local optima of nonconvex M-estimators"

Call for papers

We seek submissions of original research papers. Papers should be no more than 8 pages (excluding references) in NIPS style; please include the authors' names, as the review is not double-blind.

Submissions of previously published work are also possible, but authors must state this explicitly. Previously published work may be presented at the workshop, but will not be included in the workshop proceedings (which are considered peer-reviewed publications of novel contributions). Authors presenting novel work are also welcome to opt out of the workshop proceedings if they have alternative publication plans.

Please email submissions (PDF, in NIPS style) by Sunday, October 12, 2014.


Schedule

Session 1
8:30-9:15 Pradeep Ravikumar, "Dirty Statistical Models"
9:15-10:00 Donald Goldfarb, "Low-rank Matrix and Tensor Recovery: Theory and Algorithms"

Session 2
10:30-11:15 Alessandro Chiuso, "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"
11:15-12:00 Po-Ling Loh, "Local optima of nonconvex M-estimators"

Session 3 (contributed talks)
15:00-15:20 Vassilis Kalofolias, "Matrix Completion on Graphs"
15:20-15:40 Aleksandr Aravkin, "Learning sparse models using general robust losses"
15:40-16:00 Stephen Becker, "Robust Compressed Least Squares Regression"
16:00-16:20 Ahmet Iscen, "Exploiting high dimensionality for similarity search"

Session 4
17:00-17:45 Rina Foygel Barber, "Controlling the False Discovery Rate via Knockoffs" (joint with Emmanuel Candes)
17:45-18:30 Noureddine El Karoui, "On high-dimensional robust regression and inefficiency of maximum likelihood methods"


Organizers

  • Aleksandr Aravkin, IBM Research
  • Aurélie Lozano, IBM Research
  • Stephen Becker, University of Colorado


Abstracts

Speaker: Alessandro Chiuso. Co-author: Gianluigi Pillonetto
"Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"

    Kernel-based regularization approaches have been successfully applied to solving ill-posed regression problems. Such estimators solve a regularized least squares problem which also admits a Bayesian interpretation, where the unknown parameter is modeled as a zero-mean Gaussian vector. Typically the Gaussian prior is described by a few hyperparameters which parametrize the covariance; these hyperparameters can be tuned via marginal likelihood (ML) optimization. The aim of this talk is to provide new insights on hyperparameter tuning. To this purpose, we shall derive the mean squared error properties of the ML procedure for hyperparameter estimation and compare it with classical SURE-type procedures. In doing so we shall not assume the correctness of the Bayesian priors. The conclusion of our investigation is that much of the criticism of the robustness of ML reported in the literature is not well founded. On the contrary, ML is a valuable tool for tuning model complexity, even when undermodeling is present.
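As a toy illustration of the two tuning criteria compared in this talk (a sketch with made-up dimensions, not the speakers' actual setup), consider Bayesian ridge regression with a single prior-variance hyperparameter eta, tuned on a grid by both marginal likelihood and SURE:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 10, 0.5
Phi = rng.standard_normal((n, p))
theta = rng.standard_normal(p)             # drawn from the N(0, I) prior
y = Phi @ theta + np.sqrt(sigma2) * rng.standard_normal(n)

def neg_log_marglik(eta):
    # Under the Gaussian prior, marginally y ~ N(0, eta * Phi Phi' + sigma2 * I)
    K = eta * Phi @ Phi.T + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y))

def sure(eta):
    # Stein's unbiased risk estimate for the ridge fit y_hat = H y
    H = Phi @ np.linalg.solve(Phi.T @ Phi + (sigma2 / eta) * np.eye(p), Phi.T)
    resid = y - H @ y
    return resid @ resid + 2 * sigma2 * np.trace(H) - n * sigma2

grid = np.logspace(-3, 2, 60)
eta_ml = grid[np.argmin([neg_log_marglik(e) for e in grid])]
eta_sure = grid[np.argmin([sure(e) for e in grid])]
print(eta_ml, eta_sure)
```

The two criteria need not select the same eta; the talk's point is precisely how their mean squared error properties compare when the prior may be misspecified.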

Speaker: Noureddine El Karoui. Co-authors: (based on several papers, including joint work with Derek Bean, Peter Bickel, Chinghway Lim, and Bin Yu)
"On high-dimensional robust regression and inefficiency of maximum likelihood methods"

    I will discuss the behavior of widely used statistical methods in the high-dimensional setting where the number of observations, n, and the number of predictors, p, are both large.  A number of interesting statistical phenomena happen in this setting; in particular, it is possible to improve upon maximum likelihood methods.  Mathematically, the tools needed mainly come from random matrix theory, measure concentration and convex analysis.

Speaker: Donald Goldfarb.  Co-authors: Cun Mu, Bo Huang, Tony Qin, John Wright
"Low-rank Matrix and Tensor Recovery: Theory and Algorithms"

    Recovering a low-rank matrix or tensor from incomplete or corrupted observations is a recurring problem in signal processing and machine learning. To exploit the structure of data that is intrinsically more than three-dimensional, convex models such as low-rank completion and robust principal component analysis (RPCA) for matrices have been extended to tensors. In this talk, we rigorously establish recovery guarantees for both tensor completion and tensor RPCA. We demonstrate that using the most popular convex relaxation for the tensor Tucker rank can be substantially sub-optimal in terms of the number of observations needed for exact recovery. We introduce a very simple, new convex relaxation which is shown to be much better, both theoretically and empirically. Moreover, we propose algorithms to solve both low-rank matrix and tensor recovery models based on the Alternating Direction Augmented Lagrangian (ADAL), Frank-Wolfe and prox-gradient methods. Finally, we empirically investigate the recoverability properties of these convex models, and the computational performance of our algorithms on both simulated and real data.

Speaker: Pradeep Ravikumar. Co-author: Eunho Yang
"Dirty Statistical Models"

    We provide a unified statistical framework for the class of what we call "superposition-structured" or "dirty" statistical models, where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, any statistical model, and any of a general class of M-estimators that minimize the sum of any loss function and an instance of a "hybrid" regularization that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures.

Speaker: Emmanuel Candès. Co-authors: Xiaodong Li and Mahdi Soltanolkotabi
"Solving quadratic equations via Wirtinger flows."

    In many imaging problems such as X-ray crystallography, detectors can only record the intensity or magnitude of a diffracted wave as opposed to measuring its phase. This means that we need to solve quadratic equations — this is notoriously NP-hard — as opposed to linear ones. This talk discusses novel convex and non-convex algorithms which are provably exact for solving such problems. More importantly, we also demonstrate that our noise-aware algorithms are stable in the sense that the reconstruction degrades gracefully as the signal-to-noise ratio decreases. The empirical performance shall be demonstrated on phase retrieval problems, which are at the center of spectacular current research efforts, collectively known under the name of coherent diffraction imaging, aimed, among other things, at determining the 3D structure of large protein complexes.
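A minimal real-valued sketch of the non-convex recipe (spectral initialization followed by gradient descent on the squared measurements). The talk's algorithm handles complex signals via Wirtinger derivatives; all dimensions and step sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 200
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)           # unit-norm signal
A = rng.standard_normal((m, n))            # sensing vectors a_i as rows
y = (A @ x_true) ** 2                      # intensity-only measurements

# Spectral initialization: top eigenvector of (1/m) * sum_i y_i a_i a_i'
Y = (A.T * y) @ A / m
_, V = np.linalg.eigh(Y)
x = V[:, -1] * np.sqrt(np.mean(y))         # scale by the estimated signal energy

# Gradient descent on f(x) = (1/(2m)) * sum_i ((a_i' x)^2 - y_i)^2
step = 0.1
for _ in range(500):
    r = (A @ x) ** 2 - y
    x -= step * (A.T @ (r * (A @ x))) / m

# The sign of x is unrecoverable from intensities, so measure distance up to sign
dist = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print(dist)
```

Note that only the global sign ambiguity remains; in the complex case this becomes a global phase ambiguity.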

Speaker: Po-Ling Loh
"Local optima of nonconvex M-estimators"

    Many popular algorithms for high-dimensional regression (e.g., Lasso-based linear regression) involve optimizing penalized M-estimators. We present several recent results concerning the behavior of local optima of such objective functions, when both the loss and penalty function are allowed to be nonconvex. At the core of these results is an assumption that the loss function satisfies a property known as restricted strong convexity, which may be shown to hold for a variety of functions that are relevant for statistical estimation with corrupted and/or contaminated data; and an assumption that the amount of nonconvexity of the penalty function is not "too" large. We give statistical guarantees for the consistency of local optima of these nonconvex M-estimators, as well as optimization guarantees for first-order algorithms that may be used to obtain near-global optima. We discuss the consequences of our results in the context of robust regression.

Speaker: Vassilis Kalofolias. Co-authors: Pierre Vandergheynst, Michael Bronstein, Xavier Bresson
"Matrix Completion on Graphs"

    The problem of finding the missing values of a matrix given a few of its entries, called matrix completion, has gathered a lot of attention in recent years. Although the problem under the standard low-rank assumption is NP-hard, Candès and Recht showed that it can be solved exactly via convex relaxation if the number of observed entries is sufficiently large. In this work, we introduce a novel matrix completion model that makes use of proximity information about rows and columns by assuming they form communities. This assumption makes sense in several real-world problems like in recommender systems, where there are communities of people sharing preferences, while products form clusters that receive similar ratings. Our main goal is thus to find a low-rank solution that is structured by the proximities of rows and columns encoded by graphs. We borrow ideas from manifold learning to constrain our solution to be smooth on these graphs, in order to implicitly force row and column proximities. Our matrix recovery model is formulated as a convex non-smooth optimization problem, for which a well-posed iterative scheme is provided. We study and evaluate the proposed matrix completion on synthetic and real data, showing that the proposed structured low-rank recovery model outperforms the standard matrix completion model in many situations.

Speaker: Ahmet Iscen. Co-authors: Teddy Furon, Vincent Gripon, Michael Rabbat, Hervé Jégou
"Exploiting high dimensionality for similarity search"

    We design an indexing architecture to store and search in a set of high-dimensional vectors. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. Membership of the query in a memory unit is tested by a simple correlation with its representative vector. This vector optimizes the membership hypothesis test when the query vectors are simple perturbations of the stored vectors. Our approach finds the most similar database vectors at a fraction of the cost of exhaustive search without noticeably reducing the search quality. Interestingly, the complexity gain is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario. Our evaluation is carried out on standard benchmarks for large-scale image search and uses off-the-shelf state-of-the-art descriptors.
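A simplified sketch of the idea, using the plain sum of a unit's vectors as its representative (the actual architecture may use a more refined representative; the sizes, the number of probed units, and the perturbation model here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_units, per_unit = 1024, 50, 20

# Database of unit-norm high-dimensional vectors, grouped into memory units
db = rng.standard_normal((n_units, per_unit, d))
db /= np.linalg.norm(db, axis=2, keepdims=True)
reps = db.sum(axis=1)                      # one representative (sum) per unit

def search(query, n_probe=5):
    """Rank units by correlation with their representative, then search
    exhaustively only inside the top-scoring units."""
    scores = reps @ query
    best_units = np.argsort(-scores)[:n_probe]
    cands = db[best_units].reshape(-1, d)  # candidates, in unit-block order
    j = np.argmax(cands @ query)
    return best_units[j // per_unit], j % per_unit

# Query = a stored vector plus a small perturbation
u, i = 7, 3
noise = rng.standard_normal(d)
query = db[u, i] + 0.1 * noise / np.linalg.norm(noise)
print(search(query))
```

Only n_probe representatives plus their members are correlated with the query, rather than the full database; the abstract's point is that this test becomes more reliable as the dimension grows.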

Speaker: Stephen Becker. Co-authors: Marek Petrik, Ban Kawas, Karthikeyan Ramamurthy
"Robust Compressed Least Squares Regression"

    Randomized matrix compression techniques such as the Johnson-Lindenstrauss transform and sparse subspace embeddings have emerged as an effective algorithmic device with which classical numerical linear algebra tasks, such as solving least squares problems, can be performed significantly faster. Yet, aggressive dimensionality reduction introduces uncertainty and noise into the sketched problem, limiting the precision one can expect in the solution for a desired speedup. In this paper, we propose complementary mechanisms for improving the efficiency of matrix sketching for least squares problems. We introduce tools from robust optimization together with a form of partial compression to improve the error-time tradeoff of sketching-based compressed least squares solvers. We develop an efficient algorithm to solve the resulting robust optimization formulation. Empirical results comparing numerous alternatives suggest that aggressive randomized transforms are effectively insulated with partial compression and robustification.
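The underlying sketch-and-solve baseline (without the paper's robustification or partial compression) can be illustrated with a plain Gaussian sketch; the dimensions and noise level here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 5000, 20, 200                    # tall problem, sketch size m << n
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star + 0.01 * rng.standard_normal(n)

# Full least squares solution (the expensive baseline)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian sketch S (m x n): solve the m x d compressed problem instead
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

rel_err = np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full)
print(rel_err)
```

The compressed solve only factors an m x d matrix, but its solution is perturbed relative to the full one; the paper's robust formulation targets exactly this sketch-induced uncertainty.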



This is a one-day workshop. We are planning to have a brief introductory tutorial, invited talks (35-40 min each), contributed talks (15-20 min each), a poster session, and two open discussion sessions. By keeping the number of talks contained and encouraging posters and discussion, we hope to foster a true “workshop” atmosphere.

Detailed description

The technical term “robust” was coined in 1953 by G. E. P. Box and exemplifies his adage, “all models are wrong, but some are useful”. Over the past decade, a broad range of new paradigms have appeared that allow useful inference when standard modeling assumptions are violated. Classic examples include heavy tailed formulations that mitigate the effect of outliers which would otherwise degrade the performance of Gaussian-based methods.
High-dimensional data are becoming ubiquitous in diverse domains such as genomics, neuroimaging, economics, and finance. Such data heighten the relevance of robustness, as errors and model misspecification are prevalent in these modern applications. To extract pertinent information from large-scale data, robust formulations require a comprehensive understanding of machine learning, optimization, and statistical signal processing, thereby integrating recovery guarantees, statistical and computational efficiency, algorithm design, and scaling issues. For example, robust Principal Component Analysis (RPCA) can be approached using both convex and nonconvex formulations, giving rise to tradeoffs between computational efficiency and theoretical guarantees.
The goal of this workshop is to bring together machine learning, high-dimensional statistics, optimization and select large-scale applications, in order to investigate the interplay between robust modeling and computation in the large-scale setting. We incorporate several important examples that are strongly linked by this theme:
(a) Low rank matrix recovery, robust PCA, and robust dictionary learning:  High-dimensional data problems where the number of variables may greatly exceed the number of observations can be accurately solved by leveraging low-dimensional structural constraints upon the parameters to be estimated. For matrix-structured parameters, low-rank recovery is a prime example of such a low-dimensional assumption. To efficiently recover the low-rank structure characterizing the data, Robust PCA extends classical PCA in order to accommodate grossly corrupted observations that have become ubiquitous in modern applications. Sparse coding and dictionary learning build upon the fact that many real-world data can be represented as a sparse linear combination of basis vectors over an over-complete dictionary and aim at learning such an efficient representation of the data. Sparse coding and dictionary learning are being used in a variety of tasks including image denoising and inpainting, texture synthesis, image classification and unusual event detection.
(b) Robust inference for large scale inverse problems and machine learning:  Many commonly encountered data are heavy-tailed, and the Gaussian assumption does not apply. The issue of robustness has been largely overlooked in the high-dimensional learning literature, yet it is critical when dealing with high-dimensional noisy data. Traditional likelihood-based estimators (including the Lasso and Group Lasso) are known to lack resilience to outliers and model misspecification. Despite this fact, there has been limited focus on robust learning methods in high-dimensional modeling.
(c) Non-convex formulations: heavy tails, factorized matrix inversion, nonlinear forward models. Combining robustness with statistical efficiency requires non-convexity of the loss function. Surprisingly, it is often possible to show that either certain non-convex problems have exact convex relaxations, or that algorithms directly solving non-convex problems may produce points that are statistically indistinguishable from the global optimum.

(d) Robust optimization: avoiding overfitting on precise but unreliable parameters. This classic topic has become increasingly relevant as researchers purposefully perturb problems. This perturbation comes in many forms: "sketching" functions with Johnson-Lindenstrauss-like transformations, using randomized algorithms to speed up linear algebra, randomized coordinate descent, and stochastic gradient algorithms. Recently, the techniques of robust optimization have been applied to these situations.

It is the aim of this workshop to bring together researchers from statistics, machine learning, optimization, and applications, in order to focus on a comprehensive understanding of robust modeling and computation. In particular, we will see challenges of implementing robust formulations in the large-scale and nonconvex setting, as well as examples of success in these areas. 

The workshop follows in the footsteps of the "Robust ML" workshop at NIPS in 2010.  The field is very active and there have been significant advances in the past four years. We also expect to cover new topics, such as applications of robust optimization to user-perturbed problems and Markov Decision Processes.