## Summary

This NIPS workshop reviews recent trends in robustness for high-dimensional statistics, optimization, and machine learning. In addition to confirmed keynote speakers, there is an open call for papers, some of which will be selected for short talks or posters.
## Invited speakers

- Emmanuel Candès "Solving quadratic equations via Wirtinger flows"
- Alessandro Chiuso "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"
- Noureddine El Karoui "On high-dimensional robust regression and inefficiency of maximum likelihood methods"
- Donald Goldfarb "Low-rank Matrix and Tensor Recovery: Theory and Algorithms"
- Pradeep Ravikumar "Dirty Statistical Models"
- Po-Ling Loh "Local optima of nonconvex M-estimators"
## Call for papers

We seek submissions of original research papers. Papers should be no more than 8 pages (excluding references) in NIPS style; please include the authors' names, as the review is not double-blind.
Submission of previously published work is also possible, but authors are required to state this explicitly. Previously published work can be presented at the workshop, but will not be included in the workshop proceedings (which are considered peer-reviewed publications of novel contributions). Authors are also welcome to present novel work while opting out of the workshop proceedings, in case they have alternative publication plans.
Please email submissions (PDF, in NIPS style) to stephen.becker@colorado.edu by
Sunday, October 12, 2014.

## Schedule

Session 1
- 8:30-9:15 Pradeep Ravikumar, "Dirty Statistical Models"
- 9:15-10:00 Donald Goldfarb, "Low-rank Matrix and Tensor Recovery: Theory and Algorithms"

Session 2
- 10:30-11:15 Alessandro Chiuso, "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood"
- 11:15-12:00 Po-Ling Loh, "Local optima of nonconvex M-estimators"

Session 3 (contributed talks)
- 15:00-15:20 Vassilis Kalofolias, "Matrix Completion on Graphs"
- 15:20-15:40 Aleksandr Aravkin, "Learning sparse models using general robust losses"
- 15:40-16:00 Stephen Becker, "Robust Compressed Least Squares Regression"
- 16:00-16:20 Ahmet Iscen, "Exploiting high dimensionality for similarity search"

Session 4
- 17:00-17:45 Rina Foygel Barber, "Controlling the False Discovery Rate via Knockoffs" (joint with Emmanuel Candès)
- 17:45-18:30 Noureddine El Karoui, "On high-dimensional robust regression and inefficiency of maximum likelihood methods"
## Organizers

- Aleksandr Aravkin, IBM Research (saravkin@us.ibm.com)
- Aurélie Lozano, IBM Research (aclozano@us.ibm.com)
- Stephen Becker, University of Colorado (stephen.becker@colorado.edu)
## Abstracts

Alessandro Chiuso, "Robustness issues in Kernel Tuning: SURE vs. Marginal Likelihood": Kernel-based regularization approaches have been successfully applied to ill-posed regression problems. Such estimators solve a regularized least squares problem, which also admits a Bayesian interpretation in which the unknown parameter is modeled as a zero-mean Gaussian vector. Typically the Gaussian prior is described by a few hyperparameters that parametrize the covariance; these hyperparameters can be tuned via marginal likelihood (ML) optimization. The aim of this talk is to provide new insights on hyperparameter tuning. To this purpose, we derive the mean squared error properties of the ML procedure for hyperparameter estimation and compare it with classical SURE-type procedures. In doing so we do not assume the correctness of the Bayesian priors. The conclusion of our investigation is that much of the criticism of the robustness of ML reported in the literature is not well founded. On the contrary, ML is a valuable tool for tuning model complexity, even when undermodeling is present.

Noureddine El Karoui, "On high-dimensional robust regression and inefficiency of maximum likelihood methods": I will discuss the behavior of widely used statistical methods in the high-dimensional setting where the number of observations, n, and the number of predictors, p, are both large. A number of interesting statistical phenomena arise in this setting; in particular, it is possible to improve upon maximum likelihood methods. Mathematically, the tools needed come mainly from random matrix theory, measure concentration, and convex analysis.

Donald Goldfarb, "Low-rank Matrix and Tensor Recovery: Theory and Algorithms": Recovering a low-rank matrix or tensor from incomplete or corrupted observations is a recurring problem in signal processing and machine learning. To exploit the structure of data that is intrinsically more than two-dimensional, convex models such as low-rank completion and robust principal component analysis (RPCA) for matrices have been extended to tensors. In this talk, we rigorously establish recovery guarantees for both tensor completion and tensor RPCA.
We demonstrate that using the most popular convex relaxation for the tensor Tucker rank can be substantially suboptimal in terms of the number of observations needed for exact recovery. We introduce a very simple new convex relaxation which is shown to be much better, both theoretically and empirically. Moreover, we propose algorithms to solve both low-rank matrix and tensor recovery models based on the Alternating Direction Augmented Lagrangian (ADAL), Frank-Wolfe, and prox-gradient methods. Finally, we empirically investigate the recoverability properties of these convex models, and the computational performance of our algorithms, on both simulated and real data.

Pradeep Ravikumar, "Dirty Statistical Models": We provide a unified statistical framework for the class of what we call "superposition-structured" or "dirty" statistical models, in which the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, any statistical model, and any of a general class of M-estimators that minimize the sum of any loss function and an instance of a "hybrid" regularization that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models, such as linear regression, multiple regression, and principal component analysis, over varied superposition structures.

Emmanuel Candès, "Solving quadratic equations via Wirtinger flows": In many imaging problems such as X-ray crystallography, detectors can only record the intensity or magnitude of a diffracted wave, as opposed to measuring its phase. This means that we need to solve quadratic equations (notoriously NP-hard) as opposed to linear ones. This talk discusses novel convex and nonconvex algorithms that are provably exact for solving such problems. More importantly, we also demonstrate that our noise-aware algorithms are stable, in the sense that the reconstruction degrades gracefully as the signal-to-noise ratio decreases.
The empirical performance is demonstrated on phase retrieval problems, which lie at the center of spectacular current research efforts, collectively known as coherent diffraction imaging, aimed, among other things, at determining the 3D structure of large protein complexes.
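As a rough illustration of the recipe behind Wirtinger flow (not the algorithm from the talk, whose precise initialization and step-size schedule differ), the sketch below runs spectral initialization followed by gradient descent on noiseless, real-valued quadratic measurements; all problem sizes and step sizes are chosen ad hoc.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 30, 300                     # signal dimension and number of quadratic measurements

x = rng.standard_normal(n)         # ground-truth signal (real case, for simplicity)
A = rng.standard_normal((m, n))    # Gaussian sensing vectors a_i as rows
y = (A @ x) ** 2                   # phaseless measurements y_i = <a_i, x>^2

# Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i'
Y = (A.T * y) @ A / m
w, V = np.linalg.eigh(Y)           # eigh returns eigenvalues in ascending order
z = V[:, -1] * np.sqrt(y.mean())   # scale to match the signal energy, since E[y] = ||x||^2

# Gradient iterations on f(z) = (1/2m) sum_i (<a_i, z>^2 - y_i)^2
mu = 0.02 / y.mean()               # conservative constant step size
for _ in range(500):
    Az = A @ z
    z -= mu * (A.T @ ((Az ** 2 - y) * Az) * 2 / m)

# The global sign of x is unrecoverable from y; compare up to sign
dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
rel_err = dist / np.linalg.norm(x)
```

With enough measurements (here m = 10n), the spectral initializer lands close enough to the signal that plain gradient descent converges to it, up to the inherent sign ambiguity.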
Po-Ling Loh, "Local optima of nonconvex M-estimators": Many popular algorithms for high-dimensional regression (e.g., Lasso-based linear regression) involve optimizing penalized M-estimators. We present several recent results concerning the behavior of local optima of such objective functions when both the loss and penalty function are allowed to be nonconvex. At the core of these results are an assumption that the loss function satisfies a property known as restricted strong convexity, which may be shown to hold for a variety of functions relevant to statistical estimation with corrupted and/or contaminated data, and an assumption that the amount of nonconvexity of the penalty function is not "too" large. We give statistical guarantees for the consistency of local optima of these nonconvex M-estimators, as well as optimization guarantees for first-order algorithms that may be used to obtain near-global optima. We discuss the consequences of our results in the context of robust regression.
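The talk concerns general nonconvex losses and penalties; as a simpler convex stand-in, the sketch below illustrates the penalized M-estimation setup with a Huber loss and an l1 penalty, solved by proximal gradient descent. The dimensions, regularization level, and outlier model are all invented for illustration.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Gradient of the Huber loss, applied entrywise to residuals r."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_lasso(X, y, lam=0.1, iters=500, delta=1.0):
    """Proximal gradient for  min_b  (1/n) sum_i huber(y_i - x_i'b) + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth Huber part
    beta = np.zeros(p)
    for _ in range(iters):
        grad = -X.T @ huber_grad(y - X @ beta, delta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 3.0                        # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y[:10] += 20.0                             # gross outliers in the first ten responses
beta_hat = robust_lasso(X, y, lam=0.2)
```

Because the Huber loss caps the influence of each residual, the ten corrupted responses barely perturb the estimate, whereas ordinary least squares would be pulled far off by them.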
Vassilis Kalofolias, "Matrix Completion on Graphs": The problem of finding the missing values of a matrix given a few of its entries, called matrix completion, has gathered a lot of attention in recent years. Although the problem under the standard low-rank assumption is NP-hard, Candès and Recht showed that its convex relaxation recovers the matrix exactly if the number of observed entries is sufficiently large. In this work, we introduce a novel matrix completion model that makes use of proximity information about rows and columns by assuming they form communities. This assumption makes sense in several real-world problems, such as recommender systems, where there are communities of people sharing preferences, while products form clusters that receive similar ratings. Our main goal is thus to find a low-rank solution that is structured by the proximities of rows and columns encoded by graphs. We borrow ideas from manifold learning to constrain our solution to be smooth on these graphs, in order to implicitly enforce row and column proximities. Our matrix recovery model is formulated as a convex non-smooth optimization problem, for which a well-posed iterative scheme is provided. We study and evaluate the proposed matrix completion model on synthetic and real data, showing that the proposed structured low-rank recovery model outperforms the standard matrix completion model in many situations.
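The model in the paper is a convex non-smooth program combining a low-rank term with graph regularizers; the following is a heavily simplified sketch of the graph-smoothness idea alone (no nuclear-norm term), using a toy two-community row graph and plain gradient descent. All sizes and weights are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 60, 40, 3

# Low-rank ground truth whose rows come in two similar "communities"
U = np.repeat(rng.standard_normal((2, r)), n // 2, axis=0) + 0.1 * rng.standard_normal((n, r))
V = rng.standard_normal((m, r))
M = U @ V.T

mask = rng.random((n, m)) < 0.3            # observe 30% of the entries

# Row graph: fully connect rows within each community; L is its Laplacian
Adj = np.zeros((n, n))
h = n // 2
Adj[:h, :h] = 1.0
Adj[h:, h:] = 1.0
np.fill_diagonal(Adj, 0.0)
L = np.diag(Adj.sum(axis=1)) - Adj

# Gradient descent on  ||P_Omega(X - M)||_F^2 + gamma * tr(X' L X),
# where tr(X' L X) penalizes differences between rows joined by graph edges
gamma, step = 1e-2, 0.1
X = np.zeros((n, m))
for _ in range(2000):
    grad = 2 * mask * (X - M) + 2 * gamma * (L @ X)
    X -= step * grad

rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

Unobserved entries are filled by smoothing over the observed entries of graph-neighboring rows, which works here precisely because rows within a community are nearly identical.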
Ahmet Iscen, "Exploiting high dimensionality for similarity search": We design an indexing architecture to store and search a set of high-dimensional vectors. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The membership of a query to a memory unit is tested by a simple correlation with its representative vector. This vector optimizes the membership hypothesis test when the query vectors are simple perturbations of the stored vectors. Our approach finds the most similar database vectors at a fraction of the cost of exhaustive search, without noticeably reducing the search quality. Interestingly, the gain in complexity is provably better in high-dimensional spaces. We demonstrate its practical interest in a large-scale image search scenario; our evaluation is carried out on standard benchmarks for large-scale image search and uses off-the-shelf state-of-the-art descriptors.

Stephen Becker, "Robust Compressed Least Squares Regression": Randomized matrix compression techniques such as the Johnson-Lindenstrauss transform and sparse subspace embeddings have emerged as effective algorithmic devices with which classical numerical linear algebra tasks, such as solving least squares problems, can be performed significantly faster. Yet aggressive dimensionality reduction introduces uncertainty and noise into the sketched problem, limiting the precision in the solution one can expect for a desired speedup. In this work, we propose complementary mechanisms for improving the efficiency of matrix sketching for least squares problems. We introduce tools from robust optimization together with a form of partial compression to improve the error-time tradeoff of sketching-based compressed least squares solvers. We develop an efficient algorithm to solve the resulting robust optimization formulation. Empirical results comparing numerous alternatives suggest that aggressive randomized transforms are effectively insulated by partial compression and robustification.
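The plain sketch-and-solve baseline that this line of work improves upon can be sketched as follows. The sizes are hypothetical, and a dense Gaussian sketch stands in for the faster structured transforms mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 2000, 50, 400           # n observations, p predictors, sketch size m << n

# Synthetic overdetermined least squares problem  min_x ||A x - b||_2
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Gaussian sketch S (m x n): solve the much smaller problem  min_x ||S A x - S b||_2
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Compare against the exact solution of the full problem
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)
```

The sketched solve touches an m-by-p system instead of an n-by-p one; the price is the extra error that the sketch injects, which is exactly the uncertainty the robust formulation above is designed to control.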
## Format

This is a one-day workshop. We plan to have a brief introductory tutorial, invited talks (35-40 min each), contributed talks (15-20 min each), a poster session, and two open discussion sessions. By keeping the number of talks contained and encouraging posters and discussion, we hope to foster a true “workshop” atmosphere.

## Detailed description

The technical term “robust” was coined in 1953 by G. E. P. Box and exemplifies his adage, “all models are wrong, but some are useful”. Over the past decade, a broad range of new paradigms have appeared that allow useful inference when standard modeling assumptions are violated. Classic examples include heavy-tailed formulations that mitigate the effect of outliers which would otherwise degrade the performance of Gaussian-based methods.

High-dimensional data are becoming ubiquitous in diverse domains such as genomics, neuroimaging, economics, and finance. Such data heighten the relevance of robustness, as errors and model misspecification are prevalent in these modern applications. To extract pertinent information from large-scale data, robust formulations require a comprehensive understanding of machine learning, optimization, and statistical signal processing, thereby integrating recovery guarantees, statistical and computational efficiency, algorithm design, and scaling issues. For example, robust principal component analysis (RPCA) can be approached using both convex and nonconvex formulations, giving rise to tradeoffs between computational efficiency and theoretical guarantees.

The goal of this workshop is to bring together machine learning, high-dimensional statistics, optimization, and select large-scale applications, in order to investigate the interplay between robust modeling and computation in the large-scale setting. We incorporate several important examples that are strongly linked by this theme: (a)
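As an example of the convex route to RPCA mentioned above, here is a minimal sketch of principal component pursuit solved by a standard ADMM scheme; the parameter choices are illustrative, not tuned.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def shrink(X, tau):
    """Entrywise soft thresholding: prox of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_admm(M, lam=None, rho=1.0, iters=300):
    """ADMM for principal component pursuit:
       min ||L||_* + lam * ||S||_1   s.t.   L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))     # the usual default weight
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                   # scaled dual variable
    for _ in range(iters):
        L = svt(M - S + Y / rho, 1.0 / rho)
        S = shrink(M - L + Y / rho, lam / rho)
        Y += rho * (M - L - S)
    return L, S

rng = np.random.default_rng(4)
m, n, r = 80, 60, 3
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # low-rank part
S0 = np.zeros((m, n))
idx = rng.random((m, n)) < 0.05                                  # 5% sparse corruptions
S0[idx] = 10 * rng.standard_normal(idx.sum())
M = L0 + S0

L_hat, S_hat = rpca_admm(M)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
```

The convex program is tractable and comes with recovery guarantees, but each iteration pays for a full SVD; nonconvex factored formulations trade those guarantees for much cheaper iterations, which is precisely the tradeoff discussed above.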