Sparse modeling is a rapidly developing area at the intersection of statistics, machine learning and signal processing, motivated by the age-old statistical problem of variable selection in high-dimensional datasets. Selection (and, moreover, construction) of a small set of highly predictive variables is central to many applications where the ultimate objective is to enhance our understanding of underlying physical, biological and other natural processes, beyond just building accurate 'black-box' predictors.
Recent years have witnessed a flurry of research on algorithms and theory for sparse modeling, mainly focused on l1-regularized optimization, a convex relaxation of the (NP-hard) smallest subset selection problem. Examples include sparse regression, such as Lasso and its various extensions (Elastic Net, fused Lasso, group Lasso, simultaneous/multi-task Lasso, adaptive Lasso, bootstrap Lasso, etc.), sparse graphical model selection, sparse dimensionality reduction (sparse PCA, CCA, NMF, etc.) and learning dictionaries that allow sparse representations. Applications of these methods are wide-ranging, including computational biology, neuroscience, image processing, stock-market prediction and social network analysis, as well as compressed sensing, an extremely fast-growing area of signal processing.
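To make the l1-relaxation idea concrete, here is a minimal sketch (not any particular library's implementation) of the Lasso solved by iterative soft-thresholding (ISTA), on synthetic data with a known sparse ground truth; all names and parameter values are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1: shrink each coordinate toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimize 0.5 * ||y - X w||^2 + lam * ||w||_1 by iterative
    # soft-thresholding (ISTA) with step size 1/L, where
    # L = ||X||_2^2 bounds the Lipschitz constant of the gradient.
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Synthetic data: only features 0 and 3 are truly predictive.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[0], w_true[3] = 3.0, -2.0
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = lasso_ista(X, y, lam=5.0)
# With enough regularization the estimate is sparse: most coordinates
# of w_hat are exactly zero.
```

The soft-thresholding step is what produces exact zeros, which is why the l1 relaxation yields variable selection rather than merely shrinkage.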
However, is the promise of sparse modeling realized in practice? It turns out that, despite significant advances in the field, a number of open issues remain when sparse modeling meets real-life applications. Below we mention only a few of them:
- Interpretability and stability
Interpretability is the traditional motivation for sparse modeling. In practice, however, interpretability does not equal sparsity. For instance, in practical applications an interpretable model must also be stable (reproducible). If two sparse models inferred from similar datasets (e.g. two runs of an fMRI experiment with the same subject performing the same task) have low structural overlap, one can hardly provide any meaningful interpretation, regardless of how sparse the models are (indeed, recent applications of sparse regression to fMRI data raise this concern). Practical issues that remain to be explored include, among others, the sensitivity of sparse models to different types of noise in the data, as well as their ability to handle the potential existence of multiple nearly-optimal solutions (e.g. due to the lack of a single sparse ground truth): for example, localizing brain areas highly predictive of a mental state does not exclude the possibility of finding other areas with a similar property - a situation that indeed occurs when predicting some types of tasks/mental states that apparently involve highly distributed brain activation patterns, but not for tasks that appear to be more ``localized''. Thus, important issues to explore include (1) searching for multiple nearly-optimal sparse solutions rather than a single solution, (2) practical approaches to characterizing problems (e.g. mental states) that allow for sparse modeling (given that theoretical properties such as RIP or irrepresentability are hard to test in practice), and (3) methods that increase the stability of sparse modeling algorithms.
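One simple way to quantify the stability concern is to compare the supports (sets of selected variables) of models fit on two runs of the same experiment. A minimal sketch using Jaccard overlap, with purely hypothetical coefficient vectors:

```python
def support(w, tol=1e-8):
    # Indices of the (effectively) nonzero coefficients of a sparse model.
    return {i for i, v in enumerate(w) if abs(v) > tol}

def jaccard_stability(w1, w2):
    # Overlap between the supports of two models fit on similar data:
    # 1.0 means identical variable selection, 0.0 means disjoint supports.
    s1, s2 = support(w1), support(w2)
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)

# Hypothetical coefficients from two runs of the same fMRI experiment:
# both models are equally sparse, yet share only 1 of 3 selected variables.
run1 = [0.0, 2.1, 0.0, -0.7, 0.0]
run2 = [0.0, 1.9, 0.3, 0.0, 0.0]
stability = jaccard_stability(run1, run2)  # 1 shared of 3 selected: 1/3
```

A low overlap like this, even between two highly sparse and equally predictive models, is exactly the situation that undermines interpretation.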
- ``Right'' Representation
Most sparse modeling algorithms assume that the feature space allowing for a sparse representation is given explicitly. They are also quite restrictive in the type of interactions allowed among such features (e.g. additive interactions in linear models). However, in practice, the sparsity-allowing feature space and/or the type of interactions are unknown. Can we design sparse modeling algorithms that are capable of uncovering the ``correct'' feature space? Can we relax some of the limitations on the types of feature interactions allowed by the model? Dictionary learning (learning a sparse representation while learning a proper ``dictionary'' of features) is a very interesting new approach to these issues. However, current approaches to dictionary learning are linear, as they typically focus on matrix factorizations; can they be extended to more complex dictionaries while preserving computational efficiency?
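To make the matrix-factorization view of dictionary learning concrete, here is a toy alternating scheme: sparse-code the data given the current dictionary, then refit the dictionary given the codes. This is a simplified sketch, not a state-of-the-art algorithm; all dimensions and parameters are illustrative.

```python
import numpy as np

def dict_learn(Y, k, lam=0.1, n_iter=50, seed=0):
    # Alternate between (1) one soft-thresholded gradient step on the
    # sparse codes C and (2) a least-squares refit of the dictionary D,
    # so that Y is approximated by D @ C with sparse C.
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0)
    C = np.zeros((k, n))
    for _ in range(n_iter):
        L = np.linalg.norm(D, 2) ** 2 + 1e-12   # step-size bound
        C = C - D.T @ (D @ C - Y) / L           # gradient step on codes
        C = np.sign(C) * np.maximum(np.abs(C) - lam / L, 0.0)  # sparsify
        D = Y @ np.linalg.pinv(C)               # least-squares dictionary
        D /= np.linalg.norm(D, axis=0) + 1e-12  # renormalize atoms
    return D, C

# Synthetic data generated from a hidden 5-atom dictionary.
rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 5)) @ (rng.standard_normal((5, 100))
                                    * (rng.random((5, 100)) < 0.3))
D, C = dict_learn(Y, k=5)
err = np.linalg.norm(Y - D @ C) / np.linalg.norm(Y)
```

Both alternating steps here are linear-algebraic, which illustrates why current dictionary learning is essentially a (bi)linear factorization; extending either step beyond linearity is the open question raised above.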
- Structured sparsity
In practical applications, the sparsity pattern often follows a particular structure, which can be further exploited. Recently, such additional structure has been incorporated, for instance, by developing various extensions of l1-regularization that allow modeling group-level sparsity (l1/lq penalties: group Lasso, multi-task Lasso), enforce grouping of correlated variables when the group structure is not given (Elastic Net), enforce smoothness among nearby variables given a natural variable ordering (fused Lasso), or learn sparse interaction networks (e.g., MRFs). What other types of structured sparsity can arise in practical applications, and how can they be leveraged by efficient sparse modeling algorithms?
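The group-level penalties mentioned above act through a proximal operator that shrinks each group of coefficients as a unit; a minimal sketch, with illustrative values:

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    # Proximal operator of the l1/l2 (group Lasso) penalty: each group's
    # coefficient vector is scaled down by the same factor, so the
    # variables in a group enter or leave the model together.
    w = np.asarray(w, dtype=float).copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        w[g] = scale * w[g]
    return w

# Two groups of two variables each; the weak group is removed entirely.
w = np.array([3.0, 4.0, 0.3, 0.4])
out = group_soft_threshold(w, [[0, 1], [2, 3]], t=1.0)
# group [0, 1] (norm 5) is shrunk; group [2, 3] (norm 0.5) is zeroed out
```

Contrast this with plain coordinate-wise soft-thresholding, which would decide about each of the four variables independently; the group operator is what produces block-structured sparsity patterns.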
Probably the most important hurdle one faces in practice is how to evaluate the performance of sparse algorithms in a way that goes beyond predictive performance. In practice, the ultimate objective of sparse modeling is not just to predict well, but rather to infer new knowledge about the domain (e.g. infer brain areas, and/or the pattern of interactions among them, relevant to a particular cognitive task). Unfortunately, it is difficult to evaluate the structure-reconstruction quality of a sparse-modeling approach in applications where the ground truth is unknown. On the other hand, such evaluation is necessary (1) in order to convince domain experts that an approach is indeed discovering a meaningful model (and domain experts tend to care less about test-set log-likelihood or predictive accuracy, which are indeed just ``sanity checks'' any reasonable model must pass); (2) in order to guide the selection of good hyper-parameters and sparse modeling algorithms; and (3) in order to have an objective measure of the quality of a new sparse modeling algorithm by performing relevant comparisons to previous work.
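Where a ground truth does exist (e.g. in simulation studies used to benchmark a new algorithm), structure-reconstruction quality can be scored directly via precision and recall on the recovered support; a minimal sketch, with hypothetical coefficient vectors:

```python
def support_f1(w_hat, w_true, tol=1e-8):
    # Precision, recall and F1 of the recovered support (set of selected
    # variables) against a known ground-truth support -- a structure-
    # oriented score that predictive accuracy alone does not capture.
    s_hat = {i for i, v in enumerate(w_hat) if abs(v) > tol}
    s_true = {i for i, v in enumerate(w_true) if abs(v) > tol}
    tp = len(s_hat & s_true)
    prec = tp / len(s_hat) if s_hat else 1.0
    rec = tp / len(s_true) if s_true else 1.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1

# Model selects variables {0, 2, 4}; the true support is {0, 4}.
prec, rec, f1 = support_f1([1.0, 0, 0.5, 0, 2.0], [3.0, 0, 0, 0, -2.0])
# precision = 2/3, recall = 1.0, F1 = 0.8
```

The open problem highlighted above is precisely that no analogous objective score exists when the true support is unknown, which is the common case in real applications.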
We would like to invite researchers working on methodology, theory and especially applications of sparse modeling to share their experiences and insights into both the basic properties of the methods, and the properties of the application domains that make particular methods more (or less) suitable. Moreover, we plan to have a brainstorming session on various open issues, including (but not limited to) the ones mentioned above, and hope to come up with a set of new research directions motivated by problems encountered in practical applications.
This is a one-day workshop. We are planning to have a brief introductory tutorial, 3 invited talks (35-40 min each), several contributed talks (15-20 min each), a poster session, and two open discussion sessions.
Eric Xing (CMU) - sparse networks in computational biology
Francis Bach (INRIA) - dictionary learning, structured sparsity
Stephen Strother (Toronto U.) - stability and reproducibility in fMRI analysis
Saharon Rosset (Tel Aviv U.) - realistic sparse modeling in genetics and other domains
Past related workshops:
NIPS 2009 Manifolds, sparsity, and structured models: When can low-dimensional geometry really help?
ICML 2008 workshop on Sparse Optimization and Variable Selection
NIPS 2006 workshop on Causality and Feature Selection
NIPS 2003 workshop on feature extraction and feature selection challenge
NIPS 2001 workshop on Variable and Feature Selection
Information on the book based on the workshop.