COST  :  NIPS 2011 Workshop on
Computational Trade-offs in Statistical Learning

Sierra Nevada, Spain

Dec 16, 2011 Montebajo Basketball Court
Overview Schedule Keynote Speakers Submission Info Organizers
 7.30-7:40 Opening Remarks   
 7:40-8:40 Keynote: Stochastic Algorithms for One-Pass Learning  Leon Bottou
 8:40-9:00 Coffee Break and Poster Session  
 9:00-9:30 Early stopping for non-parametric regression: An optimal data-dependent stopping rule  Garvesh Raskutti
 9:30-10:00 Statistical and computational tradeoffs in biclustering  Sivaraman Balakrishnan
 10:00-10:10 Contributed short talk: Inducing Value Sparsity for Parallel Inference in Tree-shaped Models  Sameer Singh
 10:10-10:20 Contributed short talk: Learning Accuracy/Speed Tradeoffs in Nondeterministic Inference Algorithms      Hal Daume ́ III
 10:20-10:30 Contributed short talk: Learning Cost-Aware, Loss-Aware Approximate Inference Policies for Probabilistic Graphical Models  Veselin Stoyanov
 10:30-16:00 Ski Break  
 16:00-17:00 Keynote: Using More Data to Speed-up Training Time  Shai Shalev-Shwartz
 17:00-17:30    Coffee break and Poster Session       
 17:30-18:00 Theoretical Basis for ``More Data Less Work''?  Nati Srebro
 18:00-18:15 Discussion  
 18:15-18:45 Anticoncentration regularizers for stochastic combinatorial problems  Shiva Kaul
 18:45-18:55 Contributed short talk: Online Rank Aggregation  Shota Yasutake
18:55-19:05 Contributed short talk: Approximate Reduction from AUC Maximization to 1-norm Soft Margin Optimization  Daiki Suehiro
 19:05-20:00 Last chance to look at posters   


    Since its early days, the field of Machine Learning has focused on developing computationally tractable algorithms with good learning guarantees. The vast literature on statistical learning theory has led to a good understanding of how the predictive performance of different algorithms improves as a function of the number of training samples. By the same token, the well-developed theories of optimization and sampling methods have yielded efficient computational techniques at the core of most modern learning methods. The separate developements in these fields mean that given an algorithm we have a sound understanding of its statistical and computational beahvior. However, there hasn't been much joint study of the computational and statistical complexities of learinng, as a consequence of which, little is known about the interaction and trade-offs between statistical accuracy and computational complexity. Indeed a systematic joint treatment can answer some very interesting questions: what is the best attainable statistical error given a finite computational budget? What is the best learning method to use given different computational constraints and desired statistical yardsticks? Is it the case that simple methods outperform complex ones in computationally impoverished scenarios? 

    At its core, the PAC framework aims to study learning through the lens of computation. However, the thrust is on separating polynomial-time from computationally intractable algorithms. However all polynomial-time computations are hardly equivalent, and the difference between linear vs quadratic dependence on problem parameters can have a profound effect on the applicability of an algorithm. Understanding the trade-offs between statistical accuracy and computational demands in this situation is of paramount importance. 

    The need for such a theory is more compelling now than ever before since we routinely face training corpuses with billions of examples, and often, an even larger number of parameters to be estimated. The emergence of web and mechanical turk as sources of training data often stretches learning algorithms to the point that the bottleneck is no longer the number of examples, but the amount of computation available to process the examples. A theory to principally choose from a multitude of learning methods based on the properties of training examples as well as the computational resources available would be of clear interest. Another way to pose the same problem would be to design algorithms that can take as input a computational constraint and try to learn the best hypothesis they can based on the available budget and data. 

    There have been some works that try to address different facets of the above problem. Researchers working on massive datasets in the CS theory community look at streaming methods that aim to impose constraints on both the computational and storage requirements of the algorithms. Online learning presents one particular way of dealing with a computational budget, by processing as many samples as possible with the computational budget. 

    There have been some more relevant works in the machine learning community in the last few years. Bottou and Bousquet (2008) compare the amount of computation needed to attain a certain statistical error for a few routinely used optimization algorithms. Shalev-Shwartz and Srebro (2009) show how stochastic gradient descent applied to SVM optimization can experience an inverse dependence on number of training sample in the regime of large datasets. In some more recent works, Shalev-Shwartz and co-authors have also used cryptographic conjectures to establish the computational hardness of certain learning problems. On the algorithmic front, coarse-to-fine learning provides a nice framework to systematically incorporate computational considerations, using computational cost as a regularization term in the objective of the learning method. Other budgeted algorithms such as budgeted SVMs and budgeted perceptrons try to admit hard budget constraints on the running time and storage of the algorithm. 

    The goals of our workshop are: 
    • To draw the attention of machine learning researchers to this rich and emerging area of problems and to establish a community of researchers that are interested in understanding these tradeoffs.
    • To define a number of common problems in this area and to encourage future research that is comparable and compatible.
    • To expose the learning community to relevant work in fields such as CS theory and convex optimization.

We would like to welcome high-quality submissions on topics including but not limited to: 
  • Fundamental statistical limits with bounded computation
  • Trade-offs between statistical accuracy and computational costs
  • Computation-preserving reductions between statistical problems
  • Algorithms to learn under budget constraints
  • Budget constraints on other resources (e.g. bounded memory)
  • Computationally aware approaches such as coarse-to-fine learning 
 Interesting submissions in other relevant topics not listed above are welcome too. Due to the time constraints, most accepted submissions will be presented as poster spotlights.

Keynote Speakers


Program Committee

  • Léon Bottou
  • Olivier Chapelle
  • John Duchi
  • Claudio Gentile
  • John Langford
  • Maxim Raginsky
  • Pradeep Ravikumar
  • Ohad Shamir
  • Karthik Sridharan
  • David Weiss
  • Nati Srebro