Resource-Efficient Machine Learning

NIPS-2013 Workshop

Tuesday, December 10, 2013

Lake Tahoe, Nevada, United States

Room: Harvey's Emerald Bay 3

Abstract

Resource efficiency is key for making ideas practical. It is crucial in many tasks, ranging from large-scale learning ("big data'') to small-scale mobile devices. Understanding resource efficiency is also important for understanding biological systems, from individual cells to complex learning systems, such as the human brain. The goal of this workshop is to improve our fundamental theoretical understanding and link between various applications of learning under constraints on the resources, such as computation, observations, communication, and memory. While the founding fathers of machine learning were mainly concerned with characterizing the sample complexity of learning (the observations resource) [VC74] it now gets realized that fundamental understanding of other resource requirements, such as computation, communication, and memory is equally important for further progress [BB11].

The problem of resource-efficient learning is multidimensional and we already see some parts of this puzzle being assembled. One question is the interplay between the requirements on different resources. Can we use more of one resource to save on a different resource? For example, the dependence between computation and observations requirements was studied in [SSS08,SSST12,SSB12,CJ13]. Another question is online learning under various budget constraints [AKKS12,BKS13,CBG06,CKS04,DSSS05,MCJ13,SBCA13]. For instance, Badanidiyuru et al. study online learning with limited feedback and limited supply (one example is dynamic pricing with limited supply, where we have a limited number of items to sell and on each successful sale transaction we lose one item). A related question of online learning under constraints on information acquisition was studied in [SBCA13]. The problem of resource efficiency gets even more acute when multiple devices compete on shared resources, especially in lack of centralized control [KCLB13]. These challenging problems require new tools in learning and optimization [PB13,Bal13].

Study of resource-efficient learning also require design of resource-dependent performance measures. In the past, algorithms were compared in terms of predictive accuracy (classification errors, AUC, F-measures, NDCG, etc.), yet there is a need to evaluate them with additional metrics related to resources, such as memory, CPU time, and even power. For example, reward per computation operation.

We believe that there are deep connections between problems at various scales and with various resource constraints and there are basic principles of learning under resource constraints and new optimization algorithms that are yet to be discovered. We invite researchers to share their practical challenges and theoretical insights on this problem.

References:

[AKKS12] Kareem Amin, Michael Kearns, Peter Key and Anton Schwaighofer. Budget Optimization for Sponsored Search: Censored Learning in MDPs. UAI 2012.

[Bal13] Grey Ballard. Avoiding communication in dense linear algebra. Ph.D. thesis. 2013.

[BB11] Leon Bottou and Olivier Bousquet. The trade-offs of large scale learning. In Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.

[BKS13] Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins. Bandits with Knapsacks. FOCS, 2013.

[BS12] Sebastien Bubeck and Aleksandrs Slivkins. The best of both worlds: stochastic and adversarial bandits. COLT, 2012.

[CBG06] Nicolò Cesa-Bianchi and Claudio Gentile. Tracking the best hyperplane with a simple budget perceptron. COLT 2006.

[CJ13] Venkat Chandrasekaran and Michael I. Jordan. Computational and statistical tradeoffs via convex relaxation. Proceedings of the National Academy of Sciences, 110(13), 2013.

[CKS04] Koby Crammer, Jaz Kandola and Yoram Singer. Online Classification on a Budget. NIPS 2003.

[DSSS05] Ofer Dekel, Shai Shalev-shwartz and Yoram Singer. The Forgetron: A kernel-based perceptron on a fixed budget. NIPS 2004.

[KCLB13] Matt Kraning, Eric Chu, Javad Lavaei, and Stephen Boyd. Dynamic Network Energy Management via Proximal Message Passing. Foundations and Trends in Optimization, 1(2). 2013.

[MCJ13] Ioannis Mitliagkas, Constantine Caramanis, and Prateek Jain. Streaming, Memory-limited PCA. NIPS 2013.

[PB13] Neal Parikh and Stephen Boyd. Block splitting for distributed optimization. Mathematical Programming Computation. 2013.

[SAL+11] Yevgeny Seldin, Peter Auer, François Laviolette, John Shawe-Taylor, and Ronald Ortner. PAC-Bayesian Analysis of Contextual Bandits. NIPS, 2011.

[SBCA13] Yevgeny Seldin, Peter Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori. Prediction with Limited Advice and Multiarmed Bandits with Paid Observations. 2013.

[SSB12] Shai Shalev-Shwartz and Aharon Birnbaum. Learning halfspaces with the zero-one loss: Time-accuracy trade-offs. NIPS, 2012.

[SSS08] Shai Shalev-Shwartz and Nathan Srebro. SVM Optimization: Inverse Dependence on Training Set Size. ICML, 2008.

[SSST12] Shai Shalev-Shwartz, Ohad Shamir, and Eran Tromer. Using more data to speed-up training time. AISTATS, 2012.

[VC74] Vladimir N. Vapnik and Alexey Ya. Chervonenkis. Theory of pattern recognition. Nauka, Moscow (in Russian), 1974. German translation: W.N.Wapnik, A.Ya.Tschervonenkis (1979), Theorie der Zeichenerkennug, Akademia, Berlin.

Call for Sponsors

Your logo could be here.... If you are interested to sponsor this event, please, contact yevgeny.seldin at gmail.

Call for Contributions

We invite submission of abstracts and open problems to the workshop. Abstracts and open problems should be at most 4 pages long in the NIPS format. Appendices are allowed, but the organizers reserve the right to evaluate the submissions based on the first 4 pages only. Submissions should be NOT anonymous. Selected abstracts and open problems will be presented as talks or posters during the workshop. Contributions should be emailed to yevgeny.seldin at gmail.

IMPORTANT DATES

Submission Deadline: October 23.

Notification of Acceptance: November 6.

EVALUATION CRITERIA

    • Theory and application-oriented contributions are equally welcome.

    • All submissions should emphasize relevance to the workshop subject.

    • Submission of previously published work or work under review is allowed, in particular NIPS-2013 submissions. However, for oral presentations preference will be given to novel work or work that was not yet presented elsewhere (for example, recent journal publications or NIPS posters). All double submissions must be clearly declared as such!

Invited Speakers

Alexandrs Slivkins, Microsoft Research

Michael W. Mahoney, Stanford

Accepted Contributions

Sameer Singh, Sebastian Riedel, and Andrew McCallum. Anytime belief propagation using sparse domains. (Presentation + Poster)

Falk Lieder, Noah D. Goodman, and Thomas L. Griffiths. Reverse-engineering resource-efficient algorithms. (Presentation + Poster)

Yi Huang, Brian Powers, and Lev Reyzin. Training-time optimization of a budgeted booster. (Presentation + Poster)

Ioannis Mitliagkas, Constantine Caramanis, and Prateek Jain. Memory limited, streaming PCA. (Presentation + Poster)

Jacob R. Gardner, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, and John P. Cunningham. Resource Efficient Gaussian Processes. (Presentation + Poster)

Komail Badami, Wannes Meert, and Marian Verhelst. Context- and cost-aware feature selection in ultra-low-power sensor interfaces. (Poster)

Luca Oneto, Alessandro Ghio, Davide Anguita, Sandro Ridella. Learning with few bits on small–scale devices. (Poster)

Schedule

7:30 - 7:40 Opening Remarks

7:40 - 8:40 Michael Mahoney. Input-sparsity time random projections and tera-scale regression algorithms.

8:40 - 9:00 Brian Powers. Training-time optimization of a budgeted booster.

9:00 - 9:30 Break

9:30 - 9:50 Prateek Jain. Memory limited, streaming PCA.

9:50 - 10:10 Falk Lieder. Reverse-engineering resource-efficient algorithms.

10:10 - 10:30 Kilian Q. Weinberger. Resource Efficient Gaussian Processes.

10:30 - 15:30 Ski break

15:30 - 16:30 Alexandrs Slivkins. Bandits with knapsacks.

16:30 - 17:00 Sameer Singh. Anytime belief propagation using sparse domains.

17:00 - 17:30 Break

17:30 - 18:30 Posters

Talk Abstracts

Michael W. Mahoney. Input-sparsity time random projections and tera-scale regression algorithms.

Regression algorithms are the bread and butter of statistical data analysis, and so developing more resource-efficient regression algorithms is a pressing concern, given the ubiquity and diversity of data in many modern applications. Let us say that an algorithm runs in input-sparsity time if its running time for arbitrary (i.e., worst-case) input is proportional to the number of nonzeros of the input, plus lower order terms. Rather remarkably, input-sparsity time randomized algorithms have been shown to exist for fundamental matrix problems that underlie much of machine learning and modern statistical data analysis. I will describe recent theoretical results on input-sparsity time random projection algorithms for least-squares, least absolute deviations, and quantile regression; and I will also describe recent empirical results on implementing these algorithms on terabyte-sized data.

Alexandrs Slivkins. Bandits with knapsacks.

Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in machine learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising, to dynamic pricing. In many of these application domains the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the time horizon. The literature lacks a general model encompassing these sorts of problems. We introduce such a model, called ``bandits with knapsacks'', that combines aspects of stochastic integer programming with online learning. A distinctive feature of our problem, in comparison to the existing regret-minimization literature, is that the optimal policy for a given latent distribution may significantly outperform the policy that plays the optimal fixed arm. Consequently, achieving sublinear regret in the bandits-with-knapsacks problem is significantly more challenging than in conventional bandit problems.

We present two algorithms whose reward is close to the information-theoretic optimum: one is based on a novel ``balanced exploration'' paradigm, while the other is a primal-dual algorithm that uses multiplicative updates. Further, we prove that the regret achieved by both algorithms is optimal up to polylogarithmic factors.

Joint work with Robert Kleinberg and Ashwinkumar Badanidiyuru. Appeared at FOCS 2013.

Organizers

Yevgeny Seldin, Queensland University of Technology and UC Berkeley

Yasin Abbasi-Yadkori, Queensland University of Technology and UC Berkeley

Koby Crammer, The Technion

Ralf Herbrich, Amazon

Peter Bartlett, UC Berkeley and Queensland University of Technology