Workshop Description


Greedy algorithms and related first-order optimization algorithms are at the core of many of the state of the art sparse methods in machine learning, signal processing, harmonic analysis, statistics and other seemingly unrelated areas, with very different goals at first sight. Examples include matching pursuit, boosting, greedy methods for sub-modular optimization, structured prediction, and many more. In the field of optimization, the recent renewed interest in Frank-Wolfe/conditional gradient algorithms opens up an interesting perspective towards a unified understanding of these methods, with a big potential to translate the rich existing knowledge about the respective greedy methods between the different fields.

The scope of this workshop is to gather renowned experts working on those algorithms in machine learning, optimization, signal processing, statistics and harmonic analysis, in order to engender a fruitful exchange of ideas and discussions and to push further the boundaries of scalable and efficient optimization for learning problems.


The goals of the workshop are threefold. First, to provide an accessible review and synthesis of not only recent results but also old-but-forgotten results describing the behavior and performance of the related greedy algorithms. Second, to cover recent advances on extensions of such algorithms relevant to machine learners, such as learning with atomic-norm regularization (Rao et al., 2012; Tewari et al., 2011, Harchaoui et al., 2012), submodular optimization (Bach, 2011), herding algorithms (Chen et al., 2010; Bach et al., 2012), or structured prediction (Lacoste-Julien et al., 2013). One important example of interest here is the study lower bounds as in (Lan, 2013, Jaggi 2013) to better understand the limitations of such greedy algorithms and of sparse methods. Third, the goal is to provide a forum for open problems, and to serve as stepping-stone for cutting-edge research on this family of scalable algorithms for big data problems.


Bach, F. (2011). Learning with Submodular Functions: A Convex Optimization Perspective. arXiv. Bach, F., S. Lacoste-Julien, and G. Obozinski (2012). On the Equivalence between Herding and Conditional Gradient Algorithms. In ICML. Bertsekas, D. (2004). Nonlinear Programming (2nd ed.). Athena Scientific. Chen, Y., M. Welling, and A. J. Smola (2010). Super-Samples from Kernel Herding. In UAI. Clarkson, K. L. (2010). Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm. ACM Transactions on Algorithms 6(4). Demyanov, V. F. and A. M. Rubinov (1970). Approximate methods in optimization problems. Elsevier. Frank, M. and P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 95–110. Harchaoui, Z., M. Douze, M. Paulin, and M. Dudik (2012). Large-scale image classification with trace-norm regularization. CVPR. Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML. Lacoste-Julien, S., M. Jaggi, M. Schmidt, and P. Pletscher (2013). Block-Coordinate Frank-Wolfe Optimization for Structural SVMs. ICML. Lan, G. (2013). The complexity of large-scale convex programming under a linear optimization oracle. Technical report. Rao, N., P. Shah, S. J. Wright, and R. Nowak (2012). A Greedy Forward-Backward Algorithm For Atomic Norm Constrained Minimization. ICASSP Schapire, R. E. and Y. Freund (2012). Boosting: Foundations and Algorithms. MIT Press. Shalev-Shwartz, S., N. Srebro, and T. Zhang (2010). Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints. SIAM Journal on Optimization 20, 2807–2832. Temlyakov, V. (2011). Greedy Approximation (1st ed.). New York, NY, USA: Cambridge University Press. Temlyakov, V. N. (2012). Greedy approximation in convex optimization. stat.ML. Tewari, A., P. Ravikumar, and I. S. Dhillon (2011). Greedy Algorithms for Structurally Constrained High Dimensional Problems. NIPS. Tropp, J. A. and A. Gilbert (2007, December). Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit. IEEE Transactions on Information Theory 53(12), 4655–4666.