Invited Speakers Richard Samworth (Cambridge)
 Highdimensional Changepoint Estimation via Sparse Projection
 Abstract: Changepoints are a very common feature of Big Data that arrive in the form of a data stream. In this paper, we study highdimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates. The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series. We propose a twostage procedure called 'inspect' for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series. We then apply an existing univariate changepoint detection algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.
 PoLing Loh (U. WisconsinMadison)
 Source Estimation and Centrality in Random Growing Trees
 Abstract: We discuss recent work regarding random growing trees. A common theme is the evolution of the most central node(s) in such trees and their behavior in the limit as the tree size grows. We first discuss source estimation in a diffusion spreading over a regular tree, and show that it is possible to construct confidence sets for the diffusion source with size independent of the number of infected nodes. Our estimators are motivated by analogous results in the literature concerning identification of the root node in preferential attachment and uniform attachment trees; at the core of our proofs is a probabilistic analysis of Polya urns corresponding to the number of uninfected neighbors in specific subtrees of the infection tree. We then turn to the problem of persistence, and demonstrate that the aforementioned confidence sets have the property of persistence (i.e., settling down after finitely many steps), with probability 1. Our theory holds for trees possessing a certain type of regular structure, including diffusion subtrees, uniform attachment trees, and linear and sublinear preferential attachment trees. This is joint work with Justin Khim and Varun Jog.
 Mark Schmidt (University of British Columbia)
 NonUniform Stochastic Average Gradient Method for Training Conditional Random Fields
 Abstract: We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearlyconvergent stochastic gradient method, propose a nonuniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under nonuniform sampling. Our experimental results reveal that our method often significantly outperforms existing methods in terms of the training objective, and performs as well or better than optimallytuned stochastic gradient methods in terms of test error.
 KaiWei Chang (University of Virginia)
 Learning to Search for Structured Output Prediction
 Abstract: Many prediction problems required structured decisions. That is, the relationships between the output variables could represent a sequence, a set of clusters, or in the general case, a graph. When solving these problems, it is important to make consistent decisions that take the interdependencies among output variables into account. Such problems are often referred to as structured prediction problems. In this talk, I will describe learning to search approaches for such problems. This family of approaches explicitly predict the joint set of decisions incrementally, conditioning on past and future decisions. Such models may be particularly useful when the dependencies between the predictions are complex, the loss is complex, or the construction of an explicit graphical model is impossible. We have used our algorithms to learn expressive models from large amounts of annotated data and achieved stateofthe art performance on several natural language processing tasks.
 Allen Yang (UC Berkeley)
 Sparse Representation and LowRank Representation in Computer Vision
 Abstract: The recent vibrant study of sparse representation and compressive sensing has led to numerous groundbreaking results in signal processing and machine learning. In this talk, we will discuss how to model and recover lowdimensional structures in highdimensional signals, and how to verify that the models are appropriate. We will gently introduce the foundational theoretical results in this area, and show how theory informs the modeling process. We will illustrate this process through examples drawn from a number of vision applications, including Robust Face Recognition, Invariant Texture Feature Detection, Informative Feature Selection, and Robust Object Tracking.
 Rene Vidal (Johns Hopkins)
 Globally Optimal Structured LowRank Matrix Factorizations
 Abstract: Recently, convex solutions to lowrank matrix factorization problems have received increasing attention in machine learning. However, in many applications the data can display other structures beyond simply being lowrank. For example, images and videos present complex spatiotemporal structures, which are largely ignored by current lowrank methods. This talk will explore a matrix factorization technique suitable for large datasets that captures additional structure in the factors by using a projective tensor norm, which includes classical image regularizers such as total variation and the nuclear norm as particular cases. Although the resulting optimization problem is not convex, we show that under certain conditions on the factors, any local minimizer for the factors yields a global minimizer for their product. Examples in biomedical video segmentation and hyperspectral compressed recovery show the advantages of our approach on highdimensional datasets.
 Guillaume Obozinski (Ecole des Ponts)
 A Unified Perspective on Convex Structured Sparsity: Hierarchical, Symmetric, Submodular Norms and Beyond
 Abstract: In this talk, I will present an attempt at a unified theory for convex structured sparsityinducing norms on vectors associated with combinatorial penalty functions. Specifically, for models simultaneously (a) penalized by a setfunction defined on the support of the unknown parameter vector which represents prior knowledge on supports, and (b) regularized in Lpnorm, it is possible to show that each of the obtained combinatorial optimization problems admits a natural relaxation as an optimization problem regularized by a matching sparsityinducing norm. This framework obtains generic constructive derivations for the Lasso, group Lasso, exclusive Lasso, the OWL, OSCAR and SLOPE penalties, the ksupport norm, several hierarchical penalties considered in the literature for chains, tree, and DAG structures, and produces also new norms. More generally the tight relaxations obtained take the form of combinatorial latent group Lassos associated with mincover penalties also known as blockcoding schemes. I will show in particular how the theory developed produces general efficient algorithms for many of these norms, recovers as special cases several algorithms proposed in the literature and yields improved procedures for some cases. For norms associated with submodular penalties, including a large number of nondecomposable norms, general support recovery and fast rates convergence can be obtained based on generalization of the irrepresentability condition and the restricted eigenvalue condition. (Joint work with Francis Bach)
 Lorenzo Rosasco (MIT)
 A Consistent Regularization Approach for Structured Prediction
 Abstract: We propose and analyze a regularization approach for structured prediction problems. We characterize a large class of loss functions that allows to naturally embed structured outputs in a linear space. We exploit this fact to design learning algorithms using a surrogate loss approach and regularization techniques. We prove universal consistency and finite sample bounds characterizing the generalization properties of the proposed methods. Experimental results are provided to demonstrate the practical usefulness of the proposed approach.
Contributed Speakers 
