Monday June 3, 13:30 - 14:30, Ecole normale supérieure / room W
Vianney Perchet (Université Paris Diderot)
Regret minimization (and calibration) via Blackwell approachability
I will first present and develop the theory of approachability introduced by Blackwell (a standard theory in game theory which in fact dates back to the 1950s). I will then show how several classical problems of online learning and online convex optimization (regret minimization and calibration, to name two) can be represented naturally and quite simply within Blackwell's purely geometric framework. These techniques make it possible to answer certain open problems, and to open up new possibilities.
Link to the corresponding paper

Thursday April 25, 15:00 - 17:00, Ecole des Mines, room L213
Zaid Harchaoui (INRIA)
Presentation of the paper: Daniel Hsu, Sham M. Kakade and Tong Zhang, A spectral algorithm for learning hidden Markov models, Journal of Computer and System Sciences, 78(5):1460-1480, 2012.

Monday April 8, 13:30 - 14:30, Ecole normale supérieure / room W
Richard Samworth (University of Cambridge)
Variable selection with error control: another look at stability selection
Stability selection was recently introduced by Meinshausen and Buhlmann as a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. We introduce a variant, called complementary pairs stability selection, and derive bounds both on the expected number of variables included by complementary pairs stability selection that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded. These results require no assumptions (e.g. exchangeability) on the underlying model or on the quality of the original selection procedure. Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control and therefore increasing the applicability of the methodology.
Link to the paper on which the talk is based

Thursday April 4, 15:00 - 17:00, Ecole des Mines / room V 106 B
Emile Richard (Mines ParisTech)
Gaussian complexity measures and implications in compressed sensing
The Gaussian width of the tangent cone and some related objects arise naturally as golden measures of complexity in compressed sensing. We will discuss recent works that have established interesting properties of this object, namely the lower bound it provides on the number of iid measurements required for achieving perfect recovery (Chandrasekaran et al., 2012), as well as connections with phase transitions (Amelunxen et al., 2013).
References:
The convex geometry of linear inverse problems, by Chandrasekaran et al. (2012)
Living on the edge: a geometric theory of phase transitions in convex optimization, by Amelunxen et al. (2013)
Computational and statistical tradeoffs via convex relaxation, by Chandrasekaran and Jordan (2012)

Thursday March 28, 15:00 - 16:00, ENSAE, amphi 1
Stability
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in the stability of statistical results relative to "reasonable" perturbations of the data and of the model used. Jackknife, bootstrap, and cross-validation are based on perturbations of the data, while robust statistics methods deal with perturbations of the model. In this talk, a case is made for the importance of stability in statistics. Firstly, we motivate the necessity of stability of interpretable encoding models for movie reconstruction from brain fMRI signals.
Secondly, we find strong evidence in the literature demonstrating the central role of stability in statistical inference. Thirdly, a smoothing-parameter selector based on estimation stability (ES), ES-CV, is proposed for the Lasso, in order to bring stability to bear on cross-validation (CV). ES-CV is then used in the encoding models to reduce the number of predictors by 60% with almost no loss (1.3%) of prediction performance across more than 2,000 voxels. Last, a novel "stability" argument is seen to drive new results that shed light on the intriguing interactions between sample-to-sample variability and heavier-tailed error distributions (e.g. double-exponential) in high-dimensional regression models with p predictors and n independent samples. In particular, when p/n converges to a value in (0.3, 1) and the error is double-exponential, OLS is a better estimator than LAD.

Monday March 18, 13:30 - 14:30, Ecole normale supérieure / room W
Peter Grünwald (CWI Amsterdam and Leiden University)
Safe Learning: learning the learning rate in (PAC-)Bayesian inference
Abstract: Bayesian inference can behave badly if the model under consideration is wrong: in some simple settings, the posterior fails to concentrate even in the limit of infinite sample size. We introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting "safe" estimator continues to achieve good rates with wrong models. It can also be used to automatically determine the learning rate in PAC-Bayesian settings. For example, when applied to classification models of varying complexity, the safe estimator adapts to the optimal rate for the Tsybakov exponent of the underlying distribution. The safe estimator is based on empirical mixability, which generalizes a central concept in worst-case online prediction due to Vovk.
Thus, safe estimation connects three paradigms in learning: Bayesian inference, (frequentist) statistical learning theory, and (worst-case) online prediction.

Monday February 18, 13:30 - 14:30, Ecole normale supérieure / room W
Daniil Ryabko (INRIA Lille Nord-Europe, SequeL)
Time-series information and unsupervised representation learning
Abstract: Given a series of observations X_1, ..., X_n coming from a large (high-dimensional) space X, we would like to find a representation function f mapping X to a small (for example, finite) space Y such that the series f(X_1), ..., f(X_n) preserves as much as possible of the time-series dependence in the original series X_1, ..., X_n. This problem, and specifically the focus on time-series dependence, is motivated by various applications in semi-supervised and unsupervised learning, in which a learner has to deal with high-dimensional and highly dependent time series with little to no feedback. It can be shown that, for stationary time series, the function f can be selected as the one maximizing the time-series information h_0(f(X)) - h_\infty(f(X)), where h_0(f(X)) is the Shannon entropy of f(X_1) and h_\infty(f(X)) is the entropy rate of the time series f(X_1), f(X_2), .... This quantity can be estimated empirically without estimating the distribution of the original time series X_1, ..., X_n. Some implications for the problem of optimal control will also be considered.

Thursday January 24, 15:00 - 17:00, Ecole des Mines / room L106
Martin Jaggi (Ecole Polytechnique)
Abstract: We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.

Monday January 21, 13:30 - 14:30, Ecole normale supérieure / room W
Mesrob I. Ohannessian (Université Paris-Sud)
Estimation of rare probabilities and discrete heavy tails
Language: talk in French, slides in English
Link to the paper
Abstract:
Despite the increasing volume of data in modern statistical
applications, critical patterns and events often have very little, if any, representation. This is not unreasonable, given that such variables are critical precisely because they are rare. This raises the
natural question: when can we infer something meaningful in such
contexts? We consider the archetypal problem of estimating the
probability of symbols that have occurred very rarely, in samples drawn
independently from an unknown discrete distribution. In particular, we
study the multiplicative consistency of estimators, defined as the ratio
of the estimate to the true quantity converging to one. We first show
that the classical Good-Turing estimator is not universally consistent
in this sense, despite enjoying favorable additive properties. We then
show that a notion of heavy tail is sufficient for consistency. At the
core of this result is a new multiplicative concentration inequality.
Additionally, within the heavy-tailed regime, we propose new consistent
estimators that address some of the shortcomings of Good-Turing
estimators, and have the "absolute discounting" structure of popular
smoothing techniques used in natural language modeling.

Thursday December 13, 15:00 - 17:00, Ecole des Mines, room V106A
NIPS debriefing (see the list of all NIPS 2012 papers)
- A. Defazio and T. Caetano, A convex formulation for learning scale-free networks via submodular relaxation (presented by Emile Richard)
- J. Giesen, J. Mueller, S. Laue and S. Swiercy, Approximating concavely parametrized optimization problems (presented by Elsa Bernard)
- P. Loh and M. Wainwright, Structure estimation for discrete graphical models: generalized covariance matrices and their inverses (presented by Remi Lajugie)
- R. Kondor and W. Dempsey, Multiresolution analysis on the symmetric group (presented by Eric Sibony)
- M. Pilanci, L. El Ghaoui and V. Chandrasekaran, Recovery of sparse probability measures via convex programming (presented by Edouard Grave)
- S. Kpotufe and A. Boularias, Gradient weights help nonparametric regressors (presented by Nino Shervashidze)
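As a small illustration of the classical Good-Turing estimator discussed in the January 21 abstract above, the sketch below computes the Good-Turing estimates of the total probability mass of all symbols seen exactly r times (the function name and the toy sample are illustrative, not from the talk):

```python
from collections import Counter

def good_turing_masses(sample):
    """Good-Turing estimates M_r = (r + 1) * N_{r+1} / n of the total
    probability mass of all symbols observed exactly r times, where
    N_r is the number of distinct symbols with frequency r and n is
    the sample size. M_0 estimates the missing mass of unseen symbols."""
    n = len(sample)
    freqs = Counter(sample)          # symbol -> frequency r
    n_r = Counter(freqs.values())    # r -> number of symbols with frequency r
    return {r: (r + 1) * n_r.get(r + 1, 0) / n for r in range(max(n_r) + 1)}

masses = good_turing_masses(list("aaabbc"))
# n = 6, N_1 = 1 ("c"), N_2 = 1 ("b"), N_3 = 1 ("a"), so
# M_0 = N_1 / 6 = 1/6, M_1 = 2 * N_2 / 6 = 1/3, M_2 = 3 * N_3 / 6 = 1/2
```

Note that M_r is an additive estimate of aggregate mass; the abstract's point is precisely that the induced per-symbol estimates need not be multiplicatively consistent without a heavy-tail assumption.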
Monday November 19, 13:30 - 14:30, Ecole normale supérieure / room W
Emilie Kaufmann (Télécom ParisTech)
A Bayesian point of view for more efficient bandit algorithms
Papers used for the presentation: Bayes-UCB and Thompson Sampling

Thursday October 25, 15:00 - 17:00, Ecole des Mines, room L106
Link Prediction in Graphs with Autoregressive Features
We consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model, and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both the sparsity and the low-rank property of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank. The estimate is computed efficiently using proximal methods, through a generalized forward-backward algorithm. [Joint work with Stéphane Gaïffas and Nicolas Vayatis]
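The sparse and low-rank penalties mentioned in the abstract above are the classic use case for proximal splitting. A minimal numpy sketch, not the authors' implementation, of the two proximal operators and a generic equal-weight generalized forward-backward loop (function names and step-size handling are illustrative assumptions):

```python
import numpy as np

def prox_l1(A, t):
    """Soft-thresholding: proximal operator of t * ||A||_1 (promotes sparsity)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def prox_nuclear(A, t):
    """Singular-value soft-thresholding: proximal operator of t * ||A||_*
    (promotes low rank)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def generalized_forward_backward(grad_f, proxes, x0, step, n_iter=200):
    """Generalized forward-backward for min_x f(x) + sum_i g_i(x),
    with f smooth: one auxiliary variable z_i per nonsmooth term g_i."""
    k = len(proxes)
    z = [x0.copy() for _ in range(k)]
    x = x0.copy()
    for _ in range(n_iter):
        for i, prox in enumerate(proxes):
            # with equal weights 1/k, each prox uses parameter k * step
            z[i] = z[i] + prox(2 * x - z[i] - step * grad_f(x), k * step) - x
        x = sum(z) / k
    return x
```

With a single l1 term and f(x) = 0.5 * ||x - b||^2, the loop reduces to soft-thresholding b; with both prox_l1 and prox_nuclear it handles the joint sparse-plus-low-rank penalty without needing a prox for their sum.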
Sparse Prediction and First-Order Optimization
I will derive a novel regularization method that corresponds to the tightest convex relaxation of sparsity combined with an L2 penalty. I will show that this new method provides a tighter relaxation than the elastic net, and will propose using it as a replacement for the Lasso or the elastic net in sparse prediction problems. I will show that this norm can be computed efficiently and that problems regularized with this norm can be solved using an accelerated first-order method. I will also present preliminary experiments showing competitive performance versus the Lasso and the elastic net. [Joint work with R. Foygel (University of Chicago) and N. Srebro (TTI Chicago).]

Monday October 15, 13:30 - 14:30, Ecole normale supérieure / room W
Yohann de Castro (Université Paris-Sud)
LASSO and Dantzig Selector using Expander Designs
Abstract: In this talk, we will show the link between the size of the section of the L1 ball cut by the kernel of the design matrix and the reconstruction and prediction errors of the LASSO and the Dantzig selector. In particular, we will give oracle inequalities for designs built from expander graphs (expander designs).
Links: http://arxiv.org/abs/1108.5533 and http://arxiv.org/abs/1010.2457

Monday September 24, 13:30 - 14:30, Ecole normale supérieure / room W
Tim van Erven (Université Paris-Sud)
Mixability in Statistical Learning

Thursday September 20, 15:00 - 17:00, Ecole des Mines, room L106
Marco Cuturi (Kyoto University)
Transportation and machine learning