Past sessions 2012 - 13

Monday June 3
13:30 - 14:30
Ecole normale supérieure / room W 

Vianney Perchet (Université Paris Diderot)
Minimisation du regret (et calibration) à travers l'approchabilité de Blackwell

Je vais dans un premier temps présenter et développer la théorie d'approchabilité introduite par Blackwell (théorie commune en théorie des jeux qui remonte en fait aux années cinquantes). Dans un second temps, je vais montrer comment différents problèmes classiques d' "online learning" et d' "online convex optimization" (la minimisation du regret et la calibration, pour ne pas les citer) se représentent naturellement et tout simplement dans le cadre purement géométrique de Blackwell. Ces techniques permettent de répondre, et de d'ouvrir de nouvelles possibilités, à certains problèmes ouverts.

Thursday April 25
15:00 - 17:00
Ecole des Mines, L213

Zaid Harchaoui (INRIA)

Presentation of the paper:
Daniel Hsu, Sham M. Kakade and Tong Zhang, A spectral algorithm for learning hidden Markov models, Journal of Computer and System Sciences, 78(5):1460-1480, 2012.

 April 8

13:30 - 14:30
Ecole normale supérieure / room W 

Richard Samworth [University of Cambridge]
Variable selection with error control: Another look at stability selection 

Stability selection was recently introduced by Meinshausen and Buhlmann as a very general technique designed to improve the performance of a variable selection algorithm.  It is based on aggregating the results of applying a selection procedure to subsamples of the data.  We introduce a variant, called complementary pairs stability selection, and derive bounds both on the expected number of variables included by complementary pairs stability selection that have low selection probability under the original procedure, and on the expected number of high selection probability variables that are excluded.  These results require no (e.g. exchangeability) assumptions on the underlying model or on the quality of the original selection procedure.  Under reasonable shape restrictions, the bounds can be further tightened, yielding improved error control, and therefore increasing the applicability of the methodology. 

Link to the paper on which the talk will be based on

Thursday April 4
15:00 - 17:00
Ecole des Mines / room V 106 B 

Emile Richard (Mines ParisTech)
Gaussian complexity measures and implications in compressed sensing

The Gaussian width of the tangent cone and some related objects naturally arise as golden measures of complexity in compressed sensing. We will discuss recent works that have introduced interesting properties of this object, namely the lower bound it provides on the number of iid measurements required for achieving perfect recovery (Chandrasekaran et al. 2012), and also connections with phase transition (Amelunxen et al 2013). 

The convex geometry of linear inverse problems, by Chandrasekaran et al. (2012) 
Computational and statistical tradoffs via convex relaxation, by Chandrasekaran and Jordan (2012)

Thursday March 28
15:00 - 16:00
ENSAE, amphi 1

Bin Yu (UC Berkeley)

Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in stability of statistical results relative to "reasonable" perturbations to data and to the model used. Jacknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this talk, a case is made for the importance of stability in statistics. Firstly, we motivate the necessity of stability of interpretable encoding models for movie reconstruction from brain fMRI signals. Secondly, we find strong evidence in the literature to demonstrate the central role of stability in statistical inference. Thirdly, a smoothing parameter selector based on estimation stability (ES), ES-CV, is proposed for Lasso, in order to bring stability to
bear on cross-validation (CV). ES-CV is then utilized in the encoding models to reduce the number of predictors by 60% with almost no loss (1.3%) of prediction performance across over 2,000 voxels. Last, a novel "stability" argument is seen to drive new results that shed light on the intriguing interactions between sample to sample variability and heavier tail error distribution (e.g. double-exponential) in high dimensional regression models with p predictors and n independent samples. In particular, when p/n→(0.3,1) and error is double-exponential, OLS is a better estimator than LAD.

Monday March 18
13:30 - 14:30
Ecole normale supérieure / room W 

Peter Grünwald [CWI Amsterdam and Leiden University]
Safe Learning: learning the learning rate in (PAC-)Bayesian inference 

Abstract: Bayesian inference can behave badly if the model under consideration is wrong: in some simple settings, the posterior fails to concentrate even in the limit of infinite sample size. We introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting "safe" estimator continues to achieve good rates with wrong models. It can also be used to automatically determine the learning rate in PAC-Bayesian settings. For example, when applied to classification models of varying complexity, the safe estimator adapts to the optimal rate for the Tsybakov exponent of the underlying distribution. The safe estimator is based on empirical mixability, which generalizes a central concept in worst-case online prediction due to Vovk. Thus, safe estimation connects three paradigms in learning: Bayesian inference, (frequentist) statistical learning theory and (worst-case) on-line prediction.

Monday February 18
13:30 - 14:30
Ecole normale supérieure / room W 

Daniil Ryabko [INRIA Lille Nord-Europe, SequeL]
Time-series information and unsupervised representation learning 

Abstract: Given a series of observations  X_1...X_n coming from a large (high-dimensional) space X, we would like to find a representation function f mapping X to a small (for example, finite) space Y such that  the series  f(X_1)...f(X_n) preserve as much information as possible  about the original time-series dependence in X_1...X_n.  This problem, and, specifically, the focus on time-series dependence, is motivated by various  applications in semi- and unsupervised learning,  in which a learner has to deal with  high-dimensional and highly-dependent time series with little to no feedback.  It can be shown that, for stationary time series, the function f can be selected as the one maximizing the time-series information h_0(f(X))- h_\infty (f(X)) where h_0(f(X)) is the Shannon entropy of f(X_1) and h_\infty (f(X)) is the entropy rate of the time series f(X_1)...f(X_n)...   This quantity can be estimated empirically without estimating the distribution of the original time series X_1...X_n... Some implications for the problem of optimal control will also be considered. 

Thursday January 24
15:00 - 17:00
Ecole des Mines / room L106

Martin Jaggi (Ecole Polytechnique)

Abstract: We provide stronger and more general primal-dual convergence results for Frank- Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions.
On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices.
We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.

Monday January 21
13:30 - 14:30
Ecole normale supérieure / room W

Mesrob I. Ohannessian [Université Paris-Sud]
Estimation des probabilités rares et queues lourdes discrètes

Langue : exposé en français, transparents en anglais
Lien : vers l'article

Résumé : Despite the increasing volume of data in modern statistical applications, critical patterns and events have often very little, if any, representation. This isn't unreasonable, given that such variables are critical precisely because they are rare. We then have to raise the natural question: when can we infer something meaningful in such contexts? We consider the archetypal problem of estimating the probability of symbols that have occurred very rarely, in samples drawn independently from an unknown discrete distribution. In particular, we study the multiplicative consistency of estimators, defined as the ratio of the estimate to the true quantity converging to one. We first show that the classical Good-Turing estimator is not universally consistent in this sense, despite enjoying favorable additive properties. We then show that a notion of heavy tail is sufficient for consistency. At the core of this result is a new multiplicative concentration inequality. Additionally, within the heavy-tailed regime, we propose new consistent estimators that address some of the shortcomings of Good-Turing estimators, and have the "absolute discounting" structure of popular smoothing techniques used in natural language modeling.

Thursday December 13
15:00 - 17:00
Ecole des Mines, room V106A

NIPS debriefing (see the list of all NIPS 2012 papers)

Monday November 19
13:30 - 14:30
Ecole normale supérieure / room W 

Emilie Kaufmann [Télécom ParisTech]
Un point de vue bayésien pour des algorithmes de bandits plus performants

Articles utilisés pour la présentation : Bayes-UCB et Thompson Sampling 

Thursday October 25 
15:00 - 17:00
Ecole des Mines, room L106

Emile Richard [Mines ParisTech]

Link Prediction in Graphs with Autoregressive Features

We consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward algorithm. [Joint work with Stéphane Gaïffas and Nicolas Vayatis]

Andreas Argyriou [Ecole Centrale de Paris]

Sparse Prediction and First-Order Optimization

I will derive a novel regularization method that corresponds to the tightest convex relaxation of sparsity combined with an L2 penalty. I will show that this new method provides a tighter such relaxation than the elastic net and will propose using it as a replacement for the Lasso or the elastic net in sparse prediction problems. I will show that this norm can be computed efficiently and that regularizing with the norm can be solved using an accelerated first-order method. I will also show preliminary experiments that show competitive performance vs. Lasso and elastic net. [Joint work with R. Foygel (University of Chicago) and N. Srebro (TTI Chicago).]

Monday October 15
13:30 - 14:30
Ecole normale supérieure / room W 

Yohann de Castro [Université Paris-Sud]
LASSO and Dantzig Selector using Expander Designs

Dans cette présentation, on montrera le lien entre la taille de la section de la boule L1 par le noyau de la matrice de design et l'erreur de reconstruction et de prédiction du LASSO et du Dantzig selector. En particulier, on donnera des inégalités oracles pour des designs à partir de graphes expanseurs (Expander Designs). 

Liens : et

Monday September 24
13:30 - 14:30
Ecole normale supérieure / room W 

Tim van Erven [Université Paris-Sud]
Mixability in Statistical Learning 

Thursday September 20 
15:00 - 17:00
Ecole des Mines, room L106
Marco Cuturi (Kyoto University)
Transportation and machine learning