Session of 27 March 2017

Session organized by Erwann Le Pennec and Joseph Salmon

Venue: IHP, Amphi Hermite

14:00 : Ilaria GIULINI (Paris-Diderot, LPMA)

Title: Kernel spectral clustering

Abstract: We consider spectral clustering in a Hilbert space. We show how spectral clustering, coupled with a preliminary change of representation in a reproducing kernel Hilbert space, can bring the representation of the classes down to a low-dimensional space, and we propose a new algorithm for spectral clustering that automatically estimates the number of classes.

15:00 : François PORTIER (Télécom ParisTech)

Title: Ordinary-least-squares Monte Carlo

Abstract: The use of control variates is a well-known method to reduce the variance of the naive Monte Carlo estimator of an integral. A formulation of the method is presented based on a regression model where the dependent variable is the function to integrate and the predictors are elements of a linear space spanned by test functions with known integral, assumed to be zero without loss of generality. It is shown that the ordinary least-squares estimator for the intercept is equal to a certain control-variate enhanced Monte Carlo estimator. The asymptotic variance of the estimator is equal to the variance of the residual variable in the regression model. More importantly, it is demonstrated that if the number of predictors is allowed to grow to infinity with the number of Monte Carlo replicates, convergence takes place at a faster rate than for ordinary Monte Carlo integration, the limit distribution of the standardized errors still being Gaussian. In addition, the estimator of the residual variance in the regression model is a consistent estimator for the asymptotic variance. The performance of the method in finite samples is investigated through a simulation study for various choices of the test functions and in various dimensions.
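As a rough illustration of the regression formulation described in this abstract, the sketch below estimates an integral over [0, 1] as the intercept of an ordinary least-squares fit. It is not the speaker's code: the integrand (exp), the centred monomial test functions, the sample size and the helper name ols_monte_carlo are all assumptions chosen for the example.

```python
import numpy as np

def ols_monte_carlo(f, controls, samples):
    """Hypothetical helper: integral estimate as an OLS intercept.

    f        : integrand, applied to the whole sample array
    controls : test functions whose integral under the sampling law is zero
    samples  : i.i.d. draws from the sampling law
    """
    y = f(samples)
    # Design matrix: a column of ones (intercept) followed by the control variates.
    design = np.column_stack([np.ones(len(samples))] + [h(samples) for h in controls])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # Residual variance, which the abstract relates to the asymptotic variance.
    sigma2 = residuals @ residuals / (len(samples) - design.shape[1])
    return coef[0], sigma2

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)  # uniform law on [0, 1]

# Assumed test functions: t**k - 1/(k+1), which integrate to zero on [0, 1].
controls = [lambda t, k=k: t**k - 1.0 / (k + 1) for k in range(1, 5)]

estimate, sigma2 = ols_monte_carlo(np.exp, controls, x)
naive = np.exp(x).mean()  # plain Monte Carlo estimate, for comparison
exact = np.e - 1.0        # exact value of the integral of exp over [0, 1]
print(abs(estimate - exact), abs(naive - exact), sigma2)
```

In this toy setting the residual variance is far smaller than the variance of the integrand itself, which is the mechanism behind the variance reduction.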

16:00 : Zoltán SZABO (École Polytechnique)

Title: Distribution Regression: A Simple Technique with Minimax-optimal Guarantee

Abstract: In my talk, I am going to focus on the distribution regression problem (DRP): we regress from probability measures to Hilbert-space valued outputs, where the input distributions are only available through samples (this is the 'two-stage sampled' setting). Several important statistical and machine learning tasks can be phrased within this framework, including point estimation tasks without analytical solution (such as hyperparameter or entropy estimation) and multi-instance learning. However, due to the two-stage sampled nature of the problem, the theoretical analysis becomes quite challenging: to the best of our knowledge, the only existing method with performance guarantees to solve the DRP task requires density estimation (which often performs poorly in practice) and the distributions to be defined on a compact Euclidean domain. We present a simple, analytically tractable alternative to solve the DRP task: we embed the distributions into a reproducing kernel Hilbert space and perform ridge regression from the embedded distributions to the outputs. Our main contribution is to prove that this scheme is consistent in the two-stage sampled setup under mild conditions: we present an exact computational-statistical efficiency tradeoff analysis showing that the studied estimator is able to match the one-stage sampled minimax-optimal rate. This result answers a 17-year-old open question by establishing the consistency of the classical set kernel [Haussler, 1999; Gaertner et al., 2002] in regression. We also cover consistency for more recent kernels on distributions, including those due to [Christmann and Steinwart, 2010]. The practical efficiency of the studied technique is demonstrated in supervised entropy learning and aerosol prediction using multispectral satellite images.
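The two steps named in the abstract (embed each sampled distribution into an RKHS, then run ridge regression on the embeddings) can be sketched as follows. This is a minimal toy illustration, not the ITE toolbox implementation: the Gaussian base kernel, the bandwidth gamma, the regularization lam, the synthetic one-dimensional Gaussian bags and all function names are assumptions made for the example. The target is the entropy of the underlying Gaussian, loosely echoing the supervised entropy learning experiment mentioned in the abstract.

```python
import numpy as np

def set_kernel(bag_a, bag_b, gamma):
    """Inner product of the empirical mean embeddings of two bags:
    the Gaussian kernel averaged over all pairs of points."""
    sq_dist = ((bag_a[:, None, :] - bag_b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist).mean()

def gram(bags_a, bags_b, gamma):
    """Gram matrix of set-kernel values between two lists of bags."""
    return np.array([[set_kernel(a, b, gamma) for b in bags_b] for a in bags_a])

def fit_ridge(bags, y, lam, gamma):
    """Kernel ridge regression on the embedded bags: solve (K + n*lam*I) alpha = y."""
    n = len(bags)
    K = gram(bags, bags, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(train_bags, alpha, test_bags, gamma):
    """Predict outputs for new bags from the fitted coefficients."""
    return gram(test_bags, train_bags, gamma) @ alpha

def make_bags(rng, n_bags, bag_size):
    """Toy data (assumption): each bag is a sample from N(0, s^2) and the
    label is the differential entropy 0.5 * log(2*pi*e*s^2)."""
    stds = rng.uniform(0.5, 2.0, size=n_bags)
    bags = [rng.normal(0.0, s, size=(bag_size, 1)) for s in stds]
    labels = 0.5 * np.log(2 * np.pi * np.e * stds**2)
    return bags, labels

rng = np.random.default_rng(0)
train_bags, y_train = make_bags(rng, n_bags=60, bag_size=100)
test_bags, y_test = make_bags(rng, n_bags=20, bag_size=100)

gamma, lam = 0.5, 1e-3  # assumed, untuned hyperparameters
alpha = fit_ridge(train_bags, y_train, lam, gamma)
preds = predict(train_bags, alpha, test_bags, gamma)
rmse = np.sqrt(np.mean((preds - y_test) ** 2))
print(rmse, y_test.std())
```

The labels are only ever paired with samples, never with the distributions themselves, which is the two-stage sampled setting the abstract refers to. In practice gamma and lam would be chosen by cross-validation rather than fixed by hand.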

Paper: Zoltán Szabó, Bharath K. Sriperumbudur, Barnabás Póczos, Arthur Gretton. Learning Theory for Distribution Regression. Journal of Machine Learning Research, 17(152):1-40, 2016. (http://jmlr.org/papers/v17/14-510.html)

Code: in ITE toolbox (https://bitbucket.org/szzoli/ite/)

Speaker: http://www.cmap.polytechnique.fr/~zoltan.szabo/