New Trends in Statistical Learning II

(11-18 June 2022, Porquerolles)

This conference aims to bring together researchers to present and discuss current and upcoming trends in Statistical Learning. The focus will be on recent theoretical advances in deep learning generalization, the emerging field of quantitative ethics, and the development of safe and robust decision-making systems.

Organizing committee:


Lounici, Karim (CMAP, Ecole Polytechnique)

Meziani, Katia (Ceremade, Dauphine-PSL University, CREST)



Participants:


Alquier, Pierre (RIKEN AIP, Japan)

Belucci Teixeira, Bruno (Ceremade, Dauphine-PSL, BNP)

Brunel, Victor-Emmanuel (CREST-ENSAE)

Chzhen, Evgenii (Paris-Saclay University)

Dalalyan, Arnak (CREST-ENSAE)

Hebiri, Mohamed (Gustave Eiffel University)

Houdré, Christian (Georgia Institute of Technology)

Koltchinskii, Vladimir (Georgia Institute of Technology)

Mourtada, Jaouad (CREST-ENSAE)

Pacreau, Gregoire (CMAP, Ecole Polytechnique)

Pontil, Massimiliano (Istituto Italiano di Tecnologia and University College London)

Riu, Benjamin (CMAP, Ecole Polytechnique, Dauphine-PSL University, Uptilab)

Salmon, Joseph (Montpellier University)

Tsybakov, Alexander B. (CREST-ENSAE)

, Sisi (Allianz Partner)






Program


Main lectures 



Christian HOUDRÉ (Georgia Institute of Technology) 


Title:  On the limiting shape (with rates) of Young Diagrams associated with random words (with a view towards quantum statistics) 


 Starting with longest increasing subsequences in random words, I’ll present various results on the limiting shape of RSK Young diagrams associated with these words.  Connections with Gaussian random matrices will also be presented. 
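For readers who want to experiment: the length of the first row of the RSK Young diagram of a word equals the length of its longest weakly increasing subsequence, which patience sorting computes in O(n log n). A minimal sketch (the alphabet size and word length below are arbitrary choices):

```python
import bisect
import random

def longest_weakly_increasing(word):
    """Length of the longest non-decreasing subsequence of the word,
    i.e. the length of the first row of its RSK Young diagram."""
    piles = []  # piles[i] holds the current top of pile i (patience sorting)
    for letter in word:
        # bisect_right keeps ties, giving a *weakly* increasing subsequence
        i = bisect.bisect_right(piles, letter)
        if i == len(piles):
            piles.append(letter)
        else:
            piles[i] = letter
    return len(piles)

# Uniform random word of length n over a k-letter alphabet
n, k = 100_000, 5
word = [random.randrange(k) for _ in range(n)]
print(longest_weakly_increasing(word))  # concentrates around n/k, up to sqrt(n) fluctuations
```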



Vladimir KOLTCHINSKII (Georgia Institute of Technology) 




Title: Bias reduction in estimation of smooth functionals of high dimensional parameters



We will discuss the problem of estimating smooth functionals of high-dimensional parameters of statistical models, with a focus on bias reduction methods. Two methods of bias reduction will be discussed. One is based on finding a functional $g$ such that $g(\hat \theta)$ is approximately an unbiased estimator of $f(\theta)$ for a given functional $f$ and a given base estimator $\hat \theta$ of the unknown parameter $\theta.$ The other is based on a linear aggregation of plug-in estimators computed with different sample sizes. Nearly minimax optimal risk bounds in Orlicz norms for Hölder smooth functionals of the mean and covariance of high-dimensional and infinite-dimensional Gaussian models will be presented.
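To make the second method concrete, here is a toy illustration (not the estimators from the talk), under the assumption that the plug-in bias expands in powers of $1/n$: combining plug-in estimators at sample sizes $n$ and $n/2$ with weights $(2, -1)$ cancels the first-order bias term, in the spirit of Richardson extrapolation.

```python
import numpy as np

rng = np.random.default_rng(0)

def plug_in(x):
    # Plug-in estimator of f(theta) = theta^2 with theta = E[X];
    # its bias is Var(X)/n, i.e. of order 1/n.
    return x.mean() ** 2

def aggregated(x):
    # Linear aggregation of plug-in estimators at sample sizes n and n/2:
    # 2*T_n - T_{n/2} cancels the 1/n bias term (Richardson extrapolation).
    n = len(x)
    t_half = 0.5 * (plug_in(x[: n // 2]) + plug_in(x[n // 2:]))
    return 2 * plug_in(x) - t_half

theta, n, reps = 1.0, 200, 20_000
est = np.array([(plug_in(s), aggregated(s))
                for s in (rng.normal(theta, 1.0, size=n) for _ in range(reps))])
naive_bias, agg_bias = est.mean(axis=0) - theta**2
print(f"plug-in bias: {naive_bias:+.4f}, aggregated bias: {agg_bias:+.4f}")
```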




Talks



Pierre ALQUIER (RIKEN AIP, Japan)


url: http://proceedings.mlr.press/v130/doan21a.html


Title:  A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix


Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix which is at the core of CF. We analyze common projected gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) which leverages structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method can help reduce CF on classical CL datasets.
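As a rough illustration of this type of task-similarity measure (not the paper's exact NTK overlap matrix), one can compare the principal subspaces of per-task feature matrices; for a linear model the NTK features are the inputs themselves. All sizes and the planted-subspace construction below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def principal_subspace(features, r):
    # Top-r right-singular subspace of an (n_samples, n_params) feature matrix.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:r].T  # (n_params, r) orthonormal basis

def subspace_overlap(f1, f2, r=5):
    # Cosines of the principal angles between the two subspaces, in [0, 1]:
    # values near 1 mean aligned tasks (strong interference/forgetting),
    # values near 0 mean nearly orthogonal tasks.
    u1, u2 = principal_subspace(f1, r), principal_subspace(f2, r)
    return np.linalg.svd(u1.T @ u2, compute_uv=False)

# Planted example: two tasks sharing the same 5-dim signal subspace vs. not.
def task(basis, n=200, noise=0.1):
    latents = rng.normal(size=(n, basis.shape[1]))
    return latents @ basis.T + noise * rng.normal(size=(n, basis.shape[0]))

basis_a = np.linalg.qr(rng.normal(size=(50, 5)))[0]
basis_b = np.linalg.qr(rng.normal(size=(50, 5)))[0]
print(np.round(subspace_overlap(task(basis_a), task(basis_a)), 2))  # near 1
print(np.round(subspace_overlap(task(basis_a), task(basis_b)), 2))  # markedly smaller
```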

 


 Arnak DALALYAN  (CREST-ENSAE)  


Title: Optimal detection of the feature matching map in presence of noise and outliers (joint with Tigran Galstyan and Arshak Minasyan)



We consider the problem of finding the matching map between two sets of $d$-dimensional vectors from noisy observations, where the second set contains outliers. The matching map is then an injection, which can be consistently estimated only if the vectors of the second set are well separated. The main result shows that, in the high-dimensional setting, the detection region of the unknown injection can be characterized by the sets of vectors for which the inlier-inlier distance is of order at least $d^{1/4}$ and the inlier-outlier distance is of order at least $d^{1/2}$. These rates are achieved by the estimated matching that minimizes the sum of logarithms of distances between matched pairs of points. We also prove lower bounds establishing the optimality of these rates. Finally, we report results of numerical experiments on both synthetic and real-world data that illustrate our theoretical results and provide further insight into the properties of the estimators studied in this work.
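A small simulation of the estimator described in the abstract: the matching that minimizes the sum of logarithms of distances is a min-cost assignment problem, solvable with the Hungarian algorithm. The dimension, noise level, and outlier count below are arbitrary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)

d, n_inliers, n_outliers = 100, 30, 10
X = rng.normal(size=(n_inliers, d))                 # first set
Y = np.vstack([X + 0.3 * rng.normal(size=X.shape),  # noisy copies of X
               rng.normal(size=(n_outliers, d))])   # outliers
perm = rng.permutation(len(Y))
Y = Y[perm]                                         # hide the matching

# Estimate the injection by minimizing the sum of logarithms of distances
# between matched pairs (a rectangular min-cost assignment problem).
cost = np.log(cdist(X, Y) + 1e-12)
_, cols = linear_sum_assignment(cost)

true_pos = np.argsort(perm)[:n_inliers]  # where X[i]'s noisy copy ended up
print("fraction correctly matched:", np.mean(cols == true_pos))
```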



Evgenii CHZHEN (Paris-Saclay University) 


Title: A gradient estimator via L1-randomization for online zero-order optimization with two point feedback


url: https://arxiv.org/abs/2205.13910


This work studies online zero-order optimization of convex and Lipschitz functions. We present a novel gradient estimator based on two function evaluations and randomization on the $\ell_1$-sphere. Considering different geometries of feasible sets and Lipschitz assumptions, we analyse the online mirror descent algorithm with our estimator in place of the usual gradient. We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise. We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem. In the case of canceling noise, which was previously studied in the literature, our guarantees are either comparable to or better than state-of-the-art bounds obtained by Duchi et al. and Shamir for non-adaptive algorithms. Our analysis is based on deriving a new Poincaré-type inequality for the uniform measure on the $\ell_1$-sphere with explicit constants, which may be of independent interest.
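A sketch of such an estimator, with the $d/(2h)$ scaling that makes it unbiased to first order (treat the exact normalization as indicative rather than a verbatim transcription of the paper). Uniform sampling on the $\ell_1$-sphere uses the standard recipe of sign-flipped normalized exponentials.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_l1_sphere(d):
    # Uniform distribution on the l1-sphere: normalized i.i.d. exponentials
    # with independent random signs (a standard sampling recipe).
    e = rng.exponential(size=d)
    return rng.choice([-1.0, 1.0], size=d) * e / e.sum()

def l1_two_point_gradient(f, x, h):
    # Two-point zero-order gradient estimator with l1 randomization; the
    # d/(2h) factor is the natural normalization here, but treat it as
    # indicative, not as the paper's exact constant.
    d = x.size
    zeta = sample_l1_sphere(d)
    return (d / (2 * h)) * (f(x + h * zeta) - f(x - h * zeta)) * np.sign(zeta)

# Sanity check on f(x) = ||x||^2 / 2, whose gradient is x itself.
f = lambda x: 0.5 * x @ x
x = rng.normal(size=10)
est = np.mean([l1_two_point_gradient(f, x, h=1e-3) for _ in range(200_000)], axis=0)
print(np.round(est, 2))  # averaged estimates, close to...
print(np.round(x, 2))    # ...the true gradient
```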



Jaouad MOURTADA  (CREST-ENSAE) 


Title: Coding convex bodies under Gaussian noise, and the Wills functional


In sequential probability assignment, one aims to assign a large probability to a sequence of observations (unknown a priori), close to that of the best a posteriori distribution within a prescribed model. This prediction problem is intimately connected to that of lossless coding in information theory. In this work, we study the case of a sequence of real-valued observations, modeled by a subset of the Gaussian sequence model with mean constrained to a general convex body. This can be thought of as an information-theoretic analogue of fixed-design regression. We show that the minimax-optimal error is exactly given by a certain functional of the constraint set from convex geometry called the Wills functional. As a consequence, we express the optimal error in terms of basic geometric quantities associated to the convex body, namely its intrinsic volumes. After comparing the optimal error to the Gaussian width of the constraint set, and to fixed points of local Gaussian widths, we state a fundamental concavity property of the error, and deduce some strong monotonicity properties with respect to noise and sample size.
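For reference, the Wills functional mentioned above is a classical object from convex geometry: it is the sum of the intrinsic volumes of the body and, by a formula of Hadwiger, also admits a Gaussian-flavored integral representation (a standard identity, stated here from memory rather than taken from the paper):

```latex
% Wills functional of a convex body K in R^d: the sum of its intrinsic
% volumes V_j(K), equal to Hadwiger's integral of a Gaussian of dist(x, K).
\mathcal{W}(K) \;=\; \sum_{j=0}^{d} V_j(K)
             \;=\; \int_{\mathbb{R}^d} e^{-\pi\, \mathrm{dist}(x,K)^2}\, dx .
```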





Massimiliano PONTIL (Istituto Italiano di Tecnologia and University College London)


Title: Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Space




We study a class of dynamical systems modelled as Markov chains that admit an invariant distribution via the corresponding transfer, or Koopman, operator. While data-driven algorithms to reconstruct such operators are well known, their relationship with statistical learning is largely unexplored. We formalize a framework to learn the Koopman operator from finite data trajectories of the dynamical system. We consider the restriction of this operator to a reproducing kernel Hilbert space and introduce a notion of risk, from which different estimators naturally arise. We link the risk with the estimation of the spectral decomposition of the Koopman operator. These observations motivate a reduced-rank operator regression (RRR) estimator. We derive learning bounds for the proposed estimator, holding both in i.i.d. and non-i.i.d. settings, the latter in terms of mixing coefficients. Our results suggest RRR might be beneficial over other widely used estimators, as confirmed in numerical experiments for both forecasting and mode decomposition.

The talk discusses work from the paper:


https://arxiv.org/pdf/2205.14027.pdf
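A bare-bones illustration of the operator-regression viewpoint: reduced-rank regression of next states on current states with linear features, which is only a simplified cousin of the kernel RRR estimator in the paper. The dynamics, dimensions, and rank below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Trajectory of a linear stochastic system x_{t+1} = A x_t + noise.
d, T, rank = 5, 2000, 2
A = np.diag([0.9, 0.8, 0.1, 0.05, 0.02])
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)
X0, X1 = X[:-1], X[1:]  # (x_t, x_{t+1}) pairs

# Ordinary least squares estimate of the operator on linear features.
B_ols, *_ = np.linalg.lstsq(X0, X1, rcond=None)

# Reduced-rank regression: project the OLS fit onto the top right-singular
# directions of the fitted values X0 @ B_ols (the classical RRR solution).
_, _, vt = np.linalg.svd(X0 @ B_ols, full_matrices=False)
B_rrr = B_ols @ vt[:rank].T @ vt[:rank]

eigs = np.sort(np.linalg.eigvals(B_rrr).real)[::-1]
print("leading eigenvalue estimates:", np.round(eigs[:rank], 2))  # ~ [0.9, 0.8]
```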



Joseph SALMON (Montpellier University) 


Title: Stochastic smoothing of the top-K calibrated hinge loss for deep imbalanced classification


url: http://josephsalmon.eu/talks/pres_porquerolles.pdf



Modern classification tasks can involve several thousand possibly very similar classes. One such example is the Pl@ntNet application, which aims at providing users with the correct plant species for an input image. In this context, the high ambiguity results in low top-1 accuracy, which motivates top-K classification, in which K candidate classes are returned. Yet, designing top-K losses (to minimize the top-K error) tailored for deep learning remains a challenge, both theoretically and practically. We will present a stochastic top-K hinge loss for deep learning inspired by recent developments on top-K calibrated losses. The proposal is based on a smoothing of the top-K operator building on the flexible "perturbed optimizer" framework. We show that our loss function performs well for balanced datasets. In addition, we propose a simple variant of our loss that handles imbalanced cases and significantly outperforms other baseline loss functions on Pl@ntNet-300K, an open dataset of plant images obtained from the Pl@ntNet application, characterized by high ambiguity and a long-tailed distribution, that we recently released. This is joint work with Camille Garcin, Alexis Joly and Maximilien Servajean.
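The "perturbed optimizer" framework mentioned above smooths a discrete operator by averaging it over random perturbations of its input. A minimal sketch for the top-K indicator (the noise scale and sample count are arbitrary; the actual loss in the talk builds a hinge-type surrogate on top of such a smoothing):

```python
import numpy as np

rng = np.random.default_rng(5)

def topk_indicator(scores, k):
    # Hard top-K: one-hot mask of the K largest scores (non-differentiable).
    mask = np.zeros_like(scores)
    mask[np.argpartition(scores, -k)[-k:]] = 1.0
    return mask

def smoothed_topk(scores, k, sigma=0.5, n_samples=1000):
    # Perturbed-optimizer smoothing: average the hard top-K mask over
    # Gaussian perturbations of the scores. The result is smooth in
    # expectation and can back a differentiable surrogate loss.
    noise = sigma * rng.normal(size=(n_samples,) + scores.shape)
    return np.mean([topk_indicator(scores + z, k) for z in noise], axis=0)

scores = np.array([2.0, 1.9, 0.5, -1.0])
print(smoothed_topk(scores, k=2))  # soft memberships, summing to ~2
```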






Benjamin RIU (CMAP)


 Title:  MCD: Marginal Contrastive Discrimination, a novel technique for conditional density estimation.


We consider the problem of conditional density estimation, a major topic of interest in statistics and machine learning. Our method, called Marginal Contrastive Discrimination (MCD), factorizes the conditional density into two components: the marginal density of the target variable and a ratio of densities that can be estimated through binary classification. Like noise-contrastive methods, MCD can leverage state-of-the-art supervised learning techniques, including neural networks, to perform conditional density estimation. Our benchmark reveals that our method significantly outperforms existing methods in practice on most density models and regression datasets.
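A compact sketch of the factorization described above, with placeholder components (a gradient-boosting classifier for the density ratio and a kernel density estimate for the marginal); this illustrates the idea, not the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(6)

# Toy data: y | x ~ N(2x, 0.5^2).
n = 4000
x = rng.normal(size=n)
y = 2 * x + 0.5 * rng.normal(size=n)

# Positives: true pairs (x, y). Negatives: (x, y') with y' drawn from the
# marginal of y (a random permutation breaks the dependence on x).
pairs = np.vstack([np.column_stack([x, y]),
                   np.column_stack([x, rng.permutation(y)])])
labels = np.r_[np.ones(n), np.zeros(n)]
clf = GradientBoostingClassifier().fit(pairs, labels)

marginal = gaussian_kde(y)  # estimate of the marginal density of y

def conditional_density(x0, y0):
    # p(y|x) = p(y) * r(x, y), where the classifier's odds estimate the
    # ratio r = p(x, y) / (p(x) p(y)) since the two classes are balanced.
    p = clf.predict_proba([[x0, y0]])[0, 1]
    return marginal(y0)[0] * p / (1 - p)

print(conditional_density(1.0, 2.0))      # MCD-style estimate
print(norm.pdf(2.0, loc=2.0, scale=0.5))  # ground truth, ~0.798
```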







Practical information

How to get there: https://www.bateaux-taxi.com/

The conference will be held at the IGESA center: https://www.igesa.fr/