Session of March 7, 2022

Session organized by Anna Korba and Erwan Le Pennec

Location: IHP, amphi Darboux


14.00 : Lionel Riou-Durand (University of Warwick)

Title: Metropolis Adjusted Langevin Trajectories: a robust alternative to Hamiltonian Monte-Carlo

Abstract: Sampling approximations for high-dimensional statistical models often rely on so-called gradient-based MCMC algorithms. It is now well established that these samplers scale better with the dimension than other state-of-the-art MCMC samplers, but are also more sensitive to tuning [5]. Among these, Hamiltonian Monte Carlo is a widely used sampling method shown to achieve the gold-standard d^{1/4} scaling with respect to the dimension [1]. However, it is also known that its efficiency is quite sensitive to the choice of integration time, see e.g. [4], [2]. This problem is related to periodicity in the autocorrelations induced by the deterministic trajectories of Hamiltonian dynamics. To tackle this issue, we develop a robust alternative to HMC built upon Langevin diffusions (namely Metropolis Adjusted Langevin Trajectories, or MALT), inducing randomness in the trajectories through a continuous refreshment of the velocities. We study the optimal scaling problem for MALT and recover the d^{1/4} scaling of HMC proven in [1] without additional assumptions. Furthermore, we highlight the fact that the autocorrelations of MALT can be controlled by a uniform and monotone bound thanks to the randomness induced in the trajectories, so that MALT achieves robustness to tuning. Finally, we compare our approach to Randomized HMC ([2], [3]) and establish quantitative contraction rates for the 2-Wasserstein distance that support the choice of Langevin dynamics.
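To make the construction concrete, here is a rough Python sketch of the type of sampler the abstract describes: an underdamped Langevin trajectory discretised with an OBABO-style splitting (a partial velocity refreshment wrapped around each leapfrog step), accepted or rejected as a whole using the accumulated leapfrog energy error. This is my own reconstruction under standard assumptions, not the speaker's implementation; the step size h, friction gamma and trajectory length L are illustrative parameters.

import numpy as np

def malt_step(x, U, grad_U, L, h, gamma, rng):
    """One MALT-style iteration (sketch): simulate L steps of an
    OBABO-discretised kinetic Langevin trajectory, then accept or
    reject the whole trajectory with a Metropolis correction based
    on the accumulated energy error of the leapfrog (BAB) parts."""
    d = x.size
    v = rng.standard_normal(d)              # full velocity refreshment at the start
    eta = np.exp(-gamma * h / 2.0)          # half-step Ornstein-Uhlenbeck coefficient
    x_prop, delta_E = x.copy(), 0.0
    for _ in range(L):
        # O: partial velocity refreshment (exact, so it does not enter the MH ratio)
        v = eta * v + np.sqrt(1.0 - eta**2) * rng.standard_normal(d)
        # BAB: one leapfrog step; its energy error is accumulated
        E_before = U(x_prop) + 0.5 * v @ v
        v = v - 0.5 * h * grad_U(x_prop)
        x_prop = x_prop + h * v
        v = v - 0.5 * h * grad_U(x_prop)
        delta_E += U(x_prop) + 0.5 * v @ v - E_before
        # O: second half of the velocity refreshment
        v = eta * v + np.sqrt(1.0 - eta**2) * rng.standard_normal(d)
    # Metropolis correction applied to the trajectory as a whole
    return (x_prop, True) if np.log(rng.uniform()) < -delta_E else (x, False)

# Illustration: sampling a standard Gaussian target.
rng = np.random.default_rng(0)
U, grad_U = (lambda x: 0.5 * x @ x), (lambda x: x)
x = np.zeros(10)
for _ in range(1000):
    x, _ = malt_step(x, U, grad_U, L=5, h=0.5, gamma=1.0, rng=rng)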


15.00 : Pierre Ablin (CNRS - Université Paris-Dauphine - PSL University)

Title: Training neural networks with orthogonal weights

Abstract: Imposing an orthogonality constraint on the weights of a neural network is appealing: it helps mitigate gradient vanishing / explosion issues, it makes the layer easy to invert, and it defines norm-preserving transforms. However, training neural networks with orthogonal weights is not straightforward, since it is an optimization problem on a manifold. Standard optimization algorithms on manifolds iterate two steps: first, a descent direction is computed, and then a step is taken in that direction while staying on the manifold. Unfortunately, for orthogonal matrices, it is computationally demanding to move while staying on the manifold, making such methods expensive. We introduce a cheaper iterative strategy that allows iterates to move away from the manifold, but that "lands" on the manifold: it provably converges towards stationary points of the problem. This method has the potential to greatly accelerate the training of neural networks with orthogonal weights.


The first half of the talk will be a comprehensive tutorial on neural networks with orthogonal weights, and on optimization on the orthogonal manifold.


Ref: Ablin, P. and Peyré, G., "Fast and accurate optimization on the orthogonal manifold without retraction", https://arxiv.org/abs/2102.07432
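As a concrete illustration of the landing idea (my own reading of the reference above, not the authors' code), each update combines a skew-symmetric relative-gradient term that rotates the current matrix with a penalty term that pulls it back towards orthogonality; both require only matrix multiplications, so no retraction or matrix factorisation is needed. The step size eta and penalty weight lam below are hypothetical tuning parameters.

import numpy as np

def landing_update(X, grad_f, eta=0.05, lam=1.0):
    """One landing-style update (sketch): move along a skew-symmetric
    'rotational' direction built from the Euclidean gradient, plus a
    pull-back term proportional to the gradient of ||X^T X - I||^2 / 4,
    using matrix products only (no retraction)."""
    G = grad_f(X)                                     # Euclidean gradient of the loss at X
    psi = 0.5 * (G @ X.T - X @ G.T)                   # skew-symmetric component
    pull_back = X @ (X.T @ X - np.eye(X.shape[1]))    # drives X^T X towards the identity
    return X - eta * (psi @ X + lam * pull_back)

# Illustration: keep a square matrix close to orthogonal while fitting a target A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X = np.linalg.qr(rng.standard_normal((5, 5)))[0]
for _ in range(500):
    X = landing_update(X, grad_f=lambda X: 2.0 * (X - A))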


16.00 : Jaouad Mourtada (ENSAE/CREST, Institut Polytechnique de Paris)

Title: Distribution-free robust linear regression

Abstract: We consider random-design linear regression in a distribution-free setting where no assumption is made on the distribution of the covariates. We start by surveying relevant known results and indicate an improvement on a classical bound by Györfi, Kohler, Krzyżak and Walk for the truncated least squares estimator. However, we then show that these procedures fail in a subtle but severe way. Finally, we determine the minimal assumption on the target variable under which guarantees are possible, and describe a nonlinear prediction procedure achieving a near-optimal high-probability bound.
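For reference, the truncated least squares estimator mentioned above can be sketched in a few lines of Python: fit ordinary least squares, then clip the predictions at a level B. The specific truncation level analysed by Györfi, Kohler, Krzyżak and Walk is not reproduced here; B is a hypothetical parameter kept for illustration.

import numpy as np

def truncated_least_squares(X, y, B):
    """Truncated least squares (sketch): ordinary least squares followed
    by clipping of the predictions at level B."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda X_new: np.clip(X_new @ theta, -B, B)

# Illustration on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + rng.standard_normal(200)
predict = truncated_least_squares(X, y, B=10.0)
y_hat = predict(X)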