Location: IHP, Yvonne Choquet-Bruhat lecture hall (second floor of the Perrin building)
14:00: Adeline Fermanian (Califrais)
Title: Scaling ResNets in the Large-depth Regime
Abstract: Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor $\alpha_L$. We show in a probabilistic setting that, with standard i.i.d. initializations, the only non-trivial dynamics arise for $\alpha_L = \frac{1}{\sqrt{L}}$; other choices lead either to explosion or to the identity mapping. In the continuous-time limit, this scaling factor corresponds to a neural stochastic differential equation, contrary to the widespread interpretation of deep ResNets as discretizations of neural ordinary differential equations. In the latter regime, by contrast, stability is obtained with specific correlated initializations and $\alpha_L = \frac{1}{L}$. Our analysis suggests a strong interplay between scaling and the regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
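To make the scaling concrete, here is a minimal NumPy sketch (an illustration, not the authors' code) of a toy ResNet forward pass with i.i.d. Gaussian weights, where each residual block is multiplied by $\alpha_L$; the tanh activation and the $1/\sqrt{d}$ weight scale are assumptions made for the sketch. It lets one compare numerically the three regimes described in the abstract.

```python
import numpy as np

# Minimal sketch (illustrative, not the authors' code): a toy ResNet
# forward pass h_{l+1} = h_l + alpha_L * tanh(W_l h_l) with standard
# i.i.d. Gaussian initialization of the weights W_l.
rng = np.random.default_rng(0)

def resnet_forward(x, L, alpha):
    d = x.shape[0]
    h = x.copy()
    for _ in range(L):
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # i.i.d. init, variance 1/d
        h = h + alpha * np.tanh(W @ h)                # residual update scaled by alpha_L
    return h

x = rng.standard_normal(64)
for L in (10, 100, 1000):
    for alpha in (1.0, 1.0 / np.sqrt(L), 1.0 / L):    # the three scaling regimes
        out = resnet_forward(x, L, alpha)
        print(f"L={L:4d}  alpha={alpha:.4f}  "
              f"||h_L||/||h_0|| = {np.linalg.norm(out) / np.linalg.norm(x):.3f}")
```

With $\alpha_L = 1$ the output norm blows up with depth, with $\alpha_L = \frac{1}{L}$ the network stays close to the identity, and $\alpha_L = \frac{1}{\sqrt{L}}$ sits in between, consistent with the regimes stated above.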
15:00: Kolyan Ray (Imperial College London)
Title: A variational Bayes approach to debiased inference in high-dimensional linear regression
Abstract: We consider statistical inference for a single coordinate of a high-dimensional parameter in sparse linear regression. It is well known that high-dimensional procedures such as the LASSO can provide biased estimators for this problem and thus require debiasing. We propose a scalable variational Bayes method based on assigning a mean-field approximation to the nuisance coordinates while carefully modelling the conditional distribution of the target given the nuisance. We investigate the numerical performance of our algorithm and establish accompanying theoretical guarantees for estimation and uncertainty quantification.
Joint work with I. Castillo, A. L'Huillier, and L. Travis.
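Schematically, one reading of the construction above (the notation is ours; the precise variational family is specified in the paper): writing $\theta = (\theta_1, \theta_{-1})$ for the target coordinate and the $p-1$ nuisance coordinates, the variational family factorizes as $$Q(\theta) = Q_{1 \mid -1}(\theta_1 \mid \theta_{-1}) \prod_{j=2}^{p} Q_j(\theta_j),$$ that is, mean-field across the nuisance coordinates only, while the dependence of the target on the nuisance, which drives the bias of naive plug-in procedures, is modelled explicitly rather than broken by the factorization.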
16:00: Judith Rousseau (Université Paris Dauphine - PSL)
Title: Bayesian estimation in high-dimensional Hawkes processes
Abstract: Multivariate Hawkes processes form a class of point processes describing self- and cross-exciting or inhibiting behaviour. There is now renewed interest in such processes in applied domains and in machine learning, but only limited theory exists about inference in these models, in particular in high dimensions. More precisely, the intensity function of a linear Hawkes process has the following form: for each dimension $k \leq K$, $$\lambda^k(t) = \sum_{\ell \leq K} \int_0^{t^-} h_{\ell k}(t-s)\,dN_s^\ell + \nu_k,$$ where $(N^\ell, \ell \leq K)$ is the Hawkes process and $\nu_k > 0$. There have been some recent theoretical results on Bayesian estimation for linear and nonlinear multivariate Hawkes processes, but these results assumed that the dimension $K$ was fixed, and convergence rates were studied as the observation window $T$ goes to infinity. In this work we allow $K$ to go to infinity with $T$. We give generic conditions to obtain posterior convergence rates and derive, under sparsity assumptions, convergence rates in $L_1$ norm and consistent estimation of the graph of interactions.
Joint work with Vincent Rivoirard and Déborah Sulem.
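To fix ideas, here is a minimal simulation sketch of the intensity model above via Ogata's thinning algorithm; the exponential kernel $h_{\ell k}(t) = A_{\ell k}\,\beta e^{-\beta t}$ and all parameter values are assumptions made for the sketch, not taken from the talk.

```python
import numpy as np

# Minimal sketch (illustrative): simulation of a K-dimensional linear
# Hawkes process with intensity
#   lambda^k(t) = nu_k + sum_l sum_{s in N^l, s < t} A[l,k] * beta * exp(-beta*(t-s))
# via Ogata's thinning algorithm. The exponential kernel is an assumption.
rng = np.random.default_rng(1)

def intensities(t, events, nu, A, beta):
    # Evaluate (lambda^1(t), ..., lambda^K(t)) given the past events.
    lam = nu.copy()
    for l, times in enumerate(events):
        for s in times:
            lam += A[l] * beta * np.exp(-beta * (t - s))
    return lam

def simulate_hawkes(nu, A, beta, T):
    K = len(nu)
    events = [[] for _ in range(K)]
    t = 0.0
    while True:
        # With exponential kernels, the total intensity is non-increasing
        # between events, so its current value dominates it until the next jump.
        lam_bar = intensities(t, events, nu, A, beta).sum()
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            return events
        lam = intensities(t, events, nu, A, beta)
        if rng.uniform() <= lam.sum() / lam_bar:     # accept the candidate point
            k = rng.choice(K, p=lam / lam.sum())     # attribute it to a dimension
            events[k].append(t)

# 2-dimensional example with a sparse interaction matrix.
nu = np.array([0.5, 0.3])
A = np.array([[0.4, 0.0],
              [0.3, 0.2]])   # A[l, k] = integral of h_{lk}: influence of l on k
events = simulate_hawkes(nu, A, beta=1.0, T=100.0)
print([len(ev) for ev in events])
```

In this parametrization $\int_0^\infty h_{\ell k} = A_{\ell k}$, so stability requires the spectral radius of $A$ to be below one, and the support of $A$ is the graph of interactions whose estimation is discussed in the talk.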