(with André Schlichting) Stationary solutions of McKean-Vlasov equation on a high-dimensional sphere and other Riemannian manifolds [arXiv]
We study stationary solutions of the McKean-Vlasov equation on a high-dimensional sphere and other compact Riemannian manifolds. We extend the equivalence of the energetic problem formulation to the manifold setting and characterize critical points of the corresponding free energy functional. On a sphere, we employ the properties of spherical convolution to study the bifurcation branches around the uniform state. We also give a sufficient condition for the existence of a discontinuous transition point in terms of the interaction kernel and compare it to the Euclidean setting. We illustrate our results on a range of systems, including the particle system arising from transformer models and the Onsager model of liquid crystals.
(with André Schlichting and Mark Peletier) Singular-limit analysis of gradient descent with noise injection [arXiv]
We study the limiting dynamics of a large class of noisy gradient descent systems in the overparameterized regime. In this regime the zero-loss set of global minimizers of the loss is large, and when initialized in a neighbourhood of this zero-loss set, a noisy gradient descent algorithm slowly evolves along this set. In some cases this slow evolution has been related to better generalization properties. We characterize this evolution for the broad class of noisy gradient descent systems in the limit of small learning rate. Our results show that the structure of the noise affects not just the form of the limiting process, but also the time scale at which the evolution takes place. We apply the theory to Dropout, label noise and classical SGD (minibatching) noise, and show that these evolve on two different time scales. Classical SGD even yields a trivial evolution on both time scales, implying that additional noise is required for regularization. The results are inspired by the training of neural networks, but the theorems apply to noisy gradient descent of any loss that has a non-trivial zero-loss set.
Before starting my PhD I was also involved in various research projects, namely...
During my master's I worked on transformers applied to dynamical systems modeling:
(with Ivan Oseledets) Deep Representation Learning for Dynamical Systems Modeling [arXiv]
(with Ivan Oseledets) Tensorized transformer for dynamical systems modeling [arXiv]
During my bachelor's I studied filamentation of laser pulses:
(with D. V. Mokrousova, D. E. Shipilo, G. E. Rizaev, N. A. Panov, E. S. Sunchugasheva, A. A. Ionin, O. G. Kosareva, and L. V. Seleznev) Enhancement of third harmonic yield in fused filaments due to Gouy shift suppression, J. Opt. Soc. Am. B 37, 1406-1412 (2020) [link]
(with A.V. Shutov, D.V. Mokrousova, V.Yu. Fedorov, L.V. Seleznev, G.E. Rizaev, V.D. Zvorykin, S. Tzortzakis, and A.A. Ionin) Influence of air humidity on 248-nm ultraviolet laser pulse filamentation, Optics Letters, Vol. 44, Issue 9, 2165-2168 (2019) [link]
(with D.E. Shipilo, D.V. Mokrousova, N.A. Panov, G.E. Rizaev, E.S. Sunchugasheva, A.A. Ionin, A. Couairon, L.V. Seleznev, and O.G. Kosareva) Third-harmonic generation from regularized converging filaments, J. Opt. Soc. Am. B 36, A66-A71 (2019) [link]