Assistant Professor at the Courant Institute of Mathematical Sciences, NYU. I was previously a PhD student at EPFL under the supervision of Clément Hongler.
My goal is to develop new mathematical concepts and tools to describe the training dynamics of Deep Neural Networks, working towards a Theory of Deep Learning that would change the way we train and develop AI models. The project I am currently most excited about is showing that DNNs implement a computational version of Occam's razor: they try to find the fastest algorithm/circuit that fits the training data. This would explain the incredible statistical power of DNNs and make them more interpretable, by revealing the small underlying circuit.
I am also interested in feature learning, in particular the emergence of low-dimensional representations under weight decay, and in identifying the different regimes of DNN training, aiming for a complete phase diagram that captures the NTK regime, the active regime, and more.
If you are interested in working with me, you should apply to Courant's graduate program and mention my name in your application. If you are already a student at NYU, please contact me by email with your CV and a description of your interests; I have many potential projects to work on, some more mathematical, some more empirical.
Contact: arthur.jacot@nyu.edu
Selected Publications (Google Scholar):
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape, Ioannis Bantzis, James Simon, Arthur Jacot, 2025. [arXiv link]
Single Hidden Layer Diffusion Models Provably Learn Simple Low-Dimensional Structures, Nicholas Boffi, Arthur Jacot, Stephen Tu, Ingvar Ziemann, ICLR 2025. [conference paper]
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse, Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli, ICLR 2025 (oral). [conference paper]
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning, Arthur Jacot, Seok Hoan Choi, Yuxiao Wen, ICLR 2025. [arXiv link]
Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets, Arthur Jacot, Alexandre Kaiser, CPAL 2025 (oral). [arXiv link]
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes, Zhenfeng Tu, Santiago Aranguri, Arthur Jacot, NeurIPS 2024. [arXiv link]
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning, Yuxiao Wen, Arthur Jacot, ICML 2024. [conference paper] [arXiv link]
Implicit bias of SGD in L2-regularized linear DNNs: One-way jumps from high to low rank, Zihan Wang, Arthur Jacot, ICLR 2024. [conference paper] [arXiv link]
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff, Arthur Jacot, NeurIPS 2023. [conference paper] [arXiv link]
Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions, Arthur Jacot, ICLR 2023 (spotlight). [conference paper] [arXiv link]
Feature Learning in L2-regularized DNNs: Attraction/Repulsion and Sparsity, Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel, NeurIPS 2022. [arXiv link]
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity, Arthur Jacot, François Ged, Franck Gabriel, Berfin Simsek, Clément Hongler, 2022. [arXiv link]
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances, Berfin Simsek, François Ged, Arthur Jacot, Francesco Spadaro, Clément Hongler, Wulfram Gerstner, Johanni Brea, ICML 2021. [arXiv link]
Implicit regularization of Random Feature Models, Arthur Jacot, Berfin Simsek, Francesco Spadaro, Clément Hongler, Franck Gabriel, ICML 2020. [conference paper] [arXiv link]
Scaling description of generalization with number of parameters in deep learning, Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart, Journal of Statistical Mechanics: Theory and Experiment, Volume 2020, February 2020. [Journal link] [arXiv link]
Neural Tangent Kernel: Convergence and Generalization in Neural Networks, Arthur Jacot, Franck Gabriel, Clément Hongler, NeurIPS 2018 (spotlight). [conference paper (8-page version)] [3-minute video] [spotlight slides] [spotlight video] [arXiv link (full version)]
Prizes:
2023 Prix EPFL de Doctorat (EPFL PhD thesis prize).
2021 SwissMAP Innovator Prize.
Recent Talks:
SIAM annual meeting, Montreal (July 2025).
Conference on Parsimony and Learning, Stanford (March 2025).
SIAM Mathematics of Data Science, Atlanta (October 2024).
Institute of Science and Technology Austria (July 2024).
Statistics and Learning Theory in the Era of Artificial Intelligence, Oberwolfach (June 2024).
Two Sigma, New York (June 2024).
DIMACS workshop, Rutgers University (June 2024).
Optimization Seminar, UPenn (Nov. 2023).
Princeton ML Theory summer school (June 2023).
ICLR spotlight presentation, Kigali Convention Center (May 2023).
Phys4ML, Aspen Center for Physics (Feb 2023).
News Articles:
A New Link to an Old Model Could Crack the Mystery of Deep Learning, Quanta Magazine.
A Deeper Understanding of Deep Learning, Communications of the ACM.