Mathematical Foundations of Machine Learning
Time & Location: Mondays 14:15-16:00, seminar room S11 (C6H41), C-Bau Campus Morgenstelle
Format: 90-minute weekly lecture, 3 ECTS credits; a few exercises for self-study will be provided
Contents: The goal of this lecture is to provide an overview of some of the theoretical challenges arising from the empirical successes of neural networks in machine learning.
In the first part of the course, we will introduce kernel methods and explain the standard view on generalization via the bias-variance tradeoff. The second part will then study properties of neural networks, and we will see that the classical theory fails to explain their performance: the optimization of neural nets is non-convex, and their number of parameters often exceeds the size of the training data (overparametrization). We will then see that while neural networks of finite size are extremely difficult to understand, their limiting behaviour as the width becomes infinite is tractable and relates back to kernel methods through the so-called neural tangent kernel. If time permits, we will study the relation between generalization and overparametrization in simple models to gain some insight into this new regime in which neural networks operate.
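As a small illustration of the overparametrized regime mentioned above (this sketch is not part of the course material; the random-feature model and all parameter choices are my own illustrative assumptions), the following shows that a model with far more parameters than training points can fit noisy data exactly via the minimum-norm least-squares solution:

```python
import numpy as np

# Illustrative sketch: minimum-l2-norm least squares in an overparametrized
# random-feature model. With many more features than training points, the
# model interpolates the training data exactly -- the regime the lecture
# contrasts with the classical bias-variance view.
rng = np.random.default_rng(0)

n, d = 20, 200  # n training points, d >> n random features
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

# random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
w = rng.standard_normal(d)
b = rng.standard_normal(d)
Phi = np.maximum(0.0, np.outer(x, w) + b)  # feature matrix, shape (n, d)

# the pseudoinverse gives the interpolating solution of minimal l2 norm
theta = np.linalg.pinv(Phi) @ y

train_mse = np.mean((Phi @ theta - y) ** 2)
print(f"train MSE: {train_mse:.2e}")  # essentially zero: the noise is fit too
```

Despite fitting the noise perfectly, such interpolating solutions can still generalize well; understanding when and why is one of the questions the course addresses.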
Prerequisites: Knowledge of basic probability theory and analysis. Some knowledge of functional analysis can be helpful.
Course material: Lecture notes can be found on Moodle (math section).
Contact: If you have any questions or a scheduling conflict, please contact me (sbuchholz at tue dot mpg dot de)
Literature:
Support vector machines, I. Steinwart, A. Christmann, Springer Science & Business Media, 2008
Optimal rates for the regularized least-squares algorithm, A. Caponnetto, E. De Vito, Foundations of Computational Mathematics 7, 2007
The Principles of Deep Learning Theory, D. A. Roberts, S. Yaida, B. Hanin, arXiv:2106.10165, 2021
Deep learning: a statistical viewpoint, P. Bartlett, A. Montanari, A. Rakhlin, Acta Numerica, 2021
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation, M. Belkin, Acta Numerica, 2021