
MT 875: Graduate topics in deep learning theory

Fall 2023

Spacetime coordinates: WF, 2-3:15, Maloney 560

Instructor: Eli Grigsby (she/her)

E-mail address: Delete all but one of the j's and replace "at" and "dot" with their symbols in the following expression: grigsbyjjjjjjjj(at)bc(dot)edu. (IMPORTANT: If you accidentally delete ALL the j's, your e-mail will go to someone who isn't me.)

Office hours: Maloney 522, by appointment (feel free to sign up for multiple adjacent slots, and e-mail me with your rough availability if you don't see a time that works for you)

Course description: This is a course on the mathematical and statistical foundations of learning theory, with an emphasis on applications to neural networks and gradient-based learning algorithms. As my background is in geometric topology, I will focus primarily on the portions of the subject where geometry, topology, and combinatorics play a prominent role.


The course will be divided into thirds:


1/3) Intro to supervised learning, computational learning theory, and the PAC learning framework: I will follow the first few chapters of Kearns and Vazirani's An Introduction to Computational Learning Theory.* Our goal here will be to understand how the classical notions of VC dimension, Rademacher complexity, and their cousins quantify the degree to which the choice of function class and the size of the finite training sample affect the ability of the learned model to generalize to unseen data drawn from the same distribution. (A sketch of the flavor of bound we're after appears just after this list.)

2/3) Feedforward ReLU neural networks as a function class: I will prove basic properties of this function class (e.g., that it is precisely the class of continuous piecewise linear functions with finitely many pieces, and hence a universal approximator) and develop the foundations for calculating some of the classical statistical notions of complexity, along with some newer geometric and topological notions of complexity. We will also dig into the established relationship between ReLU networks and tropical geometry. (A toy example of the piecewise-linearity statement appears just after this list.)

3/3) Additional topics, as time permits: Neural tangent kernel and neural networks in the infinite-width limit, overparameterized networks and double descent, geometry of the loss landscape, the role of step size and noise in gradient dynamics, reinforcement learning, and the role of attention in transformer networks for large language models.
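To preview the flavor of the first third: the prototypical statement is the classical VC-dimension generalization bound. Roughly (constants omitted, and stated here only as a preview rather than in the precise form we will prove in class): if a hypothesis class \mathcal{H} has VC dimension d, then with probability at least 1 - \delta over an i.i.d. sample of size m, every h \in \mathcal{H} satisfies

\[ R(h) \;\le\; \widehat{R}_m(h) + C\sqrt{\frac{d\log(m/d) + \log(1/\delta)}{m}}, \]

where R(h) is the true risk, \widehat{R}_m(h) is the empirical risk on the sample, and C is an absolute constant. The moral: generalization is controlled by a combinatorial measure of the complexity of the function class, relative to the sample size.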
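And a toy instance of the piecewise-linearity statement from the second third (an illustration only, not the general argument): the absolute value function is computed by a one-hidden-layer ReLU network with two hidden neurons,

\[ |x| = \sigma(x) + \sigma(-x), \qquad \sigma(t) := \max\{t, 0\}, \]

and maxima of affine functions can be built similarly, since \max\{a, b\} = a + \sigma(b - a). Iterating constructions like these, together with a representation of continuous piecewise linear functions (with finitely many pieces) as maxima of minima of affine functions, is one standard route to the full statement.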


Prerequisites: No prior background in machine learning or learning theory will be assumed, but I will assume a degree of mathematical maturity. It would be helpful if you've taken the introductory grad G/T sequence or its equivalent (algebraic and differential topology).


Course work: I will ask all registered students to choose a topic and either 1) produce one detailed lecture or tutorial to be given in class or posted on-line, or 2) produce some experiments, along with a visualization and brief blurb to be presented in class or posted on-line.

I include below a list of references I plan to use this semester. If I add more later, I will do so at this Google Doc:


PAC Learning, Generalization, & Statistical Notions of Complexity:


Basics of supervised learning and geometry of ReLU networks:


Neural tangent kernel and Gaussian processes:


Tropical geometry and ReLU networks:


Implicit regularization and geometry of loss landscape:


Attention & Transformer networks:

Other useful (for me) background references on geometric topology:

*These texts may or may not be available in electronic form! Email me for assistance if you have trouble obtaining them.

What we actually did:

Exercises that may be useful for learning this material will appear here.