This course covers basic mathematical ideas for understanding how deep learning models are trained and applied. Topics include random initialization, (stochastic) gradient descent, momentum and adaptivity, batch normalization, the Hessian, the neural tangent kernel, implicit regularization, adversarial examples, calibration, and transformers. Students will gain a basic mathematical understanding of these key ideas and see their effects through labs.
Office hour: Wednesdays 1-2 pm, at LSRC D226
Yufa Zhou yufa.zhou@duke.edu
Office hour: Thursdays 3-4 pm
The first half of the course focuses on more "standard" material (in the sense that it is at least five years old) in the theory of deep learning, including random initialization, optimizers, the optimization landscape, the neural tangent kernel, mean-field theory, basic generalization, implicit regularization, and double descent. The second half covers more recent topics.
See the detailed schedule in the calendar; the second half is very likely to change.
This version of the course focuses more on intuitive understanding (heuristics) than on rigorous proofs. Unlike a traditional theory-of-deep-learning course, students are expected to train some simple models and understand how the choice of hyperparameters influences training behavior, as in the sketch below.
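The course does not specify a framework for the labs; the following is only a minimal PyTorch sketch of the kind of experiment meant here: training a tiny model with plain SGD and sweeping one hyperparameter (the learning rate) to see when training converges slowly versus diverges. The toy data and the specific learning-rate values are illustrative assumptions, not course material.

```python
import torch
import torch.nn as nn

# Toy regression data: y = 3x + small noise.
torch.manual_seed(0)
x = torch.randn(256, 1)
y = 3 * x + 0.1 * torch.randn(256, 1)

def train(lr, steps=200):
    """Train a one-layer linear model with plain SGD and return the final MSE."""
    model = nn.Linear(1, 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Sweep the learning rate: too small converges slowly,
# too large makes the loss blow up instead of decreasing.
for lr in [1e-3, 1e-1, 2.5]:
    print(f"lr={lr:>5}: final MSE = {train(lr):.4f}")
```

Running the sweep shows the qualitative behavior the labs aim at: a small learning rate leaves the loss high after a fixed budget of steps, a moderate one converges, and a large one diverges.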