Introduction to convex and nonconvex optimization. Optimality conditions for nonconvex optimization problems. Gradient descent methods with various stepsize rules. Global rate of convergence. Some perspectives. [Here is the video for lecture 1]
[Here are the slides for lecture 1 & 2]
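As a companion to the lecture 1 material, below is a minimal sketch of gradient descent with two illustrative stepsize rules, a fixed stepsize and a backtracking (Armijo) line search; the test function and all parameter values are assumptions chosen for illustration, not taken from the lectures.

```python
import numpy as np

def gradient_descent(f, grad, x0, rule="armijo", alpha=0.1, max_iter=1000, tol=1e-8):
    """Gradient descent with either a fixed stepsize or backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:        # approximate first-order optimality
            break
        if rule == "fixed":
            t = alpha                       # constant stepsize
        else:
            t, c, rho = 1.0, 1e-4, 0.5      # backtrack until the Armijo decrease condition holds
            while f(x - t * g) > f(x) - c * t * g.dot(g):
                t *= rho
        x = x - t * g
    return x

# Illustrative convex quadratic f(x) = 0.5 x^T A x - b^T x with minimiser A^{-1} b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(gradient_descent(f, grad, x0=[0.0, 0.0]))   # approximately np.linalg.solve(A, b)
```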
Stochastic gradient method and adaptive variants (AdaGrad). [Here is the video for lecture 2]
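To make the lecture 2 updates concrete, the following sketch contrasts a plain stochastic gradient step with a diagonal AdaGrad step on a synthetic least-squares problem; the data, learning rates, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def sgd(lr=0.01, epochs=20, adagrad=False, eps=1e-8):
    """Stochastic gradient on 0.5*(x_i^T w - y_i)^2, optionally with AdaGrad scaling."""
    w = np.zeros(d)
    G = np.zeros(d)                       # running sum of squared gradients (AdaGrad)
    for _ in range(epochs):
        for i in rng.permutation(n):      # one pass over the data in random order
            g = (X[i] @ w - y[i]) * X[i]  # stochastic gradient from a single sample
            if adagrad:
                G += g**2
                w -= lr / np.sqrt(G + eps) * g
            else:
                w -= lr * g
    return w

print("SGD error:    ", np.linalg.norm(sgd() - w_true))
print("AdaGrad error:", np.linalg.norm(sgd(lr=0.5, adagrad=True) - w_true))
```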
Momentum/acceleration methods and Adam. Second-order type methods.
[Here is the video for lecture 3]
[Here are the slides for lecture 3 & 4]
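As a rough companion to the lecture 3 topics, here is a minimal sketch of the Adam update with bias correction, which combines a momentum-style first-moment estimate with an AdaGrad-style adaptive scaling; the hyperparameter defaults are the commonly used values, and the badly scaled quadratic is an illustrative assumption.

```python
import numpy as np

def adam(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, max_iter=5000):
    """Adam: exponential moving averages of the gradient and its square, with bias correction."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, max_iter + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)          # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative use on a badly scaled quadratic, where plain gradient descent is slow.
grad = lambda x: np.array([100.0 * x[0], x[1]])
print(adam(grad, x0=[1.0, 1.0]))   # prints a point close to the minimiser at the origin
```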
General frameworks: probabilistic models, adaptive batch size; time permitting, subspace methods.
[Here is the video for lecture 4]
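One common way to make the adaptive batch-size idea concrete is a variance ("norm") test that enlarges the mini-batch whenever the sampled gradient looks too noisy relative to its norm; the sketch below implements this heuristic on a synthetic least-squares problem. The problem data, threshold, and doubling rule are assumptions for illustration, not necessarily the specific variant covered in the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def adaptive_batch_sgd(lr=0.05, batch=8, theta=0.9, max_iter=400):
    """SGD where the mini-batch grows whenever the sample-variance ("norm") test fails."""
    w = np.zeros(d)
    for _ in range(max_iter):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        residual = X[idx] @ w - y[idx]
        grads = residual[:, None] * X[idx]        # per-sample gradients
        g = grads.mean(axis=0)                    # mini-batch gradient estimate
        # Norm test: if the variance of the mini-batch gradient dominates the
        # squared gradient norm, the estimate is too noisy, so enlarge the batch.
        var_of_mean = grads.var(axis=0).sum() / len(idx)
        if var_of_mean > theta**2 * np.dot(g, g):
            batch = min(2 * batch, n)
        w -= lr * g
    return w, batch

w, final_batch = adaptive_batch_sgd()
print("error:", np.linalg.norm(w - w_true), "final batch size:", final_batch)
```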
[Here are the slides for lecture 5 & 6]
[Here are the slides for lecture 7 & 8]
The slides for the different groups' presentations can be viewed here.
Léon Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60(2):223–311, 2018.
Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer. [standard nonlinear optimization textbook]
Amir Beck. First-Order Methods in Optimization. SIAM.
Practicals/Exercises: theoretical exercises, additional proofs, and coding (first assignment); exploration of stochastic gradient methods and their variants for training DNNs (second assignment); research-paper reading and summary (third assignment).
Tutorial 1 solutions | Colab notebook solutions for tutorial 1
See here for a recording of practical 1.
See here for a recording of practical 2.
Tutorial 3. See here for a recording of practical 3.