Introduction to convex and nonconvex optimization. Optimality conditions for nonconvex optimization problems. Gradient descent methods with various stepsize rules. Global rate of convergence. Some perspectives. [Here is the video for lecture 1]
[Here are the slides for lecture 1 & 2]
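As a companion to the lecture 1 material, below is a minimal sketch of gradient descent with two illustrative stepsize rules, a fixed stepsize and a backtracking (Armijo) line search; the test function and all parameter values are assumptions chosen for illustration, not taken from the lectures.

```python
import numpy as np

def gradient_descent(f, grad, x0, rule="armijo", alpha=0.1, max_iter=1000, tol=1e-8):
    """Gradient descent with either a fixed stepsize or backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:        # approximate first-order optimality
            break
        if rule == "fixed":
            t = alpha                       # constant stepsize
        else:
            t, c, rho = 1.0, 1e-4, 0.5      # backtrack until the Armijo decrease condition holds
            while f(x - t * g) > f(x) - c * t * g.dot(g):
                t *= rho
        x = x - t * g
    return x

# Illustrative convex quadratic f(x) = 0.5 x^T A x - b^T x with minimiser A^{-1} b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
print(gradient_descent(f, grad, x0=[0.0, 0.0]))   # approximately np.linalg.solve(A, b)
```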
Stochastic gradient method and adaptive variants (AdaGrad). [Here is the video for lecture 2]
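To make the lecture 2 updates concrete, the following sketch contrasts a plain stochastic gradient step with a diagonal AdaGrad step on a synthetic least-squares problem; the data, learning rates, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def sgd(lr=0.01, epochs=20, adagrad=False, eps=1e-8):
    """Stochastic gradient on 0.5*(x_i^T w - y_i)^2, optionally with AdaGrad scaling."""
    w = np.zeros(d)
    G = np.zeros(d)                       # running sum of squared gradients (AdaGrad)
    for _ in range(epochs):
        for i in rng.permutation(n):      # one pass over the data in random order
            g = (X[i] @ w - y[i]) * X[i]  # stochastic gradient from a single sample
            if adagrad:
                G += g**2
                w -= lr / np.sqrt(G + eps) * g
            else:
                w -= lr * g
    return w

print("SGD error:    ", np.linalg.norm(sgd() - w_true))
print("AdaGrad error:", np.linalg.norm(sgd(lr=0.5, adagrad=True) - w_true))
```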
Momentum/acceleration methods and Adam. Second-order type methods.
[Here is the video for lecture 3]
[Here are the slides for lecture 3 & 4]
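As a rough companion to the lecture 3 topics, here is a minimal sketch of the Adam update with bias correction, which combines a momentum-style first-moment estimate with an AdaGrad-style adaptive scaling; the hyperparameter defaults are the commonly used values, and the badly scaled quadratic is an illustrative assumption.

```python
import numpy as np

def adam(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, max_iter=5000):
    """Adam: exponential moving averages of the gradient and its square, with bias correction."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment estimate
    for t in range(1, max_iter + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)          # bias correction
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Illustrative use on a badly scaled quadratic, where plain gradient descent is slow.
grad = lambda x: np.array([100.0 * x[0], x[1]])
print(adam(grad, x0=[1.0, 1.0]))   # prints a point close to the minimiser at the origin
```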
General frameworks: probabilistic models, adaptive batch size; time permitting, subspace methods.
[Here is the video for lecture 4]
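One common way to make the adaptive batch-size idea concrete is a variance ("norm") test that enlarges the mini-batch whenever the sampled gradient looks too noisy relative to its norm; the sketch below implements this heuristic on a synthetic least-squares problem. The problem data, threshold, and doubling rule are assumptions for illustration, not necessarily the specific variant covered in the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def adaptive_batch_sgd(lr=0.05, batch=8, theta=0.9, max_iter=400):
    """SGD where the mini-batch grows whenever the sample-variance ("norm") test fails."""
    w = np.zeros(d)
    for _ in range(max_iter):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        residual = X[idx] @ w - y[idx]
        grads = residual[:, None] * X[idx]        # per-sample gradients
        g = grads.mean(axis=0)                    # mini-batch gradient estimate
        # Norm test: if the variance of the mini-batch gradient dominates the
        # squared gradient norm, the estimate is too noisy, so enlarge the batch.
        var_of_mean = grads.var(axis=0).sum() / len(idx)
        if var_of_mean > theta**2 * np.dot(g, g):
            batch = min(2 * batch, n)
        w -= lr * g
    return w, batch

w, final_batch = adaptive_batch_sgd()
print("error:", np.linalg.norm(w - w_true), "final batch size:", final_batch)
```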
[Here are the slides for lecture 5 & 6]
[Here are the slides for lecture 7 & 8]
The slides for the different groups' presentations can be viewed here.
Léon Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60(2):223–311, 2018.
Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer. [standard nonlinear optimization textbook]
Amir Beck. First-Order Methods in Optimization. SIAM.
Practicals/Exercises: theoretical exercises, additional proofs, and coding (first assignment); exploration of stochastic gradient methods and their variants for training DNNs (second assignment); research-paper reading and summary (third assignment).
Tutorial 1 solutions | Colab notebook solutions for tutorial 1
See here for a recording of practical 1.
See here for a recording of practical 2.
Tutorial 3. See here for a recording of practical 3.