ELE539/COS512: Optimization for Machine Learning
Spring 2021, Mon. Wed. 09:30 am - 10:50 am
Instructor: Chi Jin Office hour: Mon. 4:00 pm - 5:00 pm
TA: Qinghua Liu Office hour: Tue. 4:00 pm - 5:00 pm
Contents: Optimization theory---algorithms and complexity analyses
Grades: 5 problem sets (60%), 1 scribe note (10%), and 1 final exam (30%). The final will count for 40% for students who do not complete a scribe note.
No late homework.
Scribe note [sign up sheet] and [template].
*Special thanks to Tianyi Lin for his enormous help in developing the course material.
Lectures
Convex Optimization
2/1. Introduction. [Jamboard][video][note] (Ch 1.1, 1.3 in Bubeck)
2/3. Black-box model, basic properties of convexity. [Jamboard][video][note] (Ch 1.2, 1.4 in Bubeck)
2/8. The ellipsoid method. [Jamboard][video][note] (Ch 2.2 in Bubeck)
2/10. Gradient descent for Lipschitz functions. [Jamboard][video][note] (Ch 3.1, 3.4.1 in Bubeck)
2/15. Gradient descent for smooth functions. [Jamboard][video][note] (Ch 3.2, 3.4.2 in Bubeck)
2/17. Nesterov's accelerated gradient descent. [Jamboard][video][note] (Ch 3.7 in Bubeck)
2/22. Lower Bounds: linear span algorithms. [Jamboard][video][note] (Ch 3.5 in Bubeck)
2/24. Lower Bounds II: general randomized algorithms. [Jamboard][video][note] [Woodworth & Srebro, 2017]
3/3. Mirror Descent II. [Jamboard][video] (Ch 4.2, 4.3 in Bubeck)
3/8. Stochastic Gradient Descent. [Jamboard][video][note] (Ch 6.1, 6.2 in Bubeck)
3/10. Stochastic Variance Reduced Gradient. [Jamboard][video][note] [Johnson & Zhang, 2013]
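As a companion to the gradient-descent lectures above, a minimal sketch of gradient descent with the standard 1/L step size on a smooth, strongly convex quadratic (the test matrix, function, and all names are illustrative, not taken from the course notes):

```python
import numpy as np

# Gradient descent with step size 1/L on f(x) = 0.5 * x^T A x - b^T x,
# a smooth convex quadratic; L is the largest eigenvalue of A.
def gradient_descent(A, b, x0, num_iters=1000):
    L = np.linalg.eigvalsh(A).max()   # smoothness constant of the quadratic
    x = x0.astype(float)
    for _ in range(num_iters):
        grad = A @ x - b              # gradient of f at x
        x = x - grad / L              # classic 1/L step
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite, so f is strongly convex
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)           # exact minimizer, for comparison
x_hat = gradient_descent(A, b, np.zeros(2))
```

On a strongly convex quadratic like this, the iterates contract toward the minimizer at a linear rate, so a thousand iterations reach machine precision.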
Nonconvex Optimization
3/17. Nonconvex optimization overview, power methods. [Jamboard][video][note]
3/31. Escaping saddle points II. [Jamboard][video][note] [Jin et al., 2019]
4/5. Examples, optimal rates, high-order saddle points. [Jamboard][video][note]
4/7. Lecture cancelled.
4/12. (Guest lecture by Cong Fang) Stochastic nonconvex optimization, SPIDER. [Jamboard][video][note] [Fang et al., 2018]
4/14. Nonsmooth nonconvex optimization. [Jamboard][video][note] [Davis & Drusvyatskiy, 2018]
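For the eigenvector problems treated in this unit, a toy power-method sketch for the top eigenvector of a symmetric matrix (the matrix and all names are illustrative, not from the lecture notes):

```python
import numpy as np

# Power method: repeatedly apply A and renormalize; the iterate aligns with
# the top eigenvector at a rate governed by the eigengap lambda_2 / lambda_1.
def power_method(A, num_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)            # random unit-norm initialization
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)        # renormalize each step
    eigval = v @ A @ v                # Rayleigh quotient estimate of lambda_1
    return eigval, v

A = np.diag([4.0, 2.0, 1.0])          # top eigenpair: (4, +/- e_1)
lam, v = power_method(A)
```

Maximizing the Rayleigh quotient over the sphere is a simple nonconvex problem, which is why it makes a good warm-up for this unit.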
Minimax Optimization
4/19. Minimax Optimization, gradient descent ascent for Lipschitz functions. [Jamboard][video][note]
4/21. Gradient descent ascent for smooth functions. [Jamboard][video][note]
4/26. Extragradient method, last iterate convergence, and optimal rates. [Jamboard][video][note]
4/28. Nonconvex minimax optimization. [Jamboard][video][note]
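To accompany the extragradient lecture, a toy sketch on the bilinear saddle problem f(x, y) = x * y, whose unique saddle point is (0, 0); the step size and all names are illustrative, not from the course materials:

```python
# Extragradient on min_x max_y f(x, y) = x * y: take a gradient step to a
# midpoint, then update using the gradients evaluated at that midpoint.
def extragradient(x0, y0, eta=0.1, num_iters=5000):
    x, y = float(x0), float(y0)
    for _ in range(num_iters):
        # extrapolation step (grad_x f = y, grad_y f = x)
        x_half = x - eta * y
        y_half = y + eta * x
        # update step using gradients at the midpoint
        x_new = x - eta * y_half
        y_new = y + eta * x_half
        x, y = x_new, y_new
    return x, y

x, y = extragradient(1.0, 1.0)
```

On this bilinear problem, plain simultaneous gradient descent ascent spirals outward, while the extragradient iterates contract to the saddle point, which is the contrast the lectures above develop.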
Homework
Homework 1 (due on Mar. 3, 11:59 PM on Blackboard)
Homework 2 (due on Mar. 16, 11:59 PM on Blackboard)
Homework 3 (due on Mar. 30, 11:59 PM on Blackboard)
Homework 4 (due on Apr. 22, 11:59 PM on Blackboard)
Homework 5 (due on May 4, 11:59 PM on Blackboard)
Schedule (weekly basis)
Convex Optimization:
Intro, black-box model
Subgradient methods, gradient methods
Nesterov's acceleration, momentum
Lower bounds [Homework 1 due]
Mirror descent
Stochastic algorithms, variance reduction [Homework 2 due]
Nonconvex Optimization:
Eigenvector problems, power methods, Lanczos algorithm
Finding stationary points [Homework 3 due]
Escaping saddle points
Nonsmooth (weakly convex) optimization [Homework 4 due]
Minimax Optimization:
Convex-concave: gradient descent ascent, extragradient method
Gradient descent with max-oracle [Homework 5 due]
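For the stochastic-algorithms week in the schedule above, a toy SGD sketch on least squares with a decaying step size (the data, step-size schedule, and all names are illustrative assumptions, not from the course materials):

```python
import numpy as np

# Plain SGD on the least-squares objective (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2,
# taking one stochastic gradient step per shuffled sample.
def sgd(A, b, num_epochs=200, seed=0):
    n, d = A.shape
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    t = 0
    for _ in range(num_epochs):
        for i in rng.permutation(n):              # one pass over shuffled samples
            t += 1
            eta = 0.1 / (1.0 + 0.01 * t)          # decaying step size
            x = x - eta * (A[i] @ x - b[i]) * A[i]  # stochastic gradient step
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2))
x_true = np.array([1.0, -1.0])
b = A @ x_true                 # consistent system: the optimum has zero residual
x_hat = sgd(A, b)
```

Because the system is consistent, every per-sample gradient vanishes at the optimum, so SGD converges to it rather than oscillating in a noise ball; with noisy labels one would instead see the O(1/sqrt(T)) behavior from the SGD lecture.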
Recommended Textbook
Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck
Lecture Notes: Optimization for Machine Learning, by Elad Hazan
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe
Reference Readings
Last-iterate analysis of GD/SGD for Lipschitz convex functions: [Shamir & Zhang, 2013].
Optimal algorithms in the stochastic gradient setting for strongly convex and smooth functions: [Ghadimi & Lan, 2013a] [Ghadimi & Lan, 2013b]
Optimal algorithms for the finite-sum problems and the matching lower bounds: [Lin et al., 2018] [Allen-Zhu, 2017] [Woodworth & Srebro, 2016]
OGDA and Extragradient for convex-concave optimization: [Mokhtari et al., 2019a] [Mokhtari et al., 2019b]
Optimal algorithm for strongly-convex-strongly-concave optimization: [Lin et al., 2020]
Related Courses
Elad Hazan, Optimization for Machine Learning
Yuxin Chen, Large-Scale Optimization for Data Science