Instructor: Pratik Jawanpuria and Pranay Sharma
TA: TBA
Time: Mondays and Thursdays, 2:00-3:30 pm
Room: TBA
Office Hours: TBA
Course Description
This course is designed to equip students with the mathematical and theoretical foundations of linear algebra and modern optimization methods that drive today’s large-scale machine learning. The emphasis will be on rigorous guarantees, general frameworks, and unifying perspectives to provide students with the theoretical toolkit to study existing algorithms and develop new ones.
Prerequisites
A strong foundation in probability and linear algebra is expected. Some exposure to basic ML and optimization would be helpful, but not strictly necessary. Most importantly, since this is a graduate course that involves proofs, some mathematical maturity is required.
Tentative Topics
Basics - convex sets and functions
First-order methods
Gradient descent (a short illustrative sketch follows this list)
Subgradients, subgradient calculus, and the subgradient method
Proximal gradient descent, conjugate functions, mirror descent
Stochastic gradient methods - SGD, variance reduction (SAGA, SVRG), momentum-based SGD, adaptive methods (AdaGrad, Adam, AdamW)
Lagrangian-based methods
Duality
Dual methods
Dynamical systems perspective (if time permits)
Online convex optimization (if time permits)
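
To give a concrete flavor of the first topic above, here is a minimal sketch of fixed-step gradient descent in Python. It is illustrative only, not course material: the quadratic objective, the step size 0.1, and the iteration count are arbitrary choices made for this example.

    import numpy as np

    def gradient_descent(grad, x0, step_size=0.1, num_iters=100):
        # Fixed-step gradient descent: x_{k+1} = x_k - step_size * grad(x_k)
        x = np.asarray(x0, dtype=float)
        for _ in range(num_iters):
            x = x - step_size * grad(x)
        return x

    # Example (arbitrary): minimize f(x) = 0.5 * ||A x - b||^2,
    # whose gradient is A^T (A x - b).
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    x_hat = gradient_descent(lambda x: A.T @ (A @ x - b), x0=np.zeros(2))
    print(x_hat)  # approaches the least-squares solution [0.5, 1.0]

For this objective the step size satisfies the usual requirement of being smaller than 2/L, where L is the largest eigenvalue of A^T A, so the iterates converge; the course will make such conditions and their convergence rates precise.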
References
[B17] Amir Beck. "First-Order Methods in Optimization." SIAM (2017).
[BV04] Stephen Boyd and Lieven Vandenberghe. "Convex Optimization." Cambridge University Press (2004).
[B15] Sébastien Bubeck. "Convex Optimization: Algorithms and Complexity." Foundations and Trends in Machine Learning 8(3-4) (2015).
[BCN18] Léon Bottou, Frank E. Curtis, and Jorge Nocedal. "Optimization Methods for Large-Scale Machine Learning." SIAM Review (2018).
[WR22] Stephen J. Wright and Benjamin Recht. "Optimization for Data Analysis." Cambridge University Press (2022).
[O19] Francesco Orabona. "A Modern Introduction to Online Learning." arXiv preprint arXiv:1912.13213 (2019).
[SBC16] Weijie Su, Stephen Boyd, and Emmanuel Candès. "A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights." JMLR (2016).
Some Related Courses
EE236B: Convex Optimization and EE236C: Optimization Methods for Large-Scale Systems (by Prof. Lieven Vandenberghe, UCLA)
10-725: Convex Optimization (by Prof. Sivaraman Balakrishnan, CMU). See also the variant of the same course taught in the past by Prof. Ryan Tibshirani.
STAT 9910-303: Large-Scale Optimization for Data Science (by Prof. Yuxin Chen, UPenn)
Lecture Notes (will be posted here)