Instructor: Pratik Jawanpuria and Pranay Sharma
TA: TBA
Time: Mondays and Thursdays, 2:00-3:30 pm
Room: TBA
Office Hours: TBA
Course Description
This course is designed to equip students with the mathematical and theoretical foundations of linear algebra and modern optimization methods that drive today’s large-scale machine learning. The emphasis will be on rigorous guarantees, general frameworks, and unifying perspectives to provide students with the theoretical toolkit to study existing algorithms and develop new ones.
Prerequisites
A strong foundation in probability and linear algebra is expected. Some exposure to basic ML and optimization would be helpful, but not strictly necessary. Most importantly, since this is a graduate course that involves proofs, some mathematical maturity is required.
Tentative Topics
Basics - convex sets and functions
First-order methods
Gradient descent (a short illustrative sketch follows this list)
Subgradients, subgradient calculus, and the subgradient method
Proximal gradient descent, conjugate functions, mirror descent
Stochastic gradient methods - SGD, variance reduction (SAGA, SVRG), momentum-based SGD, adaptive methods (AdaGrad, Adam, AdamW)
Lagrangian-based methods
Duality
Dual methods
Dynamical systems perspective (if time permits)
Online convex optimization (if time permits)
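
To give a concrete flavor of the first topic above, here is a minimal sketch of fixed-step gradient descent in Python. It is illustrative only, not course material: the quadratic objective, the step size 0.1, and the iteration count are arbitrary choices made for this example.

    import numpy as np

    def gradient_descent(grad, x0, step_size=0.1, num_iters=100):
        # Fixed-step gradient descent: x_{k+1} = x_k - step_size * grad(x_k)
        x = np.asarray(x0, dtype=float)
        for _ in range(num_iters):
            x = x - step_size * grad(x)
        return x

    # Example (arbitrary): minimize f(x) = 0.5 * ||A x - b||^2,
    # whose gradient is A^T (A x - b).
    A = np.array([[2.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    x_hat = gradient_descent(lambda x: A.T @ (A @ x - b), x0=np.zeros(2))
    print(x_hat)  # approaches the least-squares solution [0.5, 1.0]

For this objective the step size satisfies the usual requirement of being smaller than 2/L, where L is the largest eigenvalue of A^T A, so the iterates converge; the course will make such conditions and their convergence rates precise.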
References
[B17] Amir Beck. "First-Order Methods in Optimization." SIAM (2017).
[BV04] Stephen Boyd and Lieven Vandenberghe. "Convex Optimization." Cambridge University Press (2004).
[B15] Sébastien Bubeck. "Convex Optimization: Algorithms and Complexity." Foundations and Trends in Machine Learning 8(3-4) (2015).
[BCN18] Léon Bottou, Frank E. Curtis, and Jorge Nocedal. "Optimization Methods for Large-Scale Machine Learning." SIAM Review (2018).
[WR22] Stephen J. Wright and Benjamin Recht. "Optimization for Data Analysis." Cambridge University Press (2022).
[O19] Francesco Orabona. "A Modern Introduction to Online Learning." arXiv preprint arXiv:1912.13213 (2019).
[SBC16] Weijie Su, Stephen Boyd, and Emmanuel Candès. "A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights." JMLR (2016).
Some Related Courses
EE236B: Convex Optimization and EE236C: Optimization Methods for Large-Scale Systems (by Prof. Lieven Vandenberghe, UCLA)
10-725: Convex Optimization (by Prof. Sivaraman Balakrishnan, CMU). See also the variant of the same course taught in the past by Prof. Ryan Tibshirani.
STAT 9910-303: Large-Scale Optimization for Data Science (by Prof. Yuxin Chen, UPenn)
Lecture Notes (will be posted here)