INDENG 290: Stochastic Optimization for Machine Learning
Course description: This course provides graduate students with a fundamental understanding of the theory, models, and algorithms of stochastic optimization, and illustrates how these methods apply to problems in contemporary machine learning and related disciplines. Along the way, students will also learn about optimization under uncertainty and the associated analytical tools. Mathematical rigor is emphasized throughout the course; students are expected to be interested in such rigor and willing to learn and practice it.
Prerequisites: Students should have completed a foundational course in optimization such as INDENG 262A or its equivalent. While not mandatory, students are strongly encouraged to concurrently enroll in INDENG 262B if they have not already completed it.
Lecture: (Spring 2025) Monday, Wednesday 10:00am - 11:30am, Etcheverry 1174
References:
A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
J. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, 2011.
F. Bach. Learning Theory from First Principles. MIT Press, 2024.
J. Duchi. Introductory Lectures on Stochastic Optimization.
Outline:
Week 1:
Introduction to modeling uncertainty, robustness and fairness in machine learning
Models for two-stage stochastic programs (see the formulation sketch after the outline)
Value of information
Week 2:
Decomposable structures of two-stage stochastic programs: coupling variables and coupling constraints
Decomposition methods for coupling constraints: Lagrangian relaxation and progressive hedging
Week 3:
Decomposition methods for coupling variables: Benders decomposition
Sample average approximations (i): bias and asymptotic performance
Week 4:
Concentration inequalities
Week 5:
Sample average approximations (ii): non-asymptotic performance
Uniform generalization error, Rademacher complexity
Week 6:
Stochastic gradient methods (i): convergence and sample complexity (see the code sketch after the outline)
Week 7:
Stochastic gradient methods (ii): adaptive stepsizes and variance reduction
Week 8:
Overparameterization (i): Polyak-Łojasiewicz (PL) conditions and neural tangent kernels
Week 9:
Overparameterization (ii): implicit bias and double descent
Week 10:
Robust optimization (i): tractable formulations, relationship with regularizers in ML
Robust optimization (ii): adversarial machine learning
Week 11:
Distributionally robust optimization (i): formulations
Week 12:
Distributionally robust optimization (ii): duality and tractability
Week 13:
Distributionally robust optimization (iii): statistical properties and applications in ML
Week 14:
Wasserstein gradient flow
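For orientation on the Week 1-3 topics, here is a minimal sketch of the linear two-stage stochastic program and its sample average approximation, in the standard form covered by Shapiro, Dentcheva, and Ruszczyński; the notation below is chosen for illustration and is not prescribed by the course:

\[
\min_{x \in X} \; c^\top x + \mathbb{E}_{\xi}\bigl[\,Q(x,\xi)\,\bigr],
\qquad
Q(x,\xi) \;=\; \min_{y \ge 0} \bigl\{\, q(\xi)^\top y \;:\; T(\xi)\,x + W(\xi)\,y = h(\xi) \,\bigr\},
\]
with the sample average approximation built from i.i.d. scenarios \(\xi^1,\dots,\xi^N\):
\[
\min_{x \in X} \; c^\top x + \frac{1}{N}\sum_{i=1}^{N} Q(x,\xi^i).
\]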
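The stochastic gradient methods of Weeks 6-7 can likewise be previewed with a minimal code sketch: single-sample stochastic gradient descent with a 1/sqrt(t) stepsize applied to a least-squares sample average objective. The function name, stepsize choice, and synthetic data below are illustrative assumptions, not course material.

import numpy as np

def sgd_least_squares(A, b, step0=0.1, epochs=20, seed=0):
    """Minimize the sample-average objective f(x) = (1/n) * sum_i (a_i^T x - b_i)^2
    using single-sample stochastic gradients and a step0/sqrt(t) stepsize."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            # Gradient of one summand; an unbiased estimate of the full gradient.
            grad = 2.0 * (A[i] @ x - b[i]) * A[i]
            x -= (step0 / np.sqrt(t)) * grad
    return x

# Synthetic problem: noisy linear measurements (illustrative data only).
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.1 * rng.standard_normal(500)
x_hat = sgd_least_squares(A, b)
print("estimation error:", np.linalg.norm(x_hat - x_true))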