INDENG 290: Stochastic Optimization for Machine Learning
Course description: This course provides graduate students with a fundamental understanding of the theory, models, and algorithms of stochastic optimization, and illustrates how these methods apply to problems in contemporary machine learning and related disciplines. Along the way, students will also learn about optimization under uncertainty and the associated analytical tools. Mathematical rigor is emphasized throughout the course; students are expected to be interested in such rigor and willing to learn and practice it.
Prerequisites: Students should have completed a foundational course in optimization such as INDENG 262A or its equivalent. While not mandatory, students are strongly encouraged to concurrently enroll in INDENG 262B if they have not already completed it.
Lecture: (Spring 2025) Monday, Wednesday 10:00am - 11:30am, Etcheverry 1174
References:
A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2009.
J. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer, 2011.
F. Bach. Learning Theory from First Principles. MIT Press, 2024.
J. Duchi. Introductory Lectures on Stochastic Optimization.
Outline:
Week 1:
Introduction to modeling uncertainty, robustness and fairness in machine learning
Models for two-stage stochastic programs (see the formulation sketch after the outline)
Value of information
Week 2:
Decomposable structures of two-stage stochastic programs: coupling variables and coupling constraints
Decomposition methods for coupling constraints: Lagrangian relaxation and progressive hedging
Week 3:
Decomposition methods for coupling variables: Benders decomposition
Sample average approximations (i): bias and asymptotic performance
Week 4:
Concentration inequalities
Week 5:
Sample average approximations (ii): non-asymptotic performance
Uniform generalization error, Rademacher complexity
Week 6:
Stochastic gradient methods (i): convergence and sample complexity (see the code sketch after the outline)
Week 7:
Stochastic gradient methods (ii): adaptive stepsizes and variance reduction
Week 8:
Overparameterization (i): Polyak-Łojasiewicz (PL) conditions and neural tangent kernels
Week 9:
Overparameterization (ii): implicit bias and double descent
Week 10:
Robust optimization (i): tractable formulations, relationship with regularizers in ML
Robust optimization (ii): adversarial machine learning
Week 11:
Distributionally robust optimization (i): formulations
Week 12:
Distributionally robust optimization (ii): duality and tractability
Week 13:
Distributionally robust optimization (iii): statistical properties and applications in ML
Week 14:
Wasserstein gradient flow
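For orientation on the Week 1-3 topics, here is a minimal sketch of the linear two-stage stochastic program and its sample average approximation, in the standard form covered by Shapiro, Dentcheva, and Ruszczyński; the notation below is chosen for illustration and is not prescribed by the course:

\[
\min_{x \in X} \; c^\top x + \mathbb{E}_{\xi}\bigl[\,Q(x,\xi)\,\bigr],
\qquad
Q(x,\xi) \;=\; \min_{y \ge 0} \bigl\{\, q(\xi)^\top y \;:\; T(\xi)\,x + W(\xi)\,y = h(\xi) \,\bigr\},
\]
with the sample average approximation built from i.i.d. scenarios \(\xi^1,\dots,\xi^N\):
\[
\min_{x \in X} \; c^\top x + \frac{1}{N}\sum_{i=1}^{N} Q(x,\xi^i).
\]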
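The stochastic gradient methods of Weeks 6-7 can likewise be previewed with a minimal code sketch: single-sample stochastic gradient descent with a 1/sqrt(t) stepsize applied to a least-squares sample average objective. The function name, stepsize choice, and synthetic data below are illustrative assumptions, not course material.

import numpy as np

def sgd_least_squares(A, b, step0=0.1, epochs=20, seed=0):
    """Minimize the sample-average objective f(x) = (1/n) * sum_i (a_i^T x - b_i)^2
    using single-sample stochastic gradients and a step0/sqrt(t) stepsize."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            # Gradient of one summand; an unbiased estimate of the full gradient.
            grad = 2.0 * (A[i] @ x - b[i]) * A[i]
            x -= (step0 / np.sqrt(t)) * grad
    return x

# Synthetic problem: noisy linear measurements (illustrative data only).
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.1 * rng.standard_normal(500)
x_hat = sgd_least_squares(A, b)
print("estimation error:", np.linalg.norm(x_hat - x_true))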