Optimization has become the engine for training modern artificial intelligence systems, including large language models, reinforcement learning agents, and generative models. This course provides an in-depth treatment of classical and modern optimization methods relevant to statistics, machine learning, and data science. We start with the role of optimization in the process of learning from data and then cover the basic theory of convex optimization. Building on this foundation, we present and analyze (stochastic) gradient-based methods, modern nonconvex optimization, and the optimizers used in deep learning, reinforcement learning, and language models, as well as applications to statistical and data science problems. The emphasis is on modeling intuition, algorithm design principles, theoretical performance bounds, and the mathematical tools for analysis. The objective is to give students a deep understanding of how and why optimization algorithms work, along with their limitations. This knowledge will help students model problems from an optimization perspective, choose appropriate algorithms for specific applications, and obtain preliminary guidance on practical implementation.
Lecturer: Prof. Yifan Hu
Course Number: 16:960:690:01 and 16:954:694:02
Teaching assistant: to be assigned.
Classes: Thu 12:10 pm - 3:10 pm
Location: PH-111
Prerequisites: A solid background in analysis and linear algebra; some background in theoretical computer science (computational complexity, analysis of algorithms); the ability to understand mathematical proofs.
Tentative Schedule
Week 1 Jan 22nd, Introduction to Optimization
What is optimization
Motivating examples from statistics, data science, and machine learning
Constrained and unconstrained optimization
Week 2 Jan 29th, Theory of Convexity
Convex sets and functions
Strong convexity and smoothness
Geometric interpretations
Week 3 Feb 5th, Stochastic Optimization
Sample average approximation/empirical risk minimization
Stability and generalization
Week 4 Feb 12th, Stochastic Gradient Descent
Convergence analysis and sample complexity
Algorithmic stability of SGD
Implementation tricks
Week 5 Feb 19th, (Stochastic) Coordinate Descent and Frank-Wolfe Algorithms
Week 6 Feb 26th, Stochastic Mirror Descent and Geometry-Based Algorithms
Nonsmoothness
Bregman divergence
Hidden convexity
Week 7 Mar 5th, Optimizers in Deep Learning and Variance-Reduced Gradient Methods
From SGD to Adam-mini
Control variate techniques
SARAH, SPIDER, STORM, PAGE algorithms
Week 8 Mar 12th, Matrix Optimizers
Muon and its variants
Week 9 Mar 19th, Spring break
Week 10 Mar 26th, Modern Nonconvex Optimization
Landscape: gradient dominance conditions, invexity, hidden convexity
Convergence to stationary points
Week 11 Apr 2nd, Stochastic Minimax, Bilevel, and Contextual Optimization
Generative adversarial networks, hyperparameter tuning, neural architecture search
Biased gradient methods
Model misspecification
Week 12 Apr 9th, Distributionally Robust Optimization
Equivalence between the square-root Lasso, regularized logistic regression, and distributionally robust optimization
Dual form of distributionally robust optimization
Computation and generalization
Week 13 Apr 16th, Applications I: Causal Learning as Optimization
Causal invariance learning
Causal inference: CATE, uplift modeling, and instrumental variable regression
DAG learning
Week 14 Apr 23rd, Applications II: Policy Optimization in Reinforcement Learning
Policy gradient theorem
Function approximation
Policy-based RL algorithms
Week 15 Apr 30th, Applications III: Variational Inference and LLM as Optimization
Particle gradient methods
Reinforcement learning from human feedback
Fine-tuning
Evaluation and Grading
There will be one quiz and a course project. The quiz accounts for 20% of the final grade and the project for 80%. There will also be 5 bonus take-home questions, each of which can contribute up to 5% of the final grade.
Course Project
The course project can be conducted individually or in a group of 2 or 3 students. The project can focus on designing new optimization algorithms, theoretical analysis of optimization algorithms, or a comprehensive implementation of algorithms in applications. A list of projects will be provided to registered students. Students can also design their own projects (please discuss with the TA and the lecturer).
Tentative Projects List
To be released later.
Reading Material
Optimization for Data Science, by Zebang Shen, Niao He, Bernd Gärtner, and Martin Jaggi.
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe.
Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck.
High-Dimensional Statistics: A Non-Asymptotic Viewpoint, by Martin J. Wainwright.
Optimization Methods for Large-Scale Machine Learning, by Léon Bottou, Frank E. Curtis, and Jorge Nocedal.