This course provides an in-depth theoretical treatment of classical and modern optimization methods relevant to statistics, machine learning, and data science. We begin with the role of optimization in learning from data and then cover the basic theory of convex optimization. Building on this foundation, we present and analyze (stochastic) gradient-based methods, modern nonconvex optimization, and the optimizers used in deep learning, reinforcement learning, and language models, as well as applications to statistical and data science problems. The emphasis is on the intuitions and design principles behind these algorithms, on theoretical performance bounds, and on the mathematical tools and techniques used in their analysis. The objective is to give students a deep understanding of how and why optimization algorithms work, along with their limitations. This knowledge helps in choosing appropriate algorithms for specific applications, although the course does not primarily aim to offer practical implementation guidance.
Lecturer: Prof. Yifan Hu
Teaching assistant: to be assigned.
Classes: Thu 12:10pm - 3:10pm
Location: PH-111
Prerequisites: A solid background in analysis and linear algebra; some background in theoretical computer science (computational complexity, analysis of algorithms); the ability to understand and write mathematical proofs.
Tentative Schedule
Week 1 Jan 22nd, Introduction to Optimization
Week 2 Jan 29th, Theory of Convex Functions
Week 3 Feb 5th, Stochastic Optimization and Sample Average Approximation
Week 4 Feb 12th, (Stochastic) Gradient Descent
Week 5 Feb 19th, Coordinate Descent
Week 6 Feb 26th, Frank-Wolfe Algorithms
Week 7 Mar 5th, Variance-Reduced Gradient Methods
Week 8 Mar 12th, Biased Gradient Methods (Multi-level Monte Carlo)
Week 9 Mar 19th, Spring Break (no class)
Week 10 Mar 26th, Modern Nonconvex Optimization
Week 11 Apr 2nd, (Stochastic) Mirror Descent and Reinforcement Learning
Week 12 Apr 9th, Optimizers in Deep Learning
Week 13 Apr 16th, Matrix Optimizers in Training Language Models
Week 14 Apr 23rd, Statistical and Data Science Applications I: Variational Inference
Week 15 Apr 30th, Statistical and Data Science Applications II: Robust Learning
Evaluation and Grading
There will be one quiz and a course project. The quiz accounts for 20% of the final grade and the project for the remaining 80%.
Course Project
The course project can be completed individually or in a group of two or three students. Projects may focus on designing new optimization algorithms, theoretical analysis of existing algorithms, or a comprehensive implementation and evaluation of algorithms on applications. A list of suggested projects will be provided to registered students; students may also design their own projects after discussion with the TA and the lecturer.
Tentative Projects List
To be released later.
Reading Material
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe
Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck
High-Dimensional Statistics: A Non-Asymptotic Viewpoint, by Martin J. Wainwright
Optimization Methods for Large-Scale Machine Learning, by Léon Bottou et al.