Optimization has become the engine for training modern artificial intelligence systems, including large language models (LLMs), reinforcement learning agents, and generative models. This course provides an in-depth treatment of classical and modern optimization methods relevant to statistics, machine learning, and data science. We begin with the role of optimization in learning from data, and then develop the basic theory of convex optimization. Building on this foundation, we present and analyze (stochastic) gradient-based methods, modern nonconvex optimization, and the optimizers used in deep learning, reinforcement learning, and language models, together with applications to problems in statistics and data science. The emphasis is on the intuitions and design principles behind these algorithms, on theoretical performance bounds, and on the mathematical tools and techniques used in their analysis. The objective is to give students a deep understanding of how and why optimization algorithms work, along with their limitations. This understanding will help in choosing appropriate algorithms for specific applications and provide preliminary guidance on practical implementation.
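As a brief preview of the algorithms treated in the course, the sketch below runs stochastic gradient descent on a synthetic least-squares problem; the problem instance, step size, and iteration count are illustrative choices only, not part of the course material.

    import numpy as np

    # Stochastic gradient descent on min_x (1/2n) ||A x - b||^2,
    # sampling one data point per iteration.
    rng = np.random.default_rng(0)
    n, d = 1000, 10
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.1 * rng.normal(size=n)

    x = np.zeros(d)
    step = 0.01  # constant step size, chosen for illustration
    for t in range(20_000):
        i = rng.integers(n)                # sample an index uniformly at random
        grad = (A[i] @ x - b[i]) * A[i]    # unbiased estimate of the full gradient
        x -= step * grad

    print("estimation error:", np.linalg.norm(x - x_true))

Variants of this basic scheme, such as mini-batching, variance reduction, and adaptive step sizes, occupy much of the schedule below.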
Lecturer: Prof. Yifan Hu
Course Numbers: 16:960:690:01 and 16:954:694:02
Teaching assistant: to be assigned.
Classes: Thu 12:10 pm - 3:10 pm
Location: PH-111
Prerequisites: A solid background in analysis and linear algebra; some background in theoretical computer science (computational complexity, analysis of algorithms); the ability to understand mathematical proofs.
Tentative Schedule
Week 1 Jan 22nd, Introduction to Optimization and Its Connections to Statistics/Data Science
Week 2 Jan 29th, Theory of Convex Functions
Week 3 Feb 5th, Stochastic Optimization and Sample Average Approximation
Week 4 Feb 12th, (Stochastic) Gradient Descent
Week 5 Feb 19th, Coordinate Descent
Week 6 Feb 26th, Frank-Wolfe Algorithms
Week 7 Mar 5th, Variance Reduction Gradient Methods
Week 8 Mar 12th, Biased Gradient Methods
Week 9 Mar 19th, Spring Break
Week 10 Mar 26th, Modern Nonconvex Optimization for Machine Learning
Week 11 Apr 2nd, (Stochastic) Mirror Descent and Reinforcement Learning
Week 12 Apr 9th, Optimizers in Deep Learning
Week 13 Apr 16th, Matrix Optimizers in Training Language Models
Week 14 Apr 23rd, Statistical and Data Science Applications I: Variational Inference/Particle Gradient Methods
Week 15 Apr 30th, Statistical and Data Science Applications II: Robust Learning/Statistics
Evaluation and Grading
There will be one quiz and a course project. The quiz accounts for 20% of the final grade and the project for 80%. In addition, there will be five bonus take-home questions, each of which can contribute up to 5% of the final grade.
Course Project
The course project can be completed individually or in a group of 2 or 3 students. The project can focus on designing new optimization algorithms, on the theoretical analysis of optimization algorithms, or on a comprehensive implementation of algorithms for applications. A list of project topics will be provided to registered students. Students may also design their own projects (please discuss with the TA and the lecturer). Successful projects may be suitable for submission to machine learning conference proceedings or to optimization and operations research journals.
Tentative Projects List
To be released later.
Reading Material
Optimization for Data Science, by Zebang Shen, Niao He, Bernd Gärtner, and Martin Jaggi.
Convex Optimization, by Stephen Boyd and Lieven Vandenberghe.
Convex Optimization: Algorithms and Complexity, by Sébastien Bubeck.
High-Dimensional Statistics: A Non-Asymptotic Viewpoint, by Martin J. Wainwright.
Optimization Methods for Large-Scale Machine Learning, by Léon Bottou, Frank E. Curtis, and Jorge Nocedal.