ES 645

Optimization for Machine Learning

Course Description

Optimization forms one of the main backbones of machine learning. This course aims to familiarize students with the various optimization techniques. A high-level view of the topics is as follows:

Introduction – linear systems, iterative methods e.g., Krylov, numerical conditioning, Taylor series, convexity, and its properties.
Continuous and convex functions –, gradient descent, stochastic gradient descent, Nesterov’s methods, mirror descent, multiplicative weights update, Newton’s method, conjugate gradient methods, coordinate descent, Frank Wolfe methods for constrained optimization. SGD with averaging (SGA and SAG). Various duality notions and connections with optimality (Lagrangian Dual, Wolfe Dual, Fenchel Dual). Case study of SVM – choice of working with primal or dual form. Examples of convex relaxations (matrix completion using nuclear norm, sparse approximation using L1 regularization).
Continuous and nonconvex functions– nonconvex projected gradient descent, alternating minimization, stochastic optimization. Applications to sparse recovery, low-rank matrix recovery, robust regression.
Randomized algorithms aiding optimization – Johnson Lindenstrauss random projection, sampling/sketching for least square regression, and matrix factorization.
Discrete optimization – submodularity (convex and concave properties), submodular optimization (minimization and maximization), Lovasz extension, applications to data and feature selection, diversity maximization, etc.
Multilayered Perceptrons: MLP and gradient computation, universality; automatic differentiation (forward and reverse modes), feed forward architectures, recurrent architectures. Neural Tangent Kernels.
Optimization and learning – loss landscape of NNs, properties of SGD, various modern optimization algorithms, e.g., ADAM, AdaGrad, the inductive bias of gradient descent; we will also survey recent papers about the interplay of optimization (both the objective value and the trajectory) and generalization in deep learning.

Examples of papers to be covered here (more to be added)

Logistics:

Instructor: Anirban Dasgupta, Office: AB 6/407c. Please email for appointment.

Teaching Assistants : Shrutimoy Das

Class : TBD

Grading Policy

Regular quizzes through the semester – 30%
Three exams -- 30%
Homework – 10%
Project – 30%.

We will strictly follow the honor policy of IITGN. Collaboration in homeworks is allowed unless stated explicitly. Collaborations among class participants is allowed in homeworks, but everyone needs to write down their own answers and code as well as generate their own plots. Anyone with you discuss ideas with when solving the homework needs to be mentioned clearly. You are not expected to use Google or any other source for finding answers to homework questions unless explicitly allowed.

Textbooks and References:

Algorithms for Convex Optimization by Niseeth Vishnoi (pdf available on book page).
Convex Optimization by Stephen Boyd and Lieven Vandenberghe (pdf available on book page).
Course notes on Optimization for ML by Gabriel Peyre.
[1712.07897] Non-convex Optimization for Machine Learning by Prateek Jain, Purushottam Kar.
[2202.00132] Submodularity In Machine Learning and Artificial Intelligence
Learning with Submodular Functions: A Convex Optimization Perspective.
https://probml.github.io/pml-book/book2.html , Chapter on submodularity.
Numerical Optimization. J. Nocedal and S. J. Wright, Springer Series in Operations Research, Springer-Verlag, New York
https://distill.pub/2017/momentum/ – on why momentum works.

Expected prerequisites and some background material

While there are no formal prerequisites, we expect a background knowledge equivalent to the Machine Learning course. We also expect you have done a linear algebra and a probability course. If you have doubts about whether this course is for you, please contact the instructor. You can use the following material to pick up the necessary linear algebra and probability background.

Gilbert Strang's lectures on linear algebra.
3Blue1Brown's series on linear algebra.
Basics of probability and distribution-- from Mathematics for Machine Learning book.
Probability and statistics course from MIT-OCW.
Python for Data Science book by Jake Vanderplas.

Homeworks

There will be 4-5 homework assignments, spread out throughout the semester. Homeworks will be a mix of theoretical and implementation oriented questions.