CS 6301: Optimization in Machine Learning

Prof. Rishabh Iyer

CS Department UT Dallas

About the Course

Days and Time: Monday & Wednesday 4:00pm - 5:15pm

Location: AD 2.232

Office Hours: Wednesdays 3 pm - 3:50 pm (my office: ECSS 3.405), Monday 3 pm - 3:50 pm (by appointment)

In addition to this webpage, lecture slides, assignments, assignment solutions, and student projects will be posted at: https://github.com/rishabhk108/OptimizationML

Link to Coursebook

Continuous and discrete optimization are two important pillars of machine learning. Continuous optimization typically arises when learning model parameters, while discrete optimization appears in inference and in auxiliary tasks such as feature selection, data subset selection, and model compression.

In the first part of this course, we will cover continuous optimization with applications in machine learning. Topics include convexity, gradient methods, proximal algorithms, stochastic and online variants of these methods, coordinate descent methods, subgradient methods, non-convex optimization, Frank-Wolfe, accelerated methods, Lagrange and Fenchel duality, second-order methods, and quasi-Newton methods. We will ground all of these algorithms with applications and loss functions in machine learning, from logistic regression, linear regression, SVMs, and matrix factorization through deep neural networks.

The second part of this course will cover the fundamentals of discrete optimization. We will start with basic forms of combinatorial optimization (knapsack, s-t cuts/paths, matchings, and matroids) and then discuss submodular functions and their applications. We will cover algorithms for submodular minimization and maximization under various combinatorial constraints, and also discuss discrete probabilistic models such as Determinantal Point Processes (DPPs). We will ground these algorithmic and theoretical results in real-world applications such as feature selection, summarization and diversified search, structured prediction, data subset selection, and model compression.
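To give a flavor of the continuous-optimization half of the course, the sketch below runs plain gradient descent on an L2-regularized logistic-regression loss. It is an illustrative example rather than course material: the synthetic data, step size, and regularization strength are arbitrary choices made for the demo.

```python
# Minimal sketch: fixed-step gradient descent on regularized logistic regression.
# All data and hyperparameters below are illustrative, not from the course.
import numpy as np

def logistic_loss_and_grad(w, X, y, lam=0.1):
    """Average logistic loss with L2 regularization, and its gradient.
    X: (n, d) features, y: (n,) labels in {-1, +1}, w: (d,) weights."""
    z = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * (w @ w)
    grad = -(X.T @ (y / (1.0 + np.exp(z)))) / len(y) + lam * w
    return loss, grad

def gradient_descent(X, y, step=0.1, iters=200):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        _, g = logistic_loss_and_grad(w, X, y)
        w -= step * g          # fixed-step gradient update
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100))
    w = gradient_descent(X, y)
    print("final loss:", logistic_loss_and_grad(w, X, y)[0])
```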

Course Instructor: Rishabh Iyer

TA: Omeed Ashtiani

Course Syllabus (What will be covered)

Continuous Optimization

  • Lecture 1: Introduction and Course Overview
  • Lecture 2: Basics of Continuous Optimization
  • Lecture 3 & 4: Convexity and Convex Optimization
  • Lecture 5: Convexity and Convex Optimization (Continued)
  • Lecture 6 & 7: Gradient Descent and Family
  • Lecture 8: Accelerated Gradient Descent and Practical Aspects of Gradient Descent
  • Lecture 9: Proximal and Projected Gradient Descent
  • Lecture 10: Project Proposals by Students (No Lecture)
  • Lecture 11 & 12: Newton, Quasi-Newton and Second Order Methods
  • Lecture 13: Conditional Gradient Descent
  • Lecture Extra: Practical Implementational Aspects of Gradient Descent and Family (For Assignment 2)
  • Lecture 14: Coordinate Gradient Descent
  • Lecture 15: Stochastic Gradient Descent and Family
  • Lecture 16: SGD and Adaptive Gradient Methods
  • Lecture 17: Non-Convex Optimization
  • Lecture 18: Duality

Discrete Optimization

  • Lecture 19 & 20: Introduction to Submodular Optimization: Definitions and Examples
  • Lecture 21 & 22: Convex and Concave Aspects of Submodular Functions, Submodular Optimization: Minimization and Maximization, Applications

Evaluation

  • Assignments (50%)
  • Project (50%)

Assignments

Assignments will comprise a mix of practical and theoretical problems. Theoretical problems will be simple, e.g., proving that certain functions are convex or non-convex, computing gradients, etc. Practical problems will involve implementing optimization algorithms and loss functions in Python. Note that the practical assignments will follow a sequence (i.e., later assignments will build on implementations from earlier ones). A small example of the kind of practical task involved appears below.
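As a hedged illustration (not an actual assignment) of the kind of practical task described above, the snippet below checks a hand-derived gradient against a finite-difference approximation. The function f and its gradient are illustrative placeholders.

```python
# Illustrative gradient check: compare an analytic gradient with central differences.
import numpy as np

def f(w):
    return 0.5 * (w @ w) + np.sum(np.sin(w))

def grad_f(w):
    # hand-derived gradient of f
    return w + np.cos(w)

def numerical_grad(fun, w, eps=1e-6):
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (fun(w + e) - fun(w - e)) / (2 * eps)
    return g

w = np.random.default_rng(1).normal(size=4)
print(np.allclose(grad_f(w), numerical_grad(f, w), atol=1e-4))  # expect True
```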

Project

This course will also have a mini-project. The mini-project could involve a practical component, i.e., picking a real-world problem, modeling it, optimizing the loss function with different algorithms, and drawing insights from their performance. Alternatively, the mini-project could be a survey of a particular class of optimization algorithms and theoretical results (for example, convergence of SGD- or GD-style algorithms for certain non-convex problems).

Lecture Slides and Assignments

Lecture Slides and Assignments will be posted on the GitHub page: https://github.com/rishabhk108/OptimizationML

Links to related courses

Courses on Continuous Optimization

Courses on Combinatorial/Discrete Optimization

Material (Textbooks)