Course: Introduction to Optimization

Course Information:

When: MW 4:30pm - 5:45pm (EST)

Where: Online (due to COVID-19)

Zoom: https://purdue-edu.zoom.us/j/9874334092?pwd=MVRVcHhjQXFac0VYUHcwUzdHK2tFUT09

Instructor: Pan Li

TA: Adarsh Barik, abarik@purdue.edu

Office Hour: Wed. 3:30pm-4:30pm, https://purdue-edu.zoom.us/j/99302728950

Content:

This course is an introductory course on optimization.

Courtesy warning: This year's lectures will place more emphasis on mathematical characterization and less on programming and implementation (though some programming remains).

Topics include (tentative):

  1. Applications: Least squares (example: stock market), Least squares with regularization (example: gene regulatory networks), SVM (example: image, text classification), neural networks (example: image classification, self-driving cars, speech recognition), power allocation in wireless networks, Internet congestion control, Netflix recommendation problem

  2. Inf vs min, sup vs max, global vs local optimality, existence of optimal solutions and the Weierstrass extreme value theorem, convex sets, convex and concave functions, global and local minima for convex functions, uniqueness for strictly convex functions, tests for convex/concave functions and strongly convex/concave functions using the Hessian matrix, gradient-based characterization of convex and concave functions

  3. First-order necessary and second-order sufficient conditions for local minimum

  4. Gradient descent, descent lemma, convergence analysis for a fixed step size for functions with Lipschitz-continuous derivatives, for convex functions and for strongly convex functions, the role of the condition number and a quadratic objective example (least squares is a special case); variable step sizes, Armijo’s rule and its convergence; applications to neural networks and the backpropagation algorithm (see the gradient descent sketch after this list)

  5. Newton’s method and improvement in convergence rate locally; problems with implementing Newton’s method and some fixes

  6. Constrained optimization, projected gradient descent, and its convergence for a fixed step size

  7. Constrained optimization with equality and inequality constraints; Lagrange multipliers, the KKT necessary conditions and the KKT sufficient conditions; application to power control in wireless networks and the waterfilling solution; sensitivity and Lagrange multipliers

  8. Penalty, barrier function methods, and their convergence

  9. Duality: the primal and dual problems; strong and weak duality and the Slater conditions, minimax characterization, linear programming and the economic interpretation of the dual; applications of strong duality to SVM

  10. Subgradients, subgradient descent and its convergence

  11. Proximal gradient descent and application to sparse regression (leads to an algorithm called ISTA for the ℓ1-regularized least squares problem; see the ISTA sketch after this list)

  12. Augmented Lagrangian methods (aka method of multipliers)
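
A minimal Python sketch of gradient descent with a fixed step size (item 4), to make the topic concrete; the least-squares instance, the step-size choice 1/L, and all names below are illustrative assumptions, not course-provided code:

    import numpy as np

    def gradient_descent(grad, x0, step_size, num_iters=200):
        # Move against the gradient with a fixed step size.
        x = x0.copy()
        for _ in range(num_iters):
            x = x - step_size * grad(x)
        return x

    # Illustrative least-squares objective: 0.5 * ||A x - b||^2
    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    grad = lambda x: A.T @ (A @ x - b)        # gradient of the objective
    L = np.linalg.eigvalsh(A.T @ A).max()     # Lipschitz constant of the gradient
    x_hat = gradient_descent(grad, np.zeros(2), step_size=1.0 / L)
    print(x_hat)                              # approaches the least-squares solution [0.5, 1.0]

With step size 1/L the iterates converge for this smooth convex objective; larger steps can diverge, which is the trade-off the convergence analysis in item 4 quantifies.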
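Similarly, item 11 can be previewed with a short ISTA sketch for the ℓ1-regularized least squares (sparse regression) problem; the synthetic data, the regularization weight lam=0.1, and the iteration count are made-up illustrations:

    import numpy as np

    def soft_threshold(v, tau):
        # Proximal operator of tau * ||.||_1 (elementwise soft-thresholding).
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def ista(A, b, lam, num_iters=500):
        # ISTA for: minimize 0.5 * ||A x - b||^2 + lam * ||x||_1,
        # with fixed step size 1/L, where L = largest eigenvalue of A^T A.
        L = np.linalg.eigvalsh(A.T @ A).max()
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            g = A.T @ (A @ x - b)                   # gradient of the smooth part
            x = soft_threshold(x - g / L, lam / L)  # shrinkage (proximal) step
        return x

    # Toy sparse-regression instance (synthetic data for illustration only)
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    x_true = np.zeros(20)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    print(ista(A, b, lam=0.1)[:5])                  # close to x_true on its support

Each iteration is a plain gradient step on the smooth least-squares term followed by the soft-thresholding (proximal) step that produces sparsity.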


Prerequisites

Students are expected to have the following background:

  • Knowledge of basic computer science principles, sufficient to write a reasonably non-trivial computer program (CS 15900, CS 18200, CS 25100);

  • Familiarity with basic probability theory (MA 41600);

  • Familiarity with linear algebra (MA 26200, MA 26500);

  • Familiarity with multivariate calculus (MA 26100).

Evaluation

Homework: 4% * 9 = 36%

Programming: 12% * 2 = 24%

Scripting: 3% (bonus)

Midterms: 20% * 2 = 40%

Materials

The most relevant course material:

Some of the course material is covered in the following books:

Lecture Notes

01/20 Lec1 Introduction and Applications of Optimization [typed-notes]

01/25 Lec2 Math Preliminaries (read matrix calculus)

01/27 Lec3 Optimality conditions [typed-notes]

02/01 Lec4 Convex Set [typed-notes]

02/03-02/08 Lec5-6: Convex Function [typed-notes1] [typed-notes2]

02/10 Lec7: Gradient Descent Basics and Descent Lemma [typed-notes]

02/15 Lec8: Gradient Descent Convergence I: Weak Assumption and Convex Assumption [typed-notes]

02/17 ---

02/22 Lec9: Gradient Descent Convergence II: Strong Convexity, Convergence Rate [typed-notes]

02/24 Lec10: Gradient Descent Convergence III: Adaptive Stepsize and Acceleration [typed-notes]

03/01 Lec11: Application I: Neural Networks and Backpropagation [typed-notes]

03/03 Lec12: Condition Number and Newton's Method [typed-notes]

03/08 --- Midterm I

03/10 Lec13: Projected Gradient Method [typed-notes]

03/15 Lec14: Lagrange Multiplier with Equality Constraints [typed-notes]

03/17 - 03/22 Lec15: Lagrange Multiplier with Inequality Constraints and KKT Condition [typed-notes]

03/24 Lec16: Weak and Strong Duality [typed-notes]

03/29 - 03/31 Lec17: Barrier (interior point method) and Penalty Methods

04/05 Lec18: Application II: Support Vector Machine [typed-notes]

04/07 Lec19: Augmented Lagrangian Method [typed-notes]

04/12 Lec20: Subgradient Descent Method [typed-notes]

04/14 Lec21: Frank-Wolfe Method [typed-notes]

04/19 Lec22: Smoothing and Proximal Point Method

04/21 Lec23: Proximal Gradient Method

04/26 Lec24: Guest Lecture: Learning to Optimize --- Dr. Atlas Wang

04/28 --- Midterm II


More polished written lecture notes, courtesy of Siyuan Xu. Thanks!

[part 1, Lec 1 - Lec 9]

[part 2, Lec 10 - Lec 13]

[part 3, Lec 14 - Lec 17 first]

[part 4, Lec 17 second - Lec 20]

[part 5, Lec 21 - Lec 23]

Homeworks

[Homework 1] Math Preliminaries and Optimality Conditions. Due: 02/06;

[Homework 2] Convex Set and Convex Function I. Due: 02/13;

[Homework 3] Convex Function II and Descent Lemma. Due: 02/20;

[Homework 4] Convergence of Gradient Descent. Due: 03/06;

[Homework 5] Newton's Method and Projected Gradient. Due: 03/20;

[Homework 6] Lagrange Multipliers and KKT. Due: 03/27;

[Homework 7] KKT II and Strong Duality. Due: 04/03;

[Homework 8] Penalty Methods and Augmented Lagrangian Method. Due: 04/17;

[Homework 9] Subgradient Descent and Proximal Gradient. Due: 04/28