Introduction to Optimization

Semester: Fall 2023

Prerequisites: Linear Algebra (MATH 257 or 415)

Instructor: Prof. R. Srikant, rsrikant@illinois.edu

TAs: Ameya Anjarlekar ameyasa2@illinois.edu; Anna Winnicki annaw5@illinois.edu

Office Hours: Ameya (4-6pm Tuesdays, 2036ECEB), Anna (4-6pm Wednesdays, 3036 ECEB)

Lectures: 11-12:20 TuTh in Room 3081 ECEB

Thanksgiving Break: Nov. 18-26

Last day of instruction: Dec. 6

Grading:

- Quizzes: Homework will be posted with solutions and there will be a quiz each Thursday based on the latest homework posted. There will not be a quiz on the first Thursday (8/24) but there will be one every Thursday after that.
- Undergrads: 100% of the grade will based on weekly Thursday quizzes (can miss up to two)
- Grad Students: 80% of the grade will be based on weekly Thursday quizzes (can miss up to two) and 20% will be based on a final project
  - Final project: read and write a report on two algorithms: Nestorov's accelerated gradient method and Mirror descent. The TAs and the instructor will not provide help on the project. You have to find appropriate sources on the Internet and write a report, providing precise proofs of convergence. You have to cite the resources used. You can use ChatGPT to find resources or learn about the material but you will be responsible for any errors produced by ChatGPT. The report has to be typewritten by you, it should not be more than 10 pages long and must be in 11 pt font or bigger. Due: Dec. 8 midnight

Topics (as time permits):

· Applications: Least squares, Least squares with regularization , SVM , neural networks, power allocation in wireless networks, Internet congestion control, Netflix recommendation problem

· Inf vs min, sup vs max, global vs local optimality, existence of optimal solutions and the Weierstrass extreme value theorem, convex sets, convex and concave functions, global and local maxima for convex functions, uniqueness for strictly convex functions, tests for convex/concave functions and strongly convex/concave functions using Hessian matrix, gradient-based characterization of convex and concave functions

· First-order necessary and second-order sufficient conditions for local minimum

· Gradient descent, descent lemma, convergence analysis for a fixed step size for functions with Lipschitz-continuous derivatives, for convex functions and for strongly convex functions, the role of the condition number and a quadratic objective example (least squares is a special case); variable step sizes, Armijo’s rule and its convergence; applications to neural networks and the back propagation algorithm

· Newton’s method and improvement in convergence rate locally; problems with implementing Newton’s method and some fixes

· Constrained optimization, projected gradient descent, and its convergence for a fixed step size

· Constrained optimization with equality and inequality constraints; Lagrange multipliers, the KKT necessary conditions and the KKT sufficient conditions; application to power control in wireless networks and the waterfilling solution; sensitivity and Lagrange multipliers

· Penalty, barrier function methods, and their convergence

· Duality: the primal and dual problems; strong and weak duality and the Slater conditions, minimax characterization, linear programming and the economic interpretation of the dual; applications of strong duality to SVM

· Subgradients, subgradient descent and its convergence; application of subgradient descent to the dual problem; decomposition and its application to Internet congestion control

· Conjugate gradient method to avoid inverting the Hessian

· Proximal gradient descent and application to sparse regression (leads to an algorithm called ISTA for the l1-regularized least squares problem)

· Augmented Lagrangian methods (aka method of multipliers