Introduction to Optimization

Semester: Fall 2023

Prerequisites: Linear Algebra (MATH 257 or 415) 

Instructor: Prof. R. Srikant, rsrikant@illinois.edu

TAs: Ameya Anjarlekar (ameyasa2@illinois.edu); Anna Winnicki (annaw5@illinois.edu)

Office Hours: Ameya (4-6pm Tuesdays, 2036 ECEB), Anna (4-6pm Wednesdays, 3036 ECEB)

Lectures: 11-12:20 TuTh in Room 3081 ECEB

Thanksgiving Break: Nov. 18-26

Last day of instruction: Dec. 6

Grading:

Topics (as time permits): 

·       Applications: least squares, least squares with regularization, SVM, neural networks, power allocation in wireless networks, Internet congestion control, the Netflix recommendation problem

·       Inf vs. min and sup vs. max; global vs. local optimality; existence of optimal solutions and the Weierstrass extreme value theorem; convex sets; convex and concave functions; global and local minima of convex functions; uniqueness of the minimizer for strictly convex functions; tests for convex/concave and strongly convex/concave functions using the Hessian matrix; gradient-based characterization of convex and concave functions

·       First-order necessary and second-order sufficient conditions for a local minimum

·       Gradient descent and the descent lemma; convergence analysis with a fixed step size for functions with Lipschitz-continuous gradients, for convex functions, and for strongly convex functions; the role of the condition number and a quadratic objective example (least squares is a special case); variable step sizes, Armijo's rule and its convergence; applications to neural networks and the backpropagation algorithm (a fixed-step-size sketch appears after this list)

·       Newton's method and its locally faster convergence rate; difficulties in implementing Newton's method and some fixes (a sketch appears after this list)

·       Constrained optimization, projected gradient descent, and its convergence with a fixed step size (a sketch appears after this list)

·       Constrained optimization with equality and inequality constraints; Lagrange multipliers, the KKT necessary conditions, and the KKT sufficient conditions; application to power control in wireless networks and the waterfilling solution (sketched after this list); sensitivity and Lagrange multipliers

·       Penalty and barrier function methods, and their convergence

·       Duality: the primal and dual problems; weak and strong duality and Slater's condition; the minimax characterization; linear programming and the economic interpretation of the dual; applications of strong duality to SVM

·       Subgradients, subgradient descent and its convergence; application of subgradient descent to the dual problem; decomposition and its application to Internet congestion control

·       Conjugate gradient method to avoid inverting the Hessian

·       Proximal gradient descent and its application to sparse regression (leading to an algorithm called ISTA for the l1-regularized least squares problem; sketched after this list)

·       Augmented Lagrangian methods (a.k.a. the method of multipliers)
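
Code sketches (illustrative only; not course code):

The sketch below is a minimal illustration of gradient descent with a fixed step size on the least-squares objective 0.5*||Ax - b||^2 from the gradient-descent topic above. The step size 1/L uses the Lipschitz constant L of the gradient (the largest eigenvalue of A^T A); the function names, iteration count, and problem sizes are placeholders, not course notation.

```python
import numpy as np

def gradient_descent_least_squares(A, b, num_iters=500):
    """Minimize 0.5*||Ax - b||^2 by gradient descent with a fixed step size."""
    L = np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of the gradient
    step = 1.0 / L                          # fixed step size suggested by the descent lemma
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)            # gradient of the objective
        x = x - step * grad
    return x

# Usage on a small random, well-conditioned problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
x_gd = gradient_descent_least_squares(A, b)
print(np.allclose(x_gd, np.linalg.lstsq(A, b, rcond=None)[0], atol=1e-6))
```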
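Next, a minimal sketch of pure Newton's method for the Newton's-method topic above, under the assumption that the gradient and Hessian are supplied as callables; the Newton direction is obtained by solving a linear system rather than forming the inverse Hessian. All names here are illustrative.

```python
import numpy as np

def newtons_method(grad, hess, x0, num_iters=20):
    """Pure Newton iteration: x <- x + d, where H(x) d = -g(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        d = np.linalg.solve(hess(x), -grad(x))  # Newton direction from a linear solve
        x = x + d
    return x

# Example: on the quadratic f(x) = 0.5 x^T Q x - c^T x, Newton converges in one step.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])
x_star = newtons_method(lambda x: Q @ x - c, lambda x: Q, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(Q, c)))
```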
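A minimal sketch of projected gradient descent for the constrained-optimization topic above, shown on an assumed instance chosen for simplicity: least squares with nonnegativity constraints (x >= 0), where the projection onto the feasible set is a componentwise maximum.

```python
import numpy as np

def projected_gradient_descent(A, b, num_iters=1000):
    """Minimize 0.5*||Ax - b||^2 subject to x >= 0 by projected gradient descent."""
    L = np.linalg.eigvalsh(A.T @ A).max()
    step = 1.0 / L                             # fixed step size
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)   # gradient step, then projection onto x >= 0
    return x
```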
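A minimal sketch of the waterfilling solution mentioned in the KKT topic above, under an assumed power-allocation model: maximize sum_i log(1 + g_i p_i) subject to sum_i p_i = P and p_i >= 0. The KKT conditions give p_i = max(w - 1/g_i, 0) for a "water level" w, found here by bisection so that the total-power constraint is met; the bracketing interval is an assumption that covers the feasible water level.

```python
import numpy as np

def waterfilling(g, P, tol=1e-10):
    """Allocate total power P across channels with gains g by waterfilling."""
    lo, hi = 0.0, P + 1.0 / np.min(g)             # bracket for the water level
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.maximum(w - 1.0 / g, 0.0).sum() > P:
            hi = w                                # too much power used: lower the level
        else:
            lo = w                                # constraint slack: raise the level
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

# Usage: better channels (larger g) receive more power; the allocations sum to P.
p = waterfilling(np.array([2.0, 1.0, 0.5]), P=3.0)
print(p, p.sum())
```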
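Finally, a minimal sketch of ISTA (proximal gradient descent) for the l1-regularized least-squares problem from the proximal-gradient topic above: minimize 0.5*||Ax - b||^2 + lam*||x||_1, where the proximal step for the l1 term is componentwise soft-thresholding. The regularization weight lam and the iteration count are arbitrary illustrative values.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam=0.1, num_iters=1000):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by proximal gradient descent."""
    L = np.linalg.eigvalsh(A.T @ A).max()
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                          # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)   # prox step for the l1 term
    return x
```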