About convex optimization
This is a note about various non-convex optimization algorithms; it covers very little of convex optimization itself.
There are many ways, with various tradeoffs, to solve the class of mathematical optimization problems called convex optimization problems.
Convex optimization is a huge field. I am not a scientist in this field, but I like this area very much. Methods to solve such problems include (a tiny solver example follows the list):
- Interior point method
- Projected Subgradient method and family of subgradient methods
- Accelerated Subgradient methods
- Localization methods based on cutting planes. Originally the method was designed to find a point in a convex set, but minimizing a convex objective with or without constraints can be cast into this form too.
- Primal decomposition methods, followed by solving a master problem
- Dual decomposition
- Early stopping methods for solving extremely large problems
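As a tiny illustration, here is a minimal sketch of formulating and solving a small convex problem with the cvxpy modeling package (the data and the constraints are invented for illustration; under the hood cvxpy calls an interior-point style solver):

```python
import cvxpy as cp
import numpy as np

# Invented data: a small constrained least-squares problem.
np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)

x = cp.Variable(5)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
constraints = [x >= 0, cp.sum(x) == 1]   # probability-simplex constraint, just as an example
problem = cp.Problem(objective, constraints)
problem.solve()                          # cvxpy picks a suitable solver

print("optimal value:", problem.value)
print("optimal x:", x.value)
```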
Depending on your background, if you have never heard about convex optimization, I would like to inspire you with it:
1. There are no big problems with non-differentiability in this area
2. The methods are very effective
3. There is a deep interplay between convex sets and convex functions.
4. One result of convex analysis is that a convex function has to be continuous on the relative interior of its domain.
So continuity problems can only occur on the boundary of the function's domain.
5. From "4" you might think that convex optimization cannot deal with discontinuity, but that is not completely true.
There are a couple of useful generalizations of convex functions:
Quasiconvex - a function whose sublevel sets {x : f(x) <= a} are convex sets for every a. Such a function is unimodal, and it can be discontinuous.
Log-concave - a function f is log-concave if log(f) is a concave function. It plays nicely with mathematical statistics.
6. These methods find the "global" minimum value of the objective, and some methods provide a "certificate" which proves it.
In fact you can find distributions (from probability theory) which have maximum entropy or minimum KL divergence to some specified distribution, taking various constraints into account. There is no way to solve these variations in closed form at all! (A small modeling sketch follows.)
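A minimal cvxpy sketch of the maximum-entropy example above; the support, the moment constraint, and the numbers are all invented for illustration:

```python
import cvxpy as cp
import numpy as np

# Invented setup: a discrete distribution p on n points with a prescribed mean.
n = 10
values = np.arange(n, dtype=float)   # support points 0..9
target_mean = 4.5                    # invented moment constraint

p = cp.Variable(n)
entropy = cp.sum(cp.entr(p))         # entr(p) = -p * log(p), elementwise
constraints = [cp.sum(p) == 1, p >= 0, values @ p == target_mean]
cp.Problem(cp.Maximize(entropy), constraints).solve()

print("max-entropy distribution:", p.value)
```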
Details about interior-point methods are covered in cvxbook: https://web.stanford.edu/~boyd/cvxbook/
One of the central results in convex optimization is that if the extended Slater's condition holds (the problem is strictly feasible w.r.t. the non-affine inequality constraints), then the KKT conditions are necessary and sufficient for optimality of a convex optimization problem.
Here is a screenshot from Stephen Boyd and Lieven Vandenberghe's book for the case when the objective and constraint functions are differentiable:
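The screenshot is not reproduced here; for reference, the standard KKT conditions for a differentiable convex problem minimize f_0(x) subject to f_i(x) <= 0, h_i(x) = 0 (my restatement, not a copy of the book's figure) are:

```latex
\begin{aligned}
& f_i(x^\star) \le 0, \quad i=1,\dots,m, \qquad h_i(x^\star) = 0, \quad i=1,\dots,p, \\
& \lambda_i^\star \ge 0, \qquad \lambda_i^\star f_i(x^\star) = 0, \quad i=1,\dots,m, \\
& \nabla f_0(x^\star) + \sum_{i=1}^{m} \lambda_i^\star \nabla f_i(x^\star) + \sum_{i=1}^{p} \nu_i^\star \nabla h_i(x^\star) = 0.
\end{aligned}
```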
Links to Stephen Boyd's materials:
[1] https://web.stanford.edu/~boyd/vmls/, chapters 18-19
[2] EE364B, lecture 11 on Sequential Convex Programming: https://stanford.edu/class/ee364b/lectures/seq_slides.pdf
[3] EE364B, lectures on Branch and Bound: https://stanford.edu/class/ee364b/lectures/bb_slides.pdf
[4] EE364B slides about L1 heuristic
[5] EE364B slides about subgradient methods
[6] AA222 - Introduction to Multidisciplinary Design Optimization, chapter about gradient-free methods:
http://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=11408
http://adl.stanford.edu/aa222/Lecture_Notes_files/chapter6_gradfree.pdf
P.S. John Duchi about Mykel Kochenderfer:
"He is a serious guy; Mykel Kochenderfer is the guy who decides what needs to be in airplanes so that they do not crash into each other."
Subgradient methods for unconstrained problems
This note, written by S. Boyd, J. Duchi, and L. Vandenberghe,
https://stanford.edu/class/ee364b/lectures/subgradients_notes.pdf
contains various algebraic properties of subgradients and the connection between subgradients and directional derivatives.
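For context, the basic definition those notes build on: a vector g is a subgradient of f at x if the affine function it defines is a global underestimator, i.e.

```latex
f(y) \ge f(x) + g^{\top}(y - x) \quad \text{for all } y .
```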
Subgradient methods for unconstrained problems and speedup techniques. The Polyak step size exploits knowledge of f*.
Polyak's heavy ball method.
Several other techniques for smoothing the step direction.
Projected subgradient methods to solve constrained problems (when projection onto the set is an easy operation, e.g. the non-negative orthant); a small sketch follows below.
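A minimal numpy sketch of a projected subgradient method with the Polyak step size, for the easy case where the feasible set is the non-negative orthant; the test problem, its subgradient, and f* are invented for illustration:

```python
import numpy as np

def projected_subgradient(f, subgrad, f_star, x0, n_iters=500):
    """Projected subgradient method with the Polyak step size,
    projecting onto the non-negative orthant x >= 0."""
    x = np.asarray(x0, dtype=float)
    best = x.copy()
    for _ in range(n_iters):
        g = subgrad(x)
        step = (f(x) - f_star) / (np.dot(g, g) + 1e-12)  # Polyak step: needs f*
        x = np.maximum(x - step * g, 0.0)                # projection onto the orthant
        if f(x) < f(best):
            best = x.copy()
    return best

# Invented problem: minimize ||x - a||_1 over x >= 0; optimum is x_i = max(a_i, 0).
a = np.array([1.0, -2.0, 3.0])
f = lambda x: np.sum(np.abs(x - a))
subgrad = lambda x: np.sign(x - a)          # a valid subgradient of the objective
f_star = f(np.maximum(a, 0.0))              # known optimal value (here 2.0)
print(projected_subgradient(f, subgrad, f_star, np.zeros(3)))  # ~[1, 0, 3]
```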
The idea of Sequential Convex Programming [2, p.4] (the abbreviation used below is SCP).
The idea of one way to extract an affine or convex part from a differentiable function [2, p.6]; the basic linearization is recalled below.
The idea of one more way to obtain an affine approximation via quasilinearization [2, p.9].
The particle method from estimation theory to build an affine approximation in a modern way [2, p.7].
This method is used (heuristically) to find the minimum of f, for which f* >= 0.
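For reference, the basic affine approximation SCP builds on is the linearization around the current iterate x_k, trusted only locally (my restatement of the standard form, not a quote from [2]):

```latex
\hat{f}(x) = f(x_k) + \nabla f(x_k)^{\top}(x - x_k),
\qquad \text{used only over a trust region } \|x - x_k\| \le \rho .
```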
A comment about the optimality condition for an objective in the form of a sum of squares
(a.k.a. the non-linear least squares problem).
The closed-form solution for the step in the Levenberg-Marquardt algorithm.
Stopping criteria in the Levenberg-Marquardt algorithm.
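For reference, the standard Levenberg-Marquardt step for a nonlinear least-squares objective f(x) = ||r(x)||_2^2, with J the Jacobian of the residual r at x_k and damping parameter lambda_k (notation is mine):

```latex
x_{k+1} = x_k - \left(J^{\top} J + \lambda_k I\right)^{-1} J^{\top} r(x_k) .
```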
card(x) = {0 if x = 0 and 1 otherwise} for a scalar; for a vector, card(x) counts the nonzero entries.
Any combinatorial problem can be encoded as a convex-cardinality problem, e.g. Boolean LP [4, p.15].
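The Boolean LP encoding presumably goes along these lines (the standard trick, restated from memory; see [4, p.15] for the exact form): the constraint x_i in {0,1} for all i is equivalent to

```latex
0 \le x \le \mathbf{1}, \qquad \mathbf{card}(x) + \mathbf{card}(\mathbf{1} - x) \le n .
```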
The polishing technique suggested by S. Boyd (a small sketch combining the L1 heuristic and polishing appears after this block).
Unfortunately, we cannot handle cardinality exactly, but there are several other methods.
Convex relaxation leads to the L1-norm approximation.
The L1 norm can be improved by the weighted L1 heuristic [4, p.23].
Another possibility is not to take L1 but to use a non-convex (e.g. concave) function to approximate cardinality and use SCP to solve it approximately [4, p.8-p.10].
The L1 heuristic has a natural generalization to matrices:
the trace for a p.s.d. matrix and the nuclear norm for an arbitrary matrix [4, p.15].
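A minimal cvxpy sketch of the L1 heuristic for cardinality followed by a polishing pass; the problem data and the support threshold are invented for illustration:

```python
import cvxpy as cp
import numpy as np

# Invented sparse-recovery setup: b = A @ x_true with a sparse x_true.
np.random.seed(1)
A = np.random.randn(30, 60)
x_true = np.zeros(60)
x_true[:4] = [3.0, -2.0, 1.5, 4.0]
b = A @ x_true

# Step 1: L1 heuristic in place of cardinality.
x = cp.Variable(60)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b]).solve()

# Step 2: polishing -- keep the support found by the heuristic, re-solve on it.
off_support = (np.abs(x.value) <= 1e-3).astype(float)
x_pol = cp.Variable(60)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x_pol - b)),
           [cp.multiply(off_support, x_pol) == 0]).solve()

print("nonzeros found:", int((np.abs(x.value) > 1e-3).sum()))
```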
Examples of convex optimization in ML (https://youtu.be/wqy-og_7SLs?t=1136) from the presentation with Stephen Boyd: