Thierry Champion

Displacement convexity and Wasserstein barycenters (tutorial)

Abstract: In this talk, I should give an overview of the notions of displacement convexity and Wasserstein barycenters over the set of probabilities on a compact subset of R^d. Displacement convexity is the notion of convexity for functionals over the set of probability measures induced by the geodesics for the 2-Wasserstein distance (i.e. the distance defined through an optimal transport problem associated to the quadratic cost over R^n). This notion was introduced by McCann ('95) and then generalized by Agueh and Carlier ('11) who introduced the related notion of Wasserstein barycenters. Both notions proved to have a wide range of applications (ranging from optimization, image processing and PDEs).

All necessary relevant materials and basic facts about optimal transportation will be recalled.

François Glineur

Performance estimation of first-order methods for composite convex optimization (tutorial)

Abstract: Composite convex optimization consists in the minimization of a convex objective function equal to the sum of several convex functions with different properties, for example a smooth term and a nonsmooth term. We consider a large class of first-order methods designed to solve such composite problems, which rely on specific oracles for each term. We show that the worst-case performance of each of those methods can be computed exactly by solving a semidefinite optimization problem, which also produces an explicit problem instance for which this worst-case is attained.

The performance estimation methodology was born in the original work of Drori and Teboulle in 2014, which introduced a semidefinite relaxation to study the behaviour of first-order optimization algorithms for smooth unconstrained convex optimization. In this talk, we present a new framework that produces exact convergence bounds for fixed-step linear first-order methods applied to general composite convex optimization. These methods include classical and accelerated gradient methods (including constrained and proximal variants), conditional gradient and subgradient methods, and also allow inexact gradient computations.

In particular, our approach allows us to derive exact convergence rates for the proximal gradient method in function value, gradient residual norm and distance to the solution, backed by independently-checkable proofs. We also use numerical computations to obtain worst-case rates for several well-known methods including accelerated gradient and the conditional gradient method, improving on the best published rates. We conclude with a quick overview of a MATLAB toolbox implementing our framework, called PESTO (Performance Estimation Toolbox), available from https://github.com/AdrienTaylor/Performance-Estimation-Toolbox

Joint work with Adrien B. Taylor, INRIA Paris, and Julien M. Hendrickx, Université catholique de Louvain, Belgium

Cristóbal Guzmán

Information theory and the complexity convex optimization

Abstract: In this talk I will introduce an elementary information-theoretic method to prove lower bounds on the oracle complexity of convex optimization. With this method we can show that a wide variety of settings of nonsmooth convex optimization reduce to a common combinatorial problem, named String Guessing. Finally, I will present some open problems and perspectives on this topic.

Oliver Hinder

The complexity of finding stationary points of nonconvex functions

Abstract: Recently, there has been enormous practical success applying local minimization methods to artificial neural networks, matrix completion, phase retrieval etc. This has created interest in understanding the complexity of finding stationary points. For a long time, it was known that gradient descent achieved an $\epsilon^{-2}$ rate for finding stationary points of unconstrained non-convex functions, but better rates for first-order methods were unknown.

We show that, using an algorithm that judiciously utilizes Nesterov's AGD, it is possible to improve this rate to $\epsilon^{-7/4}$ rate for functions with Lipschitz first and second derivatives. Adding Lipschitz third derivatives improves this rate to $\epsilon^{-5/3}$. Moreover, we provide almost matching lower bounds to prove that (i) finding a stationary point is easier for convex functions and (ii) acceleration in non-convex optimization requires assumptions beyond smoothness.

Joint work with Yair Carmon, John C. Duchi and Aaron Sidford.

Juan Carlos de los Reyes

Second-order orthant-based methods for the numerical solution of sparse optimization problems

Abstract: We present a second order algorithm, based on orthantwise directions, for solving optimization problems involving the sparsity enhancing l1-norm. The main idea of our method consists in modifying the descent orthantwise directions by using second order information both of the regular term and (in weak sense) of the l1-norm. The weak second order information behind the l1-term is incorporated via a partial Huber regularization. One of the main features of our algorithm consists in a faster identification of the active set. We also prove that a reduced version of our method is equivalent to a semismooth Newton algorithm applied to the optimality condition, under a specific choice of the algorithm parameters. We present several computational experiments to show the efficiency of our approach compared to other state-of-the-art algorithms.

Joon Kwon

Unifying greedy and lazy mirror descent

Abstract: The purpose of this very informal discussion is to introduce and analyse a family of algorithms which generalizes and unifies greedy mirror descent (GMD) and lazy mirror descent (LMD) (aka dual averaging aka follow the regularized leader).

Juan Peypouquet

Descent algorithms in nonsmooth convex optimization (tutorial)

Abstract: Iterative algorithms based on descent directions are at the core of many convex optimization methods. We give a theoretical overview of standard first-order methods, underscoring their connection with continuous-time dynamical systems. Next, we discuss the numerical performance of such methods in terms of their speed of convergence to the optimal value, and point out some special cases and variants for which this process is accelerated.

Sebastian Stich

Adaptive importance sampling for convex optimization (tutorial)

Abstract: In the last years, randomized optimization algorithms, and especially coordinate descent methods attracted more and more attention in the optimization community. These methods often replace the efficient ̶ but often relative expensive ̶ iterations of the classical methods with much cheaper ̶ but less efficient ̶ updates. Consequently, such methods can be applied to problems of very big size. This application is sometimes also theoretically justified by very attractive worst-case efficiency guarantees.

In this talk, we will first provide a tutorial-like introduction to random descent methods and review their complexity. We will discuss variable metric or quasi-Newton type of methods [1] and accelerated schemes [2]. In the second part of the talk we further introduce the concept of (adaptive) importance sampling and present a scheme that computes the optimal sampling frequencies based on safe estimates of the gradient [3]. We also review some alternative sampling strategies that borrow ideas form the bandit literature. We conclude by discussing open problems in the field of adaptive sampling.

[1] Joint work with B. Gartner, C. Muller (Math. Prog. 2016)

[2] Joint work with Y. Nesterov (SIAM J. Opt. 2017)

[3] Joint work with A. Raj, M. Jaggi (NIPS 2017)