Invited Speakers

Ryan Adams (Harvard)
Francis Bach (Ecole Normale Superieure)
Nando de Freitas (Google DeepMind)
Surya Ganguli (Stanford)
Stefanie Jegelka (MIT)
Jean Lasserre (LAAS-CNRS)
Suvrit Sra (MIT)


Ryan Adams (Harvard)
Leveraging Structure in Bayesian Optimization

Bayesian optimization is an approach to non-convex optimization that leverages a probabilistic model to make decisions about candidate points to evaluate.  The primary advantage of this approach is the ability to incorporate prior knowledge about the objective function in an explicit way.  While such prior information has typically been information about the smoothness of the function, many machine learning problems have additional structure that can be leveraged.  I will talk about how such prior information can be found across tasks, within inner-loop optimizations, and in constraints.
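As a minimal illustration of the basic loop (not the structured variants the talk covers), the sketch below runs one-dimensional Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition; all function names, kernel choices, and hyperparameters here are illustrative assumptions, not anything from the talk.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """GP posterior mean and std at query points Xq given observations (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(rbf_kernel(Xq, Xq)) - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    from math import erf
    z = (best - mu) / sigma
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (best - mu) * Phi + sigma * phi

def bayes_opt(f, n_iters=15, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, size=3)          # a few initial random evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(0, 1, 200)          # candidate points on [0, 1]
    for _ in range(n_iters):
        mu, sigma = gp_posterior(X, y, grid)
        ei = expected_improvement(mu, sigma, y.min())
        x_next = grid[np.argmax(ei)]       # evaluate where EI is largest
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
```

The "prior knowledge" enters through the surrogate: here only smoothness, via the kernel length scale; the structured priors the talk describes would replace or augment this model.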

Francis Bach (Ecole Normale Superieure)
Submodular Functions: from Discrete to Continuous Domains [Slides]

Submodular set-functions have many applications in combinatorial optimization, as they can be minimized and approximately maximized in polynomial time. A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization. Submodularity goes beyond set-functions and has naturally been considered for problems with multiple labels or for functions defined on continuous domains, where it corresponds essentially to cross second-derivatives being nonpositive. In this work, we show that most results relating submodularity and convexity for set-functions can be extended to all submodular functions. In particular, (a) we naturally define a continuous extension in a set of probability measures, (b) show that the extension is convex if and only if the original function is submodular, (c) prove that the problem of minimizing a submodular function is equivalent to a typically non-smooth convex optimization problem, and (d) propose another convex optimization problem with better computational properties (e.g., a smooth dual problem). Most of these extensions from the set-function situation are obtained by drawing links with the theory of multi-marginal optimal transport, which also provides a new interpretation of existing results for set-functions. We then provide practical algorithms to minimize generic submodular functions on discrete domains, with associated convergence rates.
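In the set-function case, the convex extension in question is the Lovász extension, computable by Edmonds' greedy algorithm: sort the coordinates of w in decreasing order and accumulate marginal gains of the resulting nested sets. A minimal numpy sketch (the cut-function example and all names are illustrative, not from the talk):

```python
import numpy as np

def lovasz_extension(F, w):
    """Lovász extension of a set function F (with F(empty set) = 0) at w.

    Sort coordinates of w in decreasing order and weight the marginal
    gains of the corresponding nested sets (Edmonds' greedy algorithm).
    """
    order = np.argsort(-np.asarray(w, dtype=float))
    value, S, prev = 0.0, set(), 0.0
    for i in order:
        S.add(i)
        cur = F(S)
        value += w[i] * (cur - prev)
        prev = cur
    return value

# Example: cut function of the path graph 0-1-2, which is submodular.
edges = [(0, 1), (1, 2)]
def cut(S):
    return sum(1 for (u, v) in edges if (u in S) != (v in S))
```

The extension agrees with F on indicator vectors (e.g. `lovasz_extension(cut, [1.0, 0.0, 1.0])` equals `cut({0, 2})`), is positively homogeneous, and is convex exactly when F is submodular, which is what reduces submodular minimization to convex optimization.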

Nando de Freitas (Google DeepMind)
Learning to Optimize

The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this talk I describe how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. The learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure.
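The talk's learned optimizers are LSTMs; as a much simpler stand-in that still casts optimizer design as a learning problem, the sketch below meta-learns a single scalar step size by unrolling gradient descent on random quadratics and following a finite-difference meta-gradient. Everything here (task distribution, unroll length, clipping range) is an assumption for illustration, not the talk's architecture.

```python
import numpy as np

def unrolled_loss(lr, A, b, x0, steps=20):
    """Final value of 0.5 x'Ax - b'x after `steps` gradient-descent iterations."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * (A @ x - b)
    return 0.5 * x @ A @ x - b @ x

def meta_train(n_tasks=30, meta_lr=0.01, meta_steps=100, seed=0):
    """Meta-learn a scalar step size on random diagonal quadratics,
    using a central finite difference as the meta-gradient."""
    rng = np.random.default_rng(seed)
    tasks = [(np.diag(rng.uniform(1.0, 5.0, size=5)),   # spectra in [1, 5]
              rng.normal(size=5), rng.normal(size=5))
             for _ in range(n_tasks)]
    lr, eps = 0.05, 1e-4
    for _ in range(meta_steps):
        g = np.mean([(unrolled_loss(lr + eps, *t) - unrolled_loss(lr - eps, *t))
                     / (2 * eps) for t in tasks])
        # clip to [0.01, 0.2]: with eigenvalues in [1, 5], lr <= 1/5 keeps
        # every gradient-descent iteration a contraction on every task
        lr = float(np.clip(lr - meta_lr * g, 0.01, 0.2))
    return lr
```

The learned step size climbs toward the most aggressive value that is stable for the task distribution, which is the toy version of the talk's point: the optimizer adapts to structure shared across the problems it is trained on.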

Surya Ganguli (Stanford)
Non-convexity in the error landscape and the expressive capacity of deep neural networks

A variety of recent work has studied saddle points in the error landscape of deep neural networks.  A clearer understanding of these saddle points is likely to arise from an understanding of the geometry of deep functions.  In particular, what do the generic functions computed by a deep network “look like?” How can we quantify and understand their geometry, and what implications does this geometry have for reducing generalization error as well as training error?   We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of generic deep functions. Our results reveal an order-to-chaos expressivity phase transition, with networks in the chaotic phase computing nonlinear functions whose global curvature grows exponentially with depth but not width.  Moreover, we formalize and quantitatively demonstrate the long conjectured idea that deep networks can disentangle highly curved manifolds in input space into flat manifolds in hidden space. Our theoretical analysis of the expressive power of deep networks broadly applies to arbitrary nonlinearities, and provides intuition for why initializations at the edge of chaos can lead to both better optimization as well as superior generalization capabilities.
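A minimal sketch of the kind of mean-field recursion behind such results, assuming a tanh network with i.i.d. Gaussian weights (variance sw2/N per entry) and biases (variance sb2): iterate the length map to its fixed point q*, then evaluate chi = sw2 * E[phi'(sqrt(q*) z)^2]; chi > 1 signals the chaotic phase and chi < 1 the ordered one. The specific parameter values below are illustrative assumptions.

```python
import numpy as np

# Gauss-Hermite quadrature: for z ~ N(0,1),
# E[f(z)] = (1/sqrt(pi)) * sum_i w_i f(sqrt(2) x_i)
nodes, weights = np.polynomial.hermite.hermgauss(64)

def gauss_expect(f):
    return weights @ f(np.sqrt(2.0) * nodes) / np.sqrt(np.pi)

def fixed_point_q(sw2, sb2, iters=200):
    """Fixed point q* of the length map q <- sw2 * E[tanh(sqrt(q) z)^2] + sb2."""
    q = 1.0
    for _ in range(iters):
        q = sw2 * gauss_expect(lambda z: np.tanh(np.sqrt(q) * z) ** 2) + sb2
    return q

def chi(sw2, sb2):
    """sw2 * E[phi'(sqrt(q*) z)^2] with phi = tanh; chi > 1 means chaos."""
    q = fixed_point_q(sw2, sb2)
    return sw2 * gauss_expect(lambda z: 1.0 / np.cosh(np.sqrt(q) * z) ** 4)
```

Sweeping the weight variance traces the order-to-chaos transition the abstract describes; "initialization at the edge of chaos" corresponds to tuning (sw2, sb2) so that chi = 1.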

Stefanie Jegelka (MIT)
Submodular Optimization and Nonconvexity [Slides]

Despite analogies between submodularity and convexity, submodular optimization is closely connected with certain "nice" non-convex optimization problems for which theoretical guarantees are still possible. In this talk, I will review some of these connections and make them concrete through the example of a challenging robust influence maximization problem, for which we obtain new, tractable formulations and algorithms.
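The talk's robust formulation is not reproduced here; as a generic illustration of why submodular maximization is one of these "nice" non-convex problems, the classical greedy algorithm on a monotone submodular coverage function attains a (1 - 1/e) approximation under a cardinality constraint (Nemhauser, Wolsey, Fisher). The example sets below are made up.

```python
def greedy_max_coverage(sets, k):
    """Greedily maximize the (monotone submodular) coverage function
    F(S) = |union of chosen sets| subject to choosing at most k sets.
    Guarantees a (1 - 1/e)-approximation to the optimum."""
    chosen, covered = [], set()
    for _ in range(k):
        # pick the set with the largest marginal gain over what is covered
        best = max((i for i in range(len(sets)) if i not in chosen),
                   key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
```

Influence maximization fits this template because the expected spread of a seed set under standard diffusion models is monotone submodular; the robust variant in the talk asks for seeds that do well against a whole family of such functions at once.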

Jean Lasserre (LAAS-CNRS)
The moment-LP and moment-SOS approaches in optimization and some related applications [Slides]

In the first part we provide an introduction to the basics of the moment-LP and moment-SOS approaches to global polynomial optimization. In particular, we describe the hierarchy of LP and semidefinite programs to approximate the optimal value of such problems. In fact, the same methodology also applies to solve (or approximate) Generalized Moment Problems (GMP) whose data are described by basic semi-algebraic sets and polynomials (or even semi-algebraic functions). Indeed, polynomial optimization is a particular (and even the simplest) instance of the GMP.

In the second part, we describe how to use this methodology to solve some problems (outside optimization) viewed as particular instances of the GMP. This includes:
- Approximating compact basic semi-algebraic sets defined by quantifiers.
- Computing convex polynomial underestimators of polynomials on a box.
- Bounds on measures satisfying some moment conditions.
- Approximating the volume of compact basic semi-algebraic sets.
- Approximating the Gaussian measure of non-compact basic semi-algebraic sets.
- Approximating the Lebesgue decomposition of a measure μ w.r.t. another measure ν, based only on the moments of μ and ν.
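For orientation, here is a standard statement of the semidefinite side of the hierarchy the abstract refers to; the notation follows common usage and is an assumption, not taken from the talk. For a polynomial optimization problem

```latex
\[
f^{*} \;=\; \min_{x \in K} f(x),
\qquad
K \;=\; \{\, x \in \mathbb{R}^n : g_j(x) \ge 0,\; j = 1,\dots,m \,\},
\]
the level-$d$ moment relaxation optimizes over a pseudo-moment sequence
$y = (y_\alpha)$:
\[
\rho_d \;=\; \min_{y}\; L_y(f)
\quad\text{s.t.}\quad
M_d(y) \succeq 0,\quad
M_{d - d_j}(g_j\, y) \succeq 0 \;\; (j = 1,\dots,m),\quad
y_0 = 1,
\]
where $L_y\bigl(\sum_\alpha c_\alpha x^\alpha\bigr) = \sum_\alpha c_\alpha y_\alpha$
is the Riesz functional, $M_d(y)$ is the moment matrix, the
$M_{d-d_j}(g_j\, y)$ are localizing matrices, and
$d_j = \lceil \deg g_j / 2 \rceil$.
```

Under Putinar's Archimedean condition, the values $\rho_d$ increase monotonically to $f^{*}$ as $d \to \infty$, and the dual SDPs produce sums-of-squares certificates of nonnegativity of $f - \rho$ on $K$.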

Suvrit Sra (MIT)
Taming non-convexity via geometry

In this talk, I will highlight some aspects of geometry and its role in optimization.  In particular, I will talk about optimization problems whose parameters are constrained to lie on a manifold or in a specific metric space. These geometric constraints often make the problems numerically challenging, but they can also unravel properties that ensure tractable attainment of global optimality for certain otherwise non-convex problems.

We'll make our foray into geometric optimization via geodesic convexity, a concept that generalizes the usual notion of convexity to nonlinear metric spaces such as Riemannian manifolds.  I will outline some of our results that contribute to g-convex analysis as well as to the theory of first-order g-convex optimization.  I will mention several very interesting optimization problems where g-convexity proves remarkably useful. In closing, I will mention extensions to large-scale non-convex geometric optimization as well as key open problems.
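As a generic illustration of optimization under manifold constraints (not one of the talk's g-convex problems; the Rayleigh quotient is not geodesically convex on the sphere), the sketch below runs Riemannian gradient descent on the unit sphere for min x'Ax: project the Euclidean gradient onto the tangent space, step, and retract by renormalizing. All names and parameters are illustrative.

```python
import numpy as np

def min_eigvec(A, steps=500, lr=0.1, seed=0):
    """Riemannian gradient descent on the unit sphere for min x'Ax.

    The Riemannian gradient is the Euclidean gradient projected onto the
    tangent space at x; each step is followed by a retraction
    (renormalization) back to the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(steps):
        egrad = 2.0 * A @ x                      # Euclidean gradient of x'Ax
        rgrad = egrad - (x @ egrad) * x          # project onto tangent space
        x = x - lr * rgrad
        x /= np.linalg.norm(x)                   # retraction to the sphere
    return x, x @ A @ x
```

Despite the non-convex constraint set, the iterates converge to the minimal eigenpair from generic initializations, a small instance of the talk's theme that geometry can make non-convex problems tractable.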