The Fourth Bay Area Optimization Meeting will be held Saturday, May 19, 2018, at Stanford University. The meeting will bring together leaders in variational analysis, optimization, and their applications.



Program

 9:10       Welcome

 9:20       Michael Saunders, Stanford University

10:00      Jong-Shi Pang, University of Southern California

10:40      Break

11:10      Javad Lavaei, University of California, Berkeley

11:50      Caroline Uhler, Massachusetts Institute of Technology

12:30      Lunch

 2:10       Shiqian Ma, University of California, Davis

 2:50       Walid Krichene, Google

 3:30       Break

 4:00       Stephen Boyd, Stanford University

For Titles and Abstracts see below.

There is no registration fee, but attendees are required to register before May 10 by email to Johannes Royset.

Local Organizers
Peter Glynn, Stanford University (Chair)
Richard Cottle, Stanford University
Yinyu Ye, Stanford University

Program Committee
Johannes O. Royset, Naval Postgraduate School (Chair)
Anil Aswani, University of California, Berkeley
Richard Cottle, Stanford University
Matthias Koeppe, University of California, Davis

Titles and Abstracts

Michael Saunders, Stanford University

Algorithm NCL for Constrained Optimization 

Standard optimization solvers have difficulty if the active-constraint gradients are not independent at a solution. For example, problems of the form min f(x) subject to c(x) >= 0 (m constraints and n variables) may have more than n constraints active at a solution. Such problems arise in the modeling of tax policy (with perhaps millions of constraints and thousands of variables). Algorithm NCL solves a sequence of about 10 augmented Lagrangian subproblems with constraints c(x) + r >= 0. The extra variables r make the constraint gradients linearly independent, and the subproblem solutions converge to the required solution as r is driven to zero. Assuming second derivatives are available, NCL expands the use of interior methods for large-scale optimization.
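The subproblem structure described in the abstract can be illustrated with a toy sketch. This is not the speaker's implementation: the test problem, penalty schedule, and multiplier update below are illustrative assumptions. Each subproblem minimizes an augmented function of (x, r) subject to c(x) + r >= 0, whose constraint Jacobian is full rank in (x, r) even when the gradients of c are dependent, and an increasing penalty drives r to zero:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + x[1]**2

def c(x):
    # two linearly DEPENDENT constraints, both active at x* = (1, 0)
    return np.array([x[0] - 1.0, 2.0 * (x[0] - 1.0)])

def ncl_sketch(x0, iters=10, rho=10.0):
    """Schematic NCL loop: solve subproblems in (x, r), drive r -> 0."""
    m = len(c(x0))
    y = np.zeros(m)                           # multiplier estimates
    z = np.concatenate([x0, np.zeros(m)])     # stacked variables (x, r)
    for _ in range(iters):
        def phi(z):
            x, r = z[:2], z[2:]
            # augmented term penalizes nonzero r
            return f(x) - y @ r + 0.5 * rho * (r @ r)
        cons = {'type': 'ineq', 'fun': lambda z: c(z[:2]) + z[2:]}
        res = minimize(phi, z, constraints=[cons])   # SLSQP subproblem solve
        z = res.x
        y = y - rho * z[2:]                   # crude multiplier update
        rho *= 2.0                            # tighten penalty: r -> 0
    return z[:2]
```

With the dependent constraints above, a solver applied directly to min f subject to c(x) >= 0 faces a rank-deficient active set; the lifted subproblems avoid that, and the sketch recovers x* = (1, 0).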

Jong-Shi Pang, University of Southern California

Composite Difference-Max Programs for Modern Statistical Estimation and a Nonmonotone Majorization-Minimization Algorithm

Many modern statistical estimation problems are defined by three major components: a statistical model that postulates the dependence of an output quantity on the features of the input; a loss function measuring the error between the observed output and input data; and a regularizer to control the overfitting and/or variable selection in the model. The model parameters are determined by empirical risk minimization, which involves the minimization of the loss function weighted by the model regularizer. We present a sampling version of the overall expected-value optimization problem where all three component functions may be of the difference-of-convex (dc) type and illustrate them with a host of commonly used functions, including those in piecewise affine regression and in deep learning that are defined by piecewise affine activation functions. We describe a nonmonotone majorization-minimization (MM) algorithm for solving the unified nonconvex, nondifferentiable optimization problem, which is formulated as a specially structured composite dc program of the pointwise max type, and present convergence results to a directional stationary solution. An efficient semismooth Newton method is proposed to solve the dual of the MM subproblems. Numerical results are presented to demonstrate the effectiveness of the proposed algorithms and the superiority of a non-traditional piecewise affine regression model over the traditional linear model.
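The majorization-minimization idea for difference-of-convex functions can be shown on a one-dimensional toy problem (my own example, not the algorithm of the talk). To minimize f(x) = g(x) - h(x) with g and h convex, MM replaces h at the current iterate by its linear underestimate, giving a convex majorizer of f whose minimizer becomes the next iterate:

```python
import numpy as np

def dc_mm(x0, iters=20):
    """MM for f(x) = x**2 - 2*|x|, i.e. g(x) = x**2, h(x) = 2*|x|.
    Global minimizers are x = +1 and x = -1 with f = -1."""
    x = float(x0)
    for _ in range(iters):
        s = 2.0 * np.sign(x)         # a subgradient of h at the iterate
        # Majorizer at x_k: g(y) - h(x_k) - s*(y - x_k) >= f(y) for all y.
        # Minimizing it in y solves 2*y - s = 0.
        x = s / 2.0
    return x
```

Here each majorizer is a simple quadratic and the iteration lands on the nearer minimizer in one step (starting exactly at x0 = 0, a critical point, the iteration stalls — the usual caveat that MM guarantees only stationarity, matching the abstract's focus on directional stationary solutions).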


Javad Lavaei, University of California, Berkeley

Graph-theoretic Convexification of Polynomial Optimization Problems: Theory, Numerical Algorithms, and Case Studies  

The area of polynomial optimization has been actively studied for years, where the goal is to find a high-quality solution using an efficient computational method. Some of the important research problems in this area are: i) how does the underlying structure of an optimization problem affect its complexity? ii) how does sparsity help? iii) how to find a near globally optimal solution whenever it is hard to find a global minimum? iv) how to design an efficient numerical algorithm for large-scale non-convex optimization problems? v) how to deal with problems with a mix of continuous and discrete variables? In this talk, we will develop a new mathematical framework to study the above problems. Our framework rests on recent advances in graph theory and optimization, including the notions of OS-vertex sequence and treewidth, matrix completion, conic optimization, and low-rank optimization. Finally, we will study the design of near-linear time algorithms for sparse conic optimization, and offer case studies on fundamental mixed-integer optimization problems appearing in energy and transportation systems.


Caroline Uhler, Massachusetts Institute of Technology

Your dreams may come true with MTP2

We study probability distributions that are multivariate totally positive of order two (MTP2). Such distributions appear in various applications from ferromagnetism to Brownian motion tree models used in phylogenetics. We first describe some of the intriguing properties of such distributions with respect to conditional independence. In particular, we show that MTP2 in the Gaussian setting acts as an implicit regularizer making Gaussian graphical models applicable in the high-dimensional setting without the need of a tuning parameter. We then consider the problem of non-parametric density estimation under MTP2. We introduce bimonotone subdivisions of polytopes and show that the maximum likelihood estimator under MTP2 is a piecewise linear function that induces a bimonotone subdivision. This allows us to turn the infinite-dimensional optimization problem into a finite-dimensional problem that can be solved efficiently. These properties make MTP2 constraints interesting for various data science applications.
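For the Gaussian setting mentioned in the abstract, MTP2 has a concrete characterization that is easy to check: a nondegenerate Gaussian distribution is MTP2 exactly when its precision matrix (the inverse covariance) is an M-matrix, i.e. has nonpositive off-diagonal entries. A minimal checker (function name and tolerance are my own):

```python
import numpy as np

def is_gaussian_mtp2(Sigma, tol=1e-9):
    """True iff the Gaussian with covariance Sigma is MTP2,
    i.e. all off-diagonal entries of Sigma^{-1} are <= 0."""
    K = np.linalg.inv(Sigma)
    off = K - np.diag(np.diag(K))   # zero out the diagonal
    return bool(np.all(off <= tol))
```

For example, a bivariate Gaussian with positive correlation is MTP2 (its precision matrix has a negative off-diagonal entry), while one with negative correlation is not.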


Shiqian Ma, University of California, Davis

A Varying-Coefficient Regularized Dual Averaging Algorithm for Regularized Stochastic Optimization

Regularized stochastic optimization arises in many application domains. Proximal stochastic (sub)gradient descent (PSGD) and regularized dual averaging (RDA) are two widely accepted approaches to solving regularized stochastic optimization. In practice, PSGD usually converges faster than RDA, while RDA can deal with sparse data more efficiently and promote the structure (e.g., sparsity) of the solution. There have been some efforts to combine the advantages of these two algorithms. One such attempt was the proximal follow-the-regularized-leader method (FTRL-Proximal), which was found to be very successful in industry. In this talk, we propose an alternative named the varying-coefficient regularized dual averaging (VC-RDA) algorithm that also combines the advantages of PSGD and RDA with convergence guarantees. Moreover, a novel adaptive scaling scheme is incorporated to further accelerate this algorithm. Numerical results indicate that our new method outperforms the existing algorithms mentioned above.
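Of the two baselines named in the abstract, PSGD is easy to sketch for an l1-regularized least-squares problem (the problem instance, step size, and data below are illustrative; this is the PSGD baseline, not the VC-RDA method of the talk). Each step takes a stochastic gradient step on the loss, then applies the soft-thresholding proximal operator of the l1 regularizer, which is what promotes sparse solutions:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1 (componentwise shrinkage)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def psgd_lasso(A, b, lam=0.1, steps=2000, eta=0.01, seed=0):
    """Proximal SGD for min_x (1/2n)*||A x - b||^2 + lam*||x||_1."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                          # sample one data point
        g = (A[i] @ x - b[i]) * A[i]                 # stochastic gradient of the loss
        x = soft_threshold(x - eta * g, eta * lam)   # proximal step
    return x
```

RDA differs in that it averages all past gradients and applies the regularizer to that running average, which is why it handles sparse data and induces sparsity more aggressively, at the cost of the slower practical convergence noted above.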


Walid Krichene, Google


Stephen Boyd, Stanford University