Adaptive Importance Sampling Methods for Nonconvex Stochastic Programs

Key Words:

  • Adaptive Importance Sampling

  • Bayesian Hierarchical Models

  • Markov Random Fields

In this project, we studied a general class of nonconvex stochastic programs that emerges from maximum a posteriori inference in doubly intractable Bayesian models, in which some of the probability densities involved have intractable normalizers; this easily occurs when the densities are designed to better model the data generation. Ubiquitous as they are, these problems can be complicated because their nonconvexity and nondifferentiability are nested together with intractable integrals. To tackle this, we designed an algorithm that combines surrogation of the nonconvexity with sampling approximation of the integrals, the latter carried out by adaptive importance sampling to achieve variance reduction. Our theoretical analysis and numerical experiments verify that, compared with other methods for nonconvex nondifferentiable stochastic programs, our algorithm is superior in computing time, stability near convergence, and variance of the solutions.
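
To give a sense of the sampling technique named above, here is a minimal sketch of self-normalized adaptive importance sampling for approximating an intractable expectation; the Gaussian proposal, moment-matching adaptation rule, and all names are illustrative assumptions, not the project's actual algorithm.

    # A minimal sketch of adaptive importance sampling for approximating an
    # intractable expectation E_p[f(X)], where p is known only up to a constant.
    # The proposal is a Gaussian whose mean/variance are re-fitted to the weighted
    # samples at every iteration (moment matching); names are illustrative only.
    import numpy as np

    def adaptive_importance_sampling(log_p_unnorm, f, n_iters=20, n_samples=1000, seed=0):
        rng = np.random.default_rng(seed)
        mu, sigma = 0.0, 5.0                        # deliberately wide initial proposal
        for _ in range(n_iters):
            x = rng.normal(mu, sigma, size=n_samples)            # draw from current proposal q
            log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
            log_w = log_p_unnorm(x) - log_q                      # unnormalized log weights
            w = np.exp(log_w - log_w.max())
            w /= w.sum()                                         # self-normalized weights
            mu = np.sum(w * x)                                   # adapt the proposal by moment
            sigma = np.sqrt(np.sum(w * (x - mu) ** 2)) + 1e-8    # matching to reduce weight variance
        return np.sum(w * f(x))                                  # estimate of E_p[f(X)]

    # Example: mean of an unnormalized Gaussian target centered at 3.
    est = adaptive_importance_sampling(lambda x: -0.5 * (x - 3.0) ** 2, lambda x: x)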

Nonconvex Sparse Learning Problems with Z-Structure and Their Parametric Extensions

Key Words:

  • Nonconvex Surrogates for Zero-Norm

  • Strong Polynomial Solvability

  • Nonconvex Parametric Programming

  • Hyperparameter Selection

In sparsity learning, there is a critical dilemma in choosing the regularizer that induces sparsity among model parameters. The ideal choice, the L0 function, can be computationally prohibitive, whereas a convex relaxation such as the L1 function, though faster to compute, can produce undesirable solutions. To bridge this gap, in this project we studied nonconvex regularizers of the folded concave type paired with training objectives possessing a so-called Z-property, which has widespread applications in Bayesian statistics and finance. By proposing an algorithm that is guaranteed to terminate at a directional stationary solution in a linear number of steps, we identified an important class of sparsity learning problems that are strongly polynomially solvable.
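
For concreteness, the sketch below evaluates the three families of sparsity surrogates discussed above on a parameter vector; the capped-L1 threshold `a` is an illustrative hyperparameter, and this is not the project's algorithm.

    # L0 counts nonzeros; L1 is its convex relaxation; the capped L1 is a folded
    # concave surrogate that behaves like L1 near zero and saturates (like L0)
    # for large coordinates.
    import numpy as np

    def l0(theta):
        return np.count_nonzero(theta)

    def l1(theta):
        return np.abs(theta).sum()

    def capped_l1(theta, a=1.0):
        return np.minimum(np.abs(theta), a).sum() / a

    theta = np.array([0.0, 0.05, -0.5, 3.0])
    print(l0(theta), l1(theta), capped_l1(theta, a=1.0))   # 3, 3.55, 1.55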

An extension of sparsity learning is to trace a path of solutions as a function of the hyperparameter that connects the training objective and the regularizer. In this way, the model learned under an ideal hyperparameter can be picked out according to an alternative criterion. However, this task is also haunted by the dilemma of choosing the proper regularizer. In response, we investigated the analytical properties of and computational methods for the solution paths traced by three regularizers: the ideal L0, the convex relaxation L1, and the nonconvex capped L1 as an in-between candidate. Through our numerical studies, we demonstrated the advantage of the nonconvex regularizer in balancing superior statistical performance against computational effort, which makes it an appealing option for hyperparameter selection.
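
As a simple illustration of path tracing, the sketch below follows the L1 solution path over a grid of regularization parameters using proximal gradient steps with warm starts; the grid, data, and step size are assumptions, and the capped-L1 path studied in the project would replace the soft-thresholding update with its own surrogate step.

    # Trace least-squares + L1 solutions over a decreasing lambda grid,
    # warm-starting each solve from the previous one.
    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def l1_path(X, y, lambdas, n_steps=500):
        n, d = X.shape
        step = n / np.linalg.norm(X, ord=2) ** 2      # 1/L for the (1/2n)||Xb - y||^2 loss
        beta = np.zeros(d)
        path = []
        for lam in lambdas:                           # warm start: reuse the previous solution
            for _ in range(n_steps):
                grad = X.T @ (X @ beta - y) / n
                beta = soft_threshold(beta - step * grad, step * lam)
            path.append(beta.copy())
        return np.array(path)                         # one row of coefficients per lambda

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=50)
    path = l1_path(X, y, lambdas=np.geomspace(1.0, 1e-3, 20))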

Publications from this project:

Nonconvex Robust Programming

Key Words:

  • Nonconvex Robust Optimization

  • Adversarial Training for Deep Learning

  • Robust Optimization with Equilibrium Constraints

Robust optimization (RO) seeks a solution that hedges against the worst-case scenario generated by uncertainty contained in a given set. The key challenge in nonconvex RO is that the nonconvexity is commonly coupled with the nondifferentiability introduced by the robustness-inducing treatment. In this project, we established the analytical properties of the computable solutions of nonconvex RO problems via a generalized saddle-point theorem and game-theoretic interpretations. A majorization-minimization algorithm is proposed as the main computational machinery to support our numerical experiments, which verify the effectiveness of RO when applied to several nonconvex problems, e.g., adversarial deep learning.
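
To illustrate the worst-case structure in the adversarial-learning application, here is a minimal sketch (an assumption for illustration, not the project's algorithm) of the inner maximization in robust training of a linear logistic model: projected gradient ascent on the loss over an L-infinity ball of radius eps around the input.

    import numpy as np

    def worst_case_perturbation(w, b, x, y, eps=0.1, step=0.02, n_steps=40):
        """y in {-1, +1}; loss(delta) = log(1 + exp(-y * (w @ (x + delta) + b)))."""
        delta = np.zeros_like(x)
        for _ in range(n_steps):
            margin = y * (w @ (x + delta) + b)
            grad = -y * w / (1.0 + np.exp(margin))    # gradient of the loss w.r.t. delta
            delta += step * np.sign(grad)             # ascent step (maximize the loss)
            delta = np.clip(delta, -eps, eps)         # project back onto the uncertainty set
        return delta

    # Robust training then alternates this inner maximization with an outer
    # descent step in (w, b) on the perturbed loss.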

Publication from this project:

Deep Learning Algorithm with Guaranteed d-Stationarity

Key Words:

  • Deep Learning

  • Nonconvex Nonsmooth Multi-Composite Optimization

  • Majorization-Minimization, Exact Penalty, Semi-Smooth Newton

We present a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the deep neural network training problem formulated as a multi-composite optimization problem with coupled nonconvexity and nondifferentiability. To our knowledge, this is the first time that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. The proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection, and the dual semismooth Newton method, each serving a particular purpose in the overall computational scheme. Contrary to existing stochastic approaches, which at best provide very weak guarantees on the solutions computed in practical implementations, our rigorous deterministic treatment guarantees the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).
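
To convey the majorization-minimization ingredient in generic form, the sketch below majorizes a smooth loss with its standard quadratic upper bound and minimizes the surrogate over a simple constraint set, which reduces to a projected gradient step; the example problem, names, and data are assumptions for illustration and do not reproduce the paper's full scheme (exact penalty, dual semismooth Newton, etc.).

    import numpy as np

    def mm_projected_gradient(grad, L, project, x0, n_iters=200):
        """Minimize a smooth f with L-Lipschitz gradient over a set C,
        where `project` is the Euclidean projection onto C."""
        x = x0
        for _ in range(n_iters):
            # Majorizer at x: f(x) + grad(x)^T (z - x) + (L/2) ||z - x||^2;
            # its constrained minimizer is the projected gradient step below.
            x = project(x - grad(x) / L)
        return x

    # Example: least squares over the box [0, 1]^d.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 5))
    b = rng.normal(size=30)
    L = np.linalg.norm(A, ord=2) ** 2
    x_star = mm_projected_gradient(lambda x: A.T @ (A @ x - b), L,
                                   lambda z: np.clip(z, 0.0, 1.0), np.zeros(5))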

Publication from this project: