Adaptive Importance Sampling Methods for Nonconvex Stochastic Programs

Key Words:

  • Adaptive Importance Sampling

  • Bayesian Hierarchical Models

  • Markov Random Fields

In this project, we studied a general class of nonconvex stochastic programs that emerges from maximum a posteriori inference in doubly intractable Bayesian models, in which some of the probability densities involved have intractable normalizers; this easily occurs when the densities are designed to better model the data generation. Ubiquitous as they are, these problems can be complicated because their nonconvexity and nondifferentiability are nested together with intractable integrals. To tackle this, we designed an algorithm that combines surrogation of the nonconvexity with sampling approximation of the integrals, the latter carried out by adaptive importance sampling to achieve variance reduction. Our theoretical analysis and numerical experiments verify that, compared with other methods for nonconvex nondifferentiable stochastic programs, our algorithm is superior in computing time, stability near convergence, and variance of the solutions.
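
To give a sense of the sampling technique named above, here is a minimal sketch of self-normalized adaptive importance sampling for approximating an intractable expectation; the Gaussian proposal, moment-matching adaptation rule, and all names are illustrative assumptions, not the project's actual algorithm.

    # A minimal sketch of adaptive importance sampling for approximating an
    # intractable expectation E_p[f(X)], where p is known only up to a constant.
    # The proposal is a Gaussian whose mean/variance are re-fitted to the weighted
    # samples at every iteration (moment matching); names are illustrative only.
    import numpy as np

    def adaptive_importance_sampling(log_p_unnorm, f, n_iters=20, n_samples=1000, seed=0):
        rng = np.random.default_rng(seed)
        mu, sigma = 0.0, 5.0                        # deliberately wide initial proposal
        for _ in range(n_iters):
            x = rng.normal(mu, sigma, size=n_samples)            # draw from current proposal q
            log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
            log_w = log_p_unnorm(x) - log_q                      # unnormalized log weights
            w = np.exp(log_w - log_w.max())
            w /= w.sum()                                         # self-normalized weights
            mu = np.sum(w * x)                                   # adapt the proposal by moment
            sigma = np.sqrt(np.sum(w * (x - mu) ** 2)) + 1e-8    # matching to reduce weight variance
        return np.sum(w * f(x))                                  # estimate of E_p[f(X)]

    # Example: mean of an unnormalized Gaussian target centered at 3.
    est = adaptive_importance_sampling(lambda x: -0.5 * (x - 3.0) ** 2, lambda x: x)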

Nonconvex Sparse Learning Problems with Z-Structure and Their Parametric Extensions

Key Words:

  • Nonconvex Surrogates for Zero-Norm

  • Strong Polynomial Solvability

  • Nonconvex Parametric Programming

  • Hyperparameter Selection

In sparsity learning, there is a critical dilemma in choosing the regularizer that induces sparsity among model parameters. The ideal choice, the L0 function, can be computationally prohibitive, whereas a convex relaxation such as the L1 function, though faster to compute, can produce undesirable solutions. To bridge this gap, in this project we studied nonconvex regularizers of the folded concave type paired with training objectives possessing a so-called Z-property, which has widespread applications in Bayesian statistics and finance. By proposing an algorithm that is guaranteed to terminate at a directional stationary solution in a linear number of steps, we identified an important class of sparsity learning problems that are strongly polynomially solvable.
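
For concreteness, the sketch below evaluates the three families of sparsity surrogates discussed above on a parameter vector; the capped-L1 threshold `a` is an illustrative hyperparameter, and this is not the project's algorithm.

    # L0 counts nonzeros; L1 is its convex relaxation; the capped L1 is a folded
    # concave surrogate that behaves like L1 near zero and saturates (like L0)
    # for large coordinates.
    import numpy as np

    def l0(theta):
        return np.count_nonzero(theta)

    def l1(theta):
        return np.abs(theta).sum()

    def capped_l1(theta, a=1.0):
        return np.minimum(np.abs(theta), a).sum() / a

    theta = np.array([0.0, 0.05, -0.5, 3.0])
    print(l0(theta), l1(theta), capped_l1(theta, a=1.0))   # 3, 3.55, 1.55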

An extension of sparsity learning is to trace a path of solutions as a function of the hyperparameter that connects the training objective and the regularizer. In this way, the model learned under an ideal hyperparameter can be picked out according to an alternative criterion. However, this task is also haunted by the dilemma of choosing the proper regularizer. In response, we investigated the analytical properties of and computational methods for the solution paths traced by three regularizers: the ideal L0, the convex relaxation L1, and the nonconvex capped L1 as an in-between candidate. Through our numerical studies, we demonstrated the advantage of the nonconvex regularizer in balancing superior statistical performance against computational effort, which makes it an appealing option for hyperparameter selection.
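
As a simple illustration of path tracing, the sketch below follows the L1 solution path over a grid of regularization parameters using proximal gradient steps with warm starts; the grid, data, and step size are assumptions, and the capped-L1 path studied in the project would replace the soft-thresholding update with its own surrogate step.

    # Trace least-squares + L1 solutions over a decreasing lambda grid,
    # warm-starting each solve from the previous one.
    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def l1_path(X, y, lambdas, n_steps=500):
        n, d = X.shape
        step = n / np.linalg.norm(X, ord=2) ** 2      # 1/L for the (1/2n)||Xb - y||^2 loss
        beta = np.zeros(d)
        path = []
        for lam in lambdas:                           # warm start: reuse the previous solution
            for _ in range(n_steps):
                grad = X.T @ (X @ beta - y) / n
                beta = soft_threshold(beta - step * grad, step * lam)
            path.append(beta.copy())
        return np.array(path)                         # one row of coefficients per lambda

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=50)
    path = l1_path(X, y, lambdas=np.geomspace(1.0, 1e-3, 20))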

Publications from this project:

Nonconvex Robust Programming

Key Words:

  • Nonconvex Robust Optimization

  • Adversarial Training for Deep Learning

  • Robust Optimization with Equilibrium Constraints

Robust optimization (RO) seeks a solution that hedges against the worst-case scenario generated by uncertainty contained in a given set. The key challenge in nonconvex RO is that the nonconvexity is commonly coupled with the nondifferentiability introduced by the robustness-inducing treatment. In this project, we established the analytical properties of the computable solutions of nonconvex RO problems via a generalized saddle-point theorem and game-theoretic interpretations. A majorization-minimization algorithm is proposed as the main computational machinery to support our numerical experiments, which verify the effectiveness of RO when applied to several nonconvex problems, e.g., adversarial deep learning.
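
To illustrate the worst-case structure in the adversarial-learning application, here is a minimal sketch (an assumption for illustration, not the project's algorithm) of the inner maximization in robust training of a linear logistic model: projected gradient ascent on the loss over an L-infinity ball of radius eps around the input.

    import numpy as np

    def worst_case_perturbation(w, b, x, y, eps=0.1, step=0.02, n_steps=40):
        """y in {-1, +1}; loss(delta) = log(1 + exp(-y * (w @ (x + delta) + b)))."""
        delta = np.zeros_like(x)
        for _ in range(n_steps):
            margin = y * (w @ (x + delta) + b)
            grad = -y * w / (1.0 + np.exp(margin))    # gradient of the loss w.r.t. delta
            delta += step * np.sign(grad)             # ascent step (maximize the loss)
            delta = np.clip(delta, -eps, eps)         # project back onto the uncertainty set
        return delta

    # Robust training then alternates this inner maximization with an outer
    # descent step in (w, b) on the perturbed loss.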

Publication from this project:

Deep Learning Algorithm with Guaranteed d-Stationarity

Key Words:

  • Deep Learning

  • Nonconvex Nonsmooth Multi-Composite Optimization

  • Majorization-Minimization, Exact Penalty, Semi-Smooth Newton

We present a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the deep neural network training problem formulated as a multi-composite optimization problem with coupled nonconvexity and nondifferentiability. To our knowledge, this is the first time that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. The proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection, and the dual semismooth Newton method, each serving a particular purpose in the overall computational scheme. Contrary to existing stochastic approaches, which at best provide very weak guarantees on the solutions computed in practical implementations, our rigorous deterministic treatment guarantees the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).
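
To convey the majorization-minimization ingredient in generic form, the sketch below majorizes a smooth loss with its standard quadratic upper bound and minimizes the surrogate over a simple constraint set, which reduces to a projected gradient step; the example problem, names, and data are assumptions for illustration and do not reproduce the paper's full scheme (exact penalty, dual semismooth Newton, etc.).

    import numpy as np

    def mm_projected_gradient(grad, L, project, x0, n_iters=200):
        """Minimize a smooth f with L-Lipschitz gradient over a set C,
        where `project` is the Euclidean projection onto C."""
        x = x0
        for _ in range(n_iters):
            # Majorizer at x: f(x) + grad(x)^T (z - x) + (L/2) ||z - x||^2;
            # its constrained minimizer is the projected gradient step below.
            x = project(x - grad(x) / L)
        return x

    # Example: least squares over the box [0, 1]^d.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 5))
    b = rng.normal(size=30)
    L = np.linalg.norm(A, ord=2) ** 2
    x_star = mm_projected_gradient(lambda x: A.T @ (A @ x - b), L,
                                   lambda z: np.clip(z, 0.0, 1.0), np.zeros(5))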

Publication from this project: