Workshop on Nonsmooth Optimization and Applications
(NOPTA 2024)
in Honor of the 75th Birthday of Boris Mordukhovich
April 8-12, 2024- University of Antwerp, Belgium
Confirmed Semi-Plenary Talks:
Abstract: We consider constrained optimization problems with convex, smooth objective and constraints. We propose a new stochastic gradient algorithm, called the Stochastic Moving Ball Approximation (SMBA) method, to solve this class of problems: at each iteration we first take a gradient step for the objective function and then perform a projection step onto a ball approximation of a single, randomly chosen constraint. The computational simplicity of SMBA, which uses first-order information and considers only one constraint at a time, makes it suitable for large-scale problems with many functional constraints. We provide a convergence analysis for the SMBA algorithm under basic assumptions on the problem, yielding new convergence rates for both optimality and feasibility criteria evaluated at an average point. Our convergence proofs are new, since we need to deal properly with infeasible iterates and with quadratic upper approximations of constraints that may yield empty balls.
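A minimal sketch of one such iteration may help fix ideas. Assuming each constraint g_i is convex and smooth with an L_i-Lipschitz gradient, its quadratic upper approximation at x defines the ball {y : g_i(x) + <grad g_i(x), y - x> + (L_i/2)||y - x||^2 <= 0}; the function names and the fallback for the empty-ball case below are illustrative assumptions, not the paper's exact treatment.

    import numpy as np

    def smba_step(x, grad_f, g_i, grad_g_i, L_i, alpha):
        # Gradient step on the objective.
        z = x - alpha * grad_f(x)
        # Ball approximation of the chosen constraint at x:
        # {y : g_i(x) + <grad_g_i(x), y - x> + (L_i/2)||y - x||^2 <= 0},
        # a Euclidean ball with the center and squared radius below.
        g, dg = g_i(x), grad_g_i(x)
        center = x - dg / L_i
        r2 = np.dot(dg, dg) / L_i**2 - 2.0 * g / L_i
        if r2 < 0:      # empty ball; the paper's analysis treats this case explicitly
            return z    # illustrative fallback only
        r = np.sqrt(r2)
        d = np.linalg.norm(z - center)
        # Euclidean projection of the gradient-step point onto the ball.
        return z if d <= r else center + (r / d) * (z - center)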
Abstract: We propose a variation of the forward–backward splitting method for solving structured monotone inclusions. Our method integrates past iterates and two deviation vectors into the update equations. These deviation vectors bring flexibility to the algorithm and can be chosen arbitrarily as long as they together satisfy a norm condition. We present special cases where the deviation vectors, selected as predetermined linear combinations of previous iterates, always meet the norm condition. Notably, we introduce an algorithm employing a scalar parameter to interpolate between the conventional forward–backward splitting scheme and an accelerated O(1/n^2)-convergent forward–backward method that encompasses both the accelerated proximal point method and the Halpern iteration as special cases. The existing methods correspond to the two extremes of the allowed scalar parameter range. By choosing the interpolation scalar near the midpoint of the permissible range, our algorithm significantly outperforms these previously known methods when addressing a basic monotone inclusion problem stemming from minimax optimization.
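For orientation, the baseline that the method extends is the classical forward-backward iteration for the inclusion 0 in (A + B)(x), with A cocoercive and B maximally monotone. A minimal sketch follows; the past iterates, deviation vectors, and their norm condition, which are the talk's contribution, are deliberately omitted.

    import numpy as np

    def forward_backward(A, J_gamma_B, x0, gamma, iters=1000):
        # Forward step with the cocoercive operator A, then backward step
        # with the resolvent J_{gamma B} = (I + gamma B)^{-1}.
        x = x0
        for _ in range(iters):
            x = J_gamma_B(x - gamma * A(x))
        return x

    # Example: minimize 0.5*||x||^2 over the nonnegative orthant, i.e.
    # A = identity (the gradient) and J = projection onto the orthant.
    x = forward_backward(lambda x: x, lambda z: np.maximum(z, 0.0),
                         x0=np.array([-1.0, 2.0]), gamma=0.5)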
Abstract: In this talk we present a new coderivative-based Newton algorithm for tackling unconstrained optimization problems whose objective function can be expressed as the difference of a continuously differentiable function with locally Lipschitzian derivatives and a locally Lipschitz, prox-regular function. In contrast to the classical difference-of-convex programming framework, the functions are not required to be convex. Focusing our attention on the requirement of extended positive-definite properties of the generalized Hessian of the first function, we prove the well-posedness and various convergence results of the proposed algorithm. We conclude with some numerical experiments demonstrating the performance of the proposed algorithm in nonconvex settings.
This is joint work with Boris S. Mordukhovich and Pedro Pérez-Aros.
Abstract: The decomposition or approximation of a linear operator on a matrix space as a sum of Kronecker products plays an important role in matrix equations and low-rank modeling. The approximation problem in Frobenius norm admits a well-known solution via the singular value decomposition. However, the approximation problem in spectral norm, which is more natural for linear operators, is much more challenging. In particular, the Frobenius norm solution can be far from optimal in spectral norm. In the talk, we describe an alternating optimization method based on semidefinite programming that directly aims to minimize the error in spectral norm, and present computational experiments to illustrate its practical performance.
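The Frobenius-norm solution mentioned above is classical (Van Loan-Pitsianis): after a fixed rearrangement of the matrix, the problem becomes a rank-one approximation, solved by the SVD. A minimal sketch is given below; the spectral-norm method of the talk, based on semidefinite programming, is not reproduced here.

    import numpy as np

    def nearest_kronecker_frobenius(M, m1, n1, m2, n2):
        # Frobenius-nearest A (m1 x n1) kron B (m2 x n2) to M (m1*m2 x n1*n2).
        # Rearrange M so that ||M - A kron B||_F = ||R - vec(A) vec(B)^T||_F,
        # then take the best rank-1 approximation of R via the SVD.
        blocks = M.reshape(m1, m2, n1, n2)             # index order (i, k, j, l)
        R = blocks.transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
        B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
        return A, B

The singular values of R also quantify the error: the squared Frobenius error of the best single Kronecker term is the sum of the remaining squared singular values.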
Abstract: We study the problem of minimizing the sum of two functions, one of which is nonconvex and nondifferentiable while the other is differentiable with Lipschitz continuous gradient (and possibly nonconvex too). By assuming that the nondifferentiable and nonconvex function is strongly quasiconvex in the sense of Polyak (1966), we provide new necessary optimality conditions for a point to be a local minimizer of the problem and, as a consequence, new information regarding the convergence of the sequence generated by the Proximal Gradient algorithm under the usual Kurdyka-Łojasiewicz property. Finally, two applications in nonconvex mathematical programming are given: new information for the particular case of DC (difference of convex) programming problems, and also for strongly quasiconvex mathematical programming problems.
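For reference, the Proximal Gradient iteration analyzed in the talk, in minimal generic form; the strong quasiconvexity and the KL property enter only the analysis, not the update itself.

    def proximal_gradient(grad_f, prox_g, x0, step, iters=1000):
        # x+ = prox_{step*g}(x - step*grad_f(x)): a gradient step on the
        # smooth part f, then the proximal operator of the nonsmooth part g.
        x = x0
        for _ in range(iters):
            x = prox_g(x - step * grad_f(x), step)
        return x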
Abstract: Bilevel structures provide natural models for the selection between the multiple equilibria of variational problems such as, for instance, multi-agent systems that can be partially controlled. The lower level describes the equilibria, while the upper level explicitly addresses the selection criterion through a suitable objective function. Hierarchical programs of this kind are simpler than more general bilevel structures, as the lower-level problems are non-parametric with respect to the upper-level variables. This talk focuses on hierarchical programs whose non-parametric lower level is given by a monotone variational inequality. In order to tackle this kind of bilevel variational problem, suitable approximated versions of the lower level are exploited. On the one hand, the approximation does not perturb the original (exact) program too much and allows for some additional flexibility in the choice. On the other hand, it allows relying on suitable exact penalty schemes by recovering those regularity conditions that the original problem does not satisfy. Unlike Tikhonov-type approaches, exact penalisation allows bounding the penalty parameters and therefore prevents the numerical issues of their blow-up. However, turning exact penalisation into concrete convergent algorithms requires convexity assumptions that the approximation does not guarantee unless the operator of the variational inequality is affine. In this latter case the convergence properties of the resulting algorithms are analysed. Finally, an outlook is given for the case of general monotone operators.
Abstract: Proximal splitting algorithms are well suited to solving large-scale convex nonsmooth optimization problems. However, the proximity operators involved can be expensive to compute. We introduce Randprox, a new generic primal-dual algorithm in which the dual update is randomized. For instance, a proximity operator is called only with some small probability, or the proximal update is performed for a random subset of the dual variables instead of all of them. Our randomization technique is general and encompasses many unbiased mechanisms beyond sampling and probabilistic updates, including compression. The algorithm is variance-reduced; that is, it converges to an exact solution. We derive linear convergence results in the strongly convex setting. Our convergence rates are new even in the deterministic case. It has long been known that randomness helps stochastic-gradient-type methods achieve faster convergence, and our work shows that randomness can be beneficial to nonsmooth optimization as well. This work was presented at the International Conference on Learning Representations (ICLR) 2023. If the format and length of the presentation permit, I will present some variations obtained in follow-up works for communication-efficient distributed optimization.
Abstract: We present a general framework to study a class of inertial methods, including accelerated and optimized (proximal-)gradient algorithms, when applied to convex functions satisfying a Polyak-Łojasiewicz (PL) inequality. The analysis allows us to recover and improve the convergence rates obtained independently in the last few years for several particular instances. The study is based on the underlying high-resolution continuous-time dynamics. This unified model also reveals quite transparently why certain algorithms, like the Optimized Gradient Method, achieve a $\sqrt{2}$ improvement in the convergence rate (in terms of its dependence on the condition number), and why passing from the convex and PL case to the (quasi-)strongly convex one results in a factor of $2$. This insight can be used to devise new design strategies that further improve numerical performance.
Abstract: In this talk, we are concerned with infinite horizon optimal control problems governed by time-varying linear parabolic equations. Our focus is on controls represented by linear combinations of finitely many Dirac measures in the spatial domain, where only one measure is active at any given time. This distinctive characteristic endows the problem with switching properties, thus transforming it into an infinite horizon nonsmooth nonconvex problem. To address this challenge, we employ a receding horizon control (RHC) approach. This involves approximating the infinite-horizon problem with a sequence of finite-horizon ones over overlapping time intervals. We discuss the stabilizability and suboptimality of RHC and introduce a non-monotone proximal gradient method to solve the nonsmooth nonconvex sub-problem. We also present numerical experiments to validate the effectiveness of our approach in navigating the complexities inherent in infinite horizon optimal control with time-varying dynamics and switching properties.
Abstract: In this work, we develop first-order (Hessian-free) and zeroth-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference approximations of the derivatives. We use a special adaptive search procedure in our algorithms, which simultaneously fits both the regularization constant and the parameters of the finite difference approximations. This makes our schemes free from the need to know the actual Lipschitz constants. Additionally, we equip our algorithms with the lazy Hessian update, which reuses a previously computed Hessian approximation matrix for several iterations. Specifically, we prove the global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for our new Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, where $n$ is the dimension of the problem and $\epsilon$ is the desired accuracy for the gradient norm. These complexity bounds significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$, for first-order and zeroth-order non-convex optimization.
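To make the finite-difference mechanism concrete, here is a minimal sketch of the two approximations involved: a forward-difference gradient costing n + 1 function evaluations, and a Hessian approximation assembled from n extra gradient evaluations for the Hessian-free variant. The adaptive choice of the interval h and of the regularization constant, which is the core of the talk, is omitted.

    import numpy as np

    def fd_gradient(f, x, h):
        # Forward-difference gradient: n + 1 function evaluations.
        fx = f(x)
        return np.array([(f(x + h * e) - fx) / h for e in np.eye(x.size)])

    def fd_hessian_from_gradients(grad, x, h):
        # Hessian approximation from n gradient differences (no Hessian
        # oracle needed), symmetrized afterwards.
        gx = grad(x)
        H = np.column_stack([(grad(x + h * e) - gx) / h for e in np.eye(x.size)])
        return 0.5 * (H + H.T)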
Abstract: In this talk, a novel algorithm for nonconvex composite minimization is considered, which can be interpreted in terms of dual-space nonlinear preconditioning for the classical proximal gradient method. The proposed scheme can be applied to additive composite minimization problems whose smooth part exhibits an anisotropic descent inequality relative to a reference function. It is proved that the anisotropic descent property is closed under pointwise averaging if the dual Bregman distance is jointly convex and, more specifically, closed under pointwise conic combinations for the sum of exponentials (SumExp). We analyze the method’s asymptotic convergence and prove its linear convergence under an anisotropic proximal gradient dominance condition. Applications are discussed, including exponentially regularized LPs and logistic regression with nonsmooth regularization. In numerical experiments we show significant improvements of the proposed method over its Euclidean counterparts.
Abstract: In this talk, we will cover several results related to adaptive algorithms. In particular, we show how to make gradient descent and proximal gradient fully adaptive without increasing their iteration cost. Our approach requires even weaker assumptions than the classical results: we only need local Lipschitzness of the gradient. The introduced stepsizes dynamically approximate the local curvature of the differentiable function, allowing for incremental increases over the iterations. We will discuss some limitations and open problems. Time permitting, we will continue our discussion with the more challenging problem of solving saddle point problems.
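As an illustration of such a stepsize, here is a sketch following the published adaptive gradient descent rule of Malitsky and Mishchenko (2020), which estimates local curvature from two consecutive iterates and gradients; the exact variants covered in the talk may differ.

    import numpy as np

    def adaptive_gd(grad, x0, lam0=1e-6, iters=1000):
        # No global Lipschitz constant is needed: the stepsize is capped by a
        # local curvature estimate ||x_k - x_{k-1}|| / (2 ||g_k - g_{k-1}||)
        # and is allowed to grow by a factor sqrt(1 + theta) per iteration.
        x_prev, g_prev, lam_prev = x0, grad(x0), lam0
        x = x_prev - lam_prev * g_prev
        theta = np.inf          # permits an unrestricted first increase
        for _ in range(iters):
            g = grad(x)
            denom = np.linalg.norm(g - g_prev)
            local = np.linalg.norm(x - x_prev) / (2 * denom) if denom > 0 else np.inf
            lam = min(np.sqrt(1 + theta) * lam_prev, local)
            if not np.isfinite(lam):
                lam = lam_prev  # degenerate case guard (sketch only)
            theta = lam / lam_prev
            x_prev, g_prev, lam_prev = x, g, lam
            x = x - lam * g
        return x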
Abstract: We study composite convex optimization problems with a Quasi-Self-Concordant smooth component and a possibly nonsmooth additive part. Our problem class naturally interpolates between classic Self-Concordant functions and functions with Lipschitz continuous Hessian. Previously, the best complexity bounds for this problem class were associated with trust-region schemes and implementations of a ball-minimization oracle. In this talk, we show that for minimizing Quasi-Self-Concordant functions we can instead use the basic Newton Method with Gradient Regularization. For unconstrained minimization, it only involves a simple matrix inversion operation (solving a linear system) at each step. We prove a fast global linear rate for this algorithm, matching the complexity bound of the trust-region scheme, while our method remains especially simple to implement. Then, we introduce the Dual Newton Method and, based on it, develop the corresponding Accelerated Newton Scheme for this problem class, which further improves the complexity factor of the basic method. As a direct consequence of our results, we establish fast global linear rates of simple variants of the Newton Method applied to several practical problems, including Logistic Regression, Soft Maximum, and Matrix Scaling, without requiring additional assumptions on strong or uniform convexity for the target objective. In the case of a smooth approximation of the nonsmooth pointwise maximum, our new method gives much better global complexity bounds than the corresponding ones for first-order schemes.
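A minimal sketch of the basic method's step: one regularized linear system per iteration, with the shift tied to the current gradient norm. The exact scaling of the shift for quasi-self-concordant objectives is the subject of the talk; the linear rule lam = M * ||g|| below is an assumption of this sketch, as are the parameter names.

    import numpy as np

    def newton_grad_reg(f_grad, f_hess, x, M, tol=1e-8, max_iter=100):
        # Newton Method with Gradient Regularization (sketch): the Hessian is
        # shifted by a multiple of the gradient norm, keeping the system
        # well-posed without line search or trust-region subproblems.
        for _ in range(max_iter):
            g = f_grad(x)
            if np.linalg.norm(g) <= tol:
                break
            lam = M * np.linalg.norm(g)            # gradient-based shift (assumed scaling)
            H = f_hess(x) + lam * np.eye(x.size)
            x = x - np.linalg.solve(H, g)          # one linear system per step
        return x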
Abstract: We study cardinality-constrained optimization problems (CCOP) from a topological point of view. Special focus will be on M-stationary points due to Mordukhovich. We introduce nondegenerate M-stationary points and define their M-index. We show that all M-stationary points are generically nondegenerate. In particular, the sparsity constraint is active at all local minimizers of a generic CCOP. Some relations to other stationarity concepts, such as S-stationarity, basic feasibility, CW-minimality, and normal cone stationarity, are discussed in detail. By doing so, the issues of instability and degeneracy of points under different stationarity concepts are highlighted. The concept of M-stationarity allows us to adequately describe the global structure of CCOP along the lines of Morse theory. For that, we study topological changes of lower level sets while passing an M-stationary point. As a novelty for CCOP, multiple cells of dimension equal to the M-index need to be attached. This intriguing fact is in strong contrast with other optimization problems considered before, where just one cell suffices. As a consequence, we derive a Morse relation for CCOP, which relates the number of local minimizers to the number of M-stationary points of M-index equal to one. The appearance of such saddle points thus cannot be neglected from the perspective of global optimization. Due to the multiplicity phenomenon in cell attachment, a saddle point may lead to more than two different local minimizers. We conclude that the relatively involved structure of saddle points is the source of the well-known difficulty of solving CCOP to global optimality.
Abstract: In this work, we consider a connected network of finitely many agents working cooperatively to solve a min-max problem with convex-concave structure. We propose a decentralised first-order algorithm which can be viewed as combining features of two algorithms: PG-EXTRA for decentralised minimisation problems and the forward reflected backward method for (non-distributed) min-max problems. In each iteration of our algorithm, each agent computes the gradient of the smooth component of its local objective function as well as the proximal operator of its nonsmooth component, followed by a round of communication with its neighbours.
Abstract: A key problem in mathematical imaging, signal processing and computational statistics is the minimization of non-convex objective functions that may be non-differentiable at the relative boundary of the feasible set. This paper proposes a new family of first- and second-order interior-point methods for non-convex optimization problems with linear and conic constraints, combining logarithmically homogeneous barriers with quadratic and cubic regularization, respectively. Our approach is based on a potential-reduction mechanism and, under the Lipschitz continuity of the corresponding derivative with respect to the local barrier-induced norm, attains a suitably defined class of approximate first- or second-order KKT points with worst-case iteration complexity $O(\epsilon^{-2})$ (first-order) and $O(\epsilon^{-3/2})$ (second-order), respectively. Based on these findings, we develop new path-following schemes attaining the same complexity, modulo adjusting constants. These complexity bounds are known to be optimal in the unconstrained case, and our work shows that they are upper bounds in the case with complicated constraints as well. To the best of our knowledge, this work is the first which achieves these worst-case complexity bounds under such weak conditions for general conic constrained non-convex optimization problems.
Abstract: We propose a general framework for analyzing the behavior at its extrema of an extended real-valued function that can lack convexity or differentiability, and for which the classical Fermat rules of optimality do not apply. To this end, we employ the recently introduced notions of sup-subdifferential and partial sup-subdifferentials. The first one is a nonempty (on the domain of the considered function) enlargement of the Moreau-Rockafellar subdifferential, satisfying most of its fundamental properties and enjoying certain calculus rules. The partial sup-subdifferentials are obtained by breaking down the sup-subdifferential into one-dimensional components through the elements of a Hamel basis and play the same role as the partial derivatives in the Fermat optimality rules.
The talk is based on joint work with Malek Abbasi (University of Isfahan) and Michel Théra (University of Limoges).
Abstract: Despite their popularity, convergence results for splitting methods have been largely limited to the convex/monotone setting. We present convergence results for several first-order methods, including the proximal point method, forward-backward-forward splitting, and Douglas-Rachford splitting, within a nonmonotone setting. To this end, we focus on a problem class characterized by an oblique weak Minty condition, which captures non-trivial structures, as we demonstrate with examples. Moreover, we introduce the concept of semimonotonicity and provide sufficient conditions for the global convergence of splitting techniques for the sum of two semimonotone operators. Illustrative examples demonstrate the wide range of problems our theory is able to cover.
Abstract: We consider a multi-agent optimization problem whose objective consists of an expectation-valued term, parametrized by rival decisions, and a hierarchical term. Such a framework allows for capturing a broad range of stochastic hierarchical optimization problems, Stackelberg equilibrium problems, and leader-follower games. We develop an iteratively regularized and smoothed variance-reduced modified extragradient framework for iteratively approaching hierarchical equilibria in a stochastic setting. We equip our analysis with rate statements, complexity guarantees, and almost-sure convergence results. We then extend these statements to settings where the lower-level problem is solved inexactly and provide the corresponding rate and complexity statements.
Abstract: Two novel derivative-free methods are designed to minimize smooth functions with either globally or locally Lipschitz continuous gradients. These methods utilize gradient approximations based on finite differences, with adaptive finite difference intervals tailored to the magnitude of the exact gradients, despite not knowing them precisely. The proposed algorithms demonstrate significant convergence results, establishing the stationarity of accumulation points in general settings and achieving global convergence with constructive rates when imposing the Kurdyka-Łojasiewicz property. We also analyze the local convergence of the algorithms to nonisolated local minimizers and provide insights into their local convergence rates under this property. Numerical experiments involving various convex, nonconvex, noiseless, and noisy functions showcase the advantages of the proposed methods over other state-of-the-art approaches in derivative-free optimization.