Workshop on Nonsmooth Optimization and Applications

 (NOPTA 2024)

in Honor of the 75th Birthday of Boris Mordukhovich

April 8-12, 2024, University of Antwerp, Belgium

Confirmed speakers:

Boris Mordukhovich (Wayne State University, USA)

Title: Fundamental convergence analysis of sharpness-aware minimization

Abstract: In this talk, we present fundamental convergence properties of Sharpness-Aware Minimization (SAM), a recently proposed gradient-based optimization method [1] that significantly improves the generalization of deep neural networks. The convergence properties established for the method include the stationarity of accumulation points, convergence of the sequence of gradients to the origin, convergence of the sequence of function values to the optimal value, and convergence of the sequence of iterates to the optimal solution. The universality of the analysis, which is based on inexact gradient descent frameworks [2], allows extensions to normalized versions of SAM such as VaSSO and to unnormalized versions such as USAM. Numerical experiments on classification tasks using deep learning models confirm the practical aspects of our analysis.

The talk is based on joint research with P. D. Khanh, H.-C. Luong and D. B. Tran.

[1] Foret, P., Kleiner, A., Mobahi, H., and Neyshabur, B.: Sharpness-aware minimization for efficiently improving generalization. Proceedings of International Conference on Learning Representations, 2021.

[2] Khanh, P. D., Mordukhovich, B. S., and Tran, D. B.: Inexact reduced gradient methods in smooth nonconvex optimization. Journal of Optimization Theory and Applications, doi.org/10.1007/s10957-023-02319-9, 2023.
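As a brief illustration of the SAM update analyzed above, the following is a minimal PyTorch-style sketch of one iteration, following the description in [1]: the weights are perturbed along the normalized gradient direction with radius rho, and a gradient step is then taken using the gradient evaluated at the perturbed point. Function and parameter names are illustrative, not those of the talk or of an existing implementation.

```python
import torch

def sam_step(params, loss_fn, rho=0.05, lr=0.1):
    """One simplified SAM iteration (a sketch of the method of [1])."""
    # Gradient at the current weights.
    grads = torch.autograd.grad(loss_fn(), params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                      # ascent step to the perturbed point
    # Gradient at the perturbed weights (the "SAM gradient").
    grads_pert = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, e, g in zip(params, eps, grads_pert):
            p.sub_(e)                      # undo the perturbation
            p.sub_(lr * g)                 # descent step with the SAM gradient
```

Here params would be the list of trainable tensors (with requires_grad=True) and loss_fn a closure evaluating the training loss on the current mini-batch; unnormalized variants such as USAM drop the normalization when forming eps.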

Yurii Nesterov (Université catholique de Louvain, Belgium)

Title: Optimization, the philosophical background of artificial intelligence

Abstract: We discuss new challenges in modern science created by Artificial Intelligence (AI). Indeed, AI requires a system of new sciences, mainly based on computational models, whose development has already been started by the progress in Computational Mathematics. In this new reality, Optimization plays an important role, helping the other fields find tractable models and efficient methods and significantly increasing their predictive power. We support our conclusions with several examples of efficient optimization schemes related to human activity.

Coralia Cartis (University of Oxford, UK)

Title: Optimization of functions with low effective dimensionality 

Abstract: We discuss random and deterministic subspace methods for nonconvex optimization problems. We are interested in the global optimisation of functions with low effective dimensionality, namely functions that vary only along certain important directions or components. We show that the effective subspace of variation can be efficiently learned in advance of the optimization process; we contrast this with random embedding techniques that focus directly on optimization rather than learning. For local optimization, time permitting, we will also discuss efficient choices of subspaces that blend randomisation techniques with expert deterministic choices.
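To make the random-embedding idea contrasted above concrete, here is a minimal sketch (with illustrative names, and assuming scipy is available): when a function of D variables varies only along a few directions, one may minimize it over random low-dimensional subspaces x = A y. The methods of the talk, which learn the effective subspace before optimizing, go beyond this baseline.

```python
import numpy as np
from scipy.optimize import minimize

def minimize_via_random_embedding(f, D, d, n_embeddings=5, seed=0):
    """Minimize f: R^D -> R, assumed to have low effective dimensionality,
    by solving small d-dimensional problems over random subspaces x = A @ y."""
    rng = np.random.default_rng(seed)
    best_val, best_x = np.inf, None
    for _ in range(n_embeddings):
        A = rng.standard_normal((D, d))              # random embedding matrix
        res = minimize(lambda y: f(A @ y), np.zeros(d))
        if res.fun < best_val:
            best_val, best_x = res.fun, A @ res.x
    return best_val, best_x
```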

Radu Ioan Boţ (University of Vienna, Austria)

Title: Fast continuous time methods for monotone equations

Abstract: In this talk we discuss continuous-time dynamics for the problem of approaching the set of zeros of a single-valued monotone and continuous operator. Such problems are motivated by minimax convex-concave problems and, in particular, by convex optimization problems with linear constraints. The central role is played by a second-order dynamical system that combines a vanishing damping term with the time derivative of the operator along the trajectory, which can be seen as an analogue of the Hessian-driven damping in case the operator originates from a potential. We demonstrate that the norm of the operator along the trajectory and the restricted gap function exhibit fast vanishing behaviour, and that the trajectory converges weakly to a solution of the monotone equation. The implicit and explicit discrete-time models, enhanced with Nesterov's momentum and correcting terms, share the asymptotic features of the continuous dynamics. In the second part of the talk, we discuss the connection between the second-order dynamical system and a Tikhonov-regularized first-order dynamical system, exhibiting fast convergence rates and strong convergence of the trajectory.
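Schematically, and only to fix notation (the precise damping coefficients and correcting terms are those of the talk, not of this sketch), a second-order system of the type described above, for a monotone continuous operator V, combines a vanishing damping alpha/t with the time derivative of the operator along the trajectory:

```latex
\ddot{x}(t) + \frac{\alpha}{t}\,\dot{x}(t)
  + \beta(t)\,\frac{d}{dt}\,V\big(x(t)\big) + b(t)\,V\big(x(t)\big) = 0,
\qquad t \ge t_0 .
```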

Marc Teboulle (Tel-Aviv University, Israel)

Title: Lagrangian methods for nonsmooth optimization 

Abstract: Nonconvex and nonsmooth optimization models are prevalent in modern applications, yet they are very hard to solve and pose serious theoretical and computational challenges. This talk addresses some of the theoretical challenges by focusing on the design and analysis of adaptive Lagrangian-based methods for a comprehensive class of nonsmooth nonconvex models with nonlinear functional composite structures. We discuss the main obstacles, highlighting the ways they can be avoided through adaptive mechanisms, and the pillars of the convergence theory.

This talk is based on joint works with E. Cohen and N. Hallak. 

Claudia Sagastizábal (IMECC, Unicamp, Campinas, Brazil)

Title: Exploiting VU-structure in l_1-regularized minimization

Abstract: In many optimization problems nonsmoothness appears in a structured manner, for instance when the objective function has a composite form. It is then possible to make a Newton-like move in a certain subspace that concentrates, at least locally, all of the function's "smoothness". On its orthogonal complement, the "sharp" V-subspace, an intermediate iterate is defined in such a way that the overall convergence is driven by the U-step. To successfully put in place a superlinearly convergent VU-decomposition approach, there are two fundamental issues: the V-step must be sufficiently good and the U-subspace must be well identified. Borrowing proximal-point approximation metrics from bundle methods, the VU-scheme by Mifflin and Sagastizábal achieves those goals by solving two quadratic programs per iteration. For the case under consideration, in a manner similar to the well-known FISTA, an inexact proximal point can be easily computed without solving any quadratic program. The approximation provides a V-step with good asymptotic properties. After solving only one quadratic program, the U-subspace identification is guaranteed by means of a certain approximate subdifferential, akin to the epsilon-subdifferential, specially tailored to the l_1-norm.
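For orientation, here is a minimal numerical sketch of two ingredients mentioned above, specialized to the l_1-norm (function names and tolerances are illustrative only): the soft-thresholding operator behind the inexpensive, FISTA-like inexact proximal point, and a natural identification of the U-subspace with the span of the coordinates in the support of the current point.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, the building block of the
    ISTA/FISTA-type step that gives an inexpensive inexact proximal point."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def u_subspace_basis(x, tol=1e-10):
    """For the l_1-norm at x, the smooth U-subspace can be identified with the
    span of the coordinates where x is (numerically) nonzero; the sharp
    V-subspace is its orthogonal complement."""
    support = np.abs(x) > tol
    return np.eye(len(x))[:, support]
```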




Michel Thera (Université de Limoges, France)

Title: Linear regularity and strong CHIP of closed sets in Asplund spaces: old and new results

Abstract: This presentation focuses on three relevant properties for a collection of finitely many closed sets with nonempty intersection in a Banach space: linear regularity, property (G), and the strong CHIP, the latter in two versions (Fréchet and limiting). As is well known, these properties, previously analyzed by different authors for closed convex sets, play an important role in convex optimization, especially regarding constraint qualifications, error bounds, and the convex feasibility problem. For instance, in the convex setting, linear regularity is equivalent to the simultaneous fulfillment of the strong CHIP and property (G) for the corresponding normal cones. Since this equivalence is no longer valid outside the convex framework, the object of this talk is to present some new results and, in particular, to show that Asplundity of the space can be characterized through the fact that the strong limiting CHIP is a necessary condition for local linear regularity. I will then consider linear regularity for some special closed sets in convex-composite optimization. In this framework, an equivalence result on linear regularity, strong Fréchet CHIP, and property (G) will be presented. This result extends a dual characterization of linear regularity of finitely many closed convex sets via the strong CHIP and property (G) to the nonconvex case. As an application, we present a dual approach to the analysis of error bounds for inequality systems by giving several dual criteria for error bounds via Fréchet normal cones and subdifferentials.
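For convenience, here are the two central notions in schematic form, for closed sets C_1, ..., C_m with nonempty intersection and a point x̄ in it (the normal cone N is taken in the Fréchet or limiting sense, according to the version of the property):

```latex
\text{(local) linear regularity:}\quad
\operatorname{dist}\Big(x,\ \bigcap_{i=1}^m C_i\Big)
 \;\le\; \kappa\,\max_{1\le i\le m}\operatorname{dist}(x, C_i)
 \quad\text{for all } x \text{ near } \bar{x},
\qquad
\text{strong CHIP at } \bar{x}:\quad
N\Big(\bigcap_{i=1}^m C_i;\ \bar{x}\Big) \;=\; \sum_{i=1}^m N(C_i;\ \bar{x}).
```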


Amir Beck (Tel-Aviv University, Israel)

Title: New results on the multi-dimensional linear discriminant analysis problem 

Abstract: Fisher linear discriminant analysis (LDA) is a well-known technique for dimensionality reduction and classification, first formulated by Fisher in 1936. We present three different formulations of the multi-dimensional problem and show why two of them are equivalent. We then prove a rate of convergence of the form $O(q^{k^2})$ for solving the third model. Finally, we consider the max-min LDA problem, in which the objective function seeks to maximize the minimum separation among all distances between all classes. We show how this problem can be reduced to a quadratic programming problem with nonconvex polyhedral constraints and describe an effective branch-and-bound method for its solution. Joint work with Raz Sharon.
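As background (these are standard formulations from the literature, not necessarily the three models compared in the talk), with S_b and S_w the between-class and within-class scatter matrices and W the sought projection matrix with q columns, two common multi-dimensional LDA models are the trace-ratio and ratio-trace problems:

```latex
\max_{W^\top W = I}\ \frac{\operatorname{tr}\!\big(W^\top S_b W\big)}{\operatorname{tr}\!\big(W^\top S_w W\big)}
\qquad\text{and}\qquad
\max_{W}\ \operatorname{tr}\!\Big(\big(W^\top S_w W\big)^{-1}\big(W^\top S_b W\big)\Big).
```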

Russell Luke (Universität Göttingen, Germany)

Title: Convergence theory for randomized nonconvex optimization algorithms

Abstract: Randomization of algorithms is an established technique for dealing with large-scale optimization problems. The convergence analysis, however, is limited for the most part to convex problems and to almost sure convergence of the ergodic sequence or similar, and as such the conventional analytic strategies only apply to a subset of important problems. Viewing these algorithms as Markov chains on a space of probability measures allows one to go beyond almost sure convergence, but here the state of the art for the theory is not much better, being limited to Markov operators that are contractions or, until only recently, nonexpansive. In this talk I will survey a general, but conceptually simple, framework for the analysis of a broad class of stochastic algorithms that provides sufficient conditions for local convergence in distribution, with rates, in a wide variety of settings.

François Glineur (Université catholique de Louvain, Belgium)

Title:  Performance estimation of optimization methods: a guided tour

Abstract: Identifying the worst-case behaviour of an optimization method is a question that can itself be cast as an optimization problem. This is the main idea behind performance estimation, initially proposed by Yoel Drori and Marc Teboulle in 2014. In this framework, one seeks to compute the exact convergence rate of a given black-box optimization algorithm over a given class of problem instances. In many cases, including most fixed-step first-order algorithms, this computation can be reformulated into a finite-dimensional, tractable semidefinite program using necessary and sufficient interpolation conditions to describe functions involved in the problem. Solving this program provides an exact, unimprovable convergence rate, a mathematical proof guaranteeing this rate and a problem instance where this worst-case behaviour is achieved. While this computer-assisted procedure is based on numerical computations, many of the obtained results can later be converted into standard mathematical statements with analytical rates and independently checkable proofs. The knowledge of the exact worst-case behaviour of algorithms is useful to compare efficiency across methods, tune algorithmic parameters (such as step-size) and, ultimately, design new methods with improved behaviour, either using insights gained from the performance estimation procedure or with some automated method design techniques. In this talk we will survey a few of the main achievements in the area of performance estimation and showcase some recent results. The latter include the exact convergence rates of the gradient method over the full range of constant step-sizes, both on smooth (strongly) convex and smooth nonconvex functions, the performance of block-coordinate algorithms, how to deal with methods involving linear operators and the exact convergence rate of the last iterate in subgradient methods. We will also discuss some open questions related to performance estimation.


This talk will present joint work with Nizar Bousselmi, Julien Hendrickx, Yassine Kamri, Ion Necoara, Panagiotis Patrinos, Teodor Rotaru and Moslem Zamani. 
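To fix ideas, for N steps of gradient descent with step size h/L on L-smooth convex functions, the performance estimation problem of Drori and Teboulle can be written schematically as follows (the framework then reformulates it as a tractable semidefinite program via interpolation conditions; the notation here is illustrative):

```latex
\max_{f,\ x_0,\dots,x_N,\ x_\ast}\ f(x_N) - f(x_\ast)
\quad\text{s.t.}\quad
f \text{ is } L\text{-smooth and convex},\ \
x_\ast \in \operatorname*{arg\,min} f,\ \
\|x_0 - x_\ast\| \le R,\ \
x_{k+1} = x_k - \tfrac{h}{L}\,\nabla f(x_k),\ \ k = 0,\dots,N-1 .
```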


Mikhail Solodov (IMPA – Instituto de Matemática Pura e Aplicada, Brazil)

Title: Descent sequences in weakly convex optimization

Abstract: We present a framework for analyzing convergence and local rates of convergence of a class of descent algorithms, assuming the objective function is weakly convex. The framework is general in the sense that it covers explicit iterations (based on the gradient or a subgradient at the current iterate), implicit iterations (using a subgradient at the next iterate, as in proximal schemes), as well as iterations in which the associated subgradient is specially constructed and corresponds neither to the current nor to the next point (as is the case of descent steps in bundle methods). Under a subdifferential-based error bound on the distance to critical points, linear rates of convergence are established. Our analysis applies, among other techniques, to prox-descent for decomposable functions, the proximal-gradient method for a sum of functions, redistributed bundle methods, and a class of algorithms that can be cast in the feasible descent framework for constrained optimization.
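Schematically, and without reproducing the exact assumptions of the talk, descent frameworks of this kind typically combine a sufficient-decrease condition and a relative-error condition on the chosen subgradient with a subdifferential-based error bound on the distance to critical points, for instance:

```latex
f(x_{k+1}) \le f(x_k) - c_1\,\|x_{k+1} - x_k\|^2,
\qquad
\operatorname{dist}\big(0, \partial f(x_{k+1})\big) \le c_2\,\|x_{k+1} - x_k\|,
\qquad
\operatorname{dist}\big(x, \operatorname{crit} f\big) \le \kappa\,\operatorname{dist}\big(0, \partial f(x)\big).
```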

Jérôme Bolte (University Toulouse Capitole, France)

Title: Nonsmooth differentiation of equations and algorithms  

Abstract: The recent surge in algorithmic differentiation through the massive use of TensorFlow and PyTorch "autodiff" has democratized "computerized differentiation" for a broad spectrum of applications and solvers. Prompted by the challenges of nonsmoothness (such as thresholding, constraints, and ReLU) and the need to adjust parameters in various contexts directly via these solvers, we have devised tools for nonsmooth differentiation. We have in particular developed a nonsmooth implicit function calculus, aiming to provide robust guarantees for prevalent differentiation practices. We will discuss applications of these findings through the differentiation of algorithms and of equations, such as equations involving maximal monotone operators.
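A tiny example of the kind of nonsmoothness issue at stake (illustrative only): the expressions relu(x) - relu(-x) and x define the same function, yet at the kink x = 0 automatic differentiation returns 0 for the first and 1 for the second, so the reported "derivative" depends on the formula rather than on the function.

```python
import torch

# Two ways of writing the same function f(x) = x.
x1 = torch.tensor(0.0, requires_grad=True)
y1 = torch.relu(x1) - torch.relu(-x1)   # equals x1 for every input
y1.backward()

x2 = torch.tensor(0.0, requires_grad=True)
y2 = 1.0 * x2                           # the identity, written directly
y2.backward()

print(x1.grad, x2.grad)                 # tensor(0.) vs tensor(1.)
```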

Panagiotis Patrinos (KU Leuven, Belgium)

Title: Forward backward envelopes under the lens of generalized convexity: Unifying framework and algorithms

Abstract: In this talk we present an abstract forward-backward splitting algorithm that can be interpreted in terms of a generalized convex-concave procedure. The algorithm unifies many popular existing methods such as the classical convex-concave procedure, Bregman and classical forward-backward splitting, proximal point algorithms with φ-divergences, and natural gradient descent, and it motivates entirely new algorithms such as an anisotropic generalization of forward-backward splitting. We analyze the algorithms' sublinear and linear convergence under an abstract proximal PL-inequality. In the spirit of the classical convex-concave procedure, we also study certain equivalent reformulations of the optimization problem involving generalized conjugate functions obtained via a double-min duality. This viewpoint leads to a natural generalization of the Moreau and forward-backward envelopes (called c-conjugates in the context of generalized concavity) that serve as real-valued continuous surrogates preserving (local) minima and stationary points. Leveraging the generalized implicit function theorem, we study the first- and second-order differentiability properties of these envelopes. Ultimately, this is used to obtain globalized (quasi-)Newton accelerated versions of the aforementioned algorithms.
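For concreteness, the Bregman forward-backward step that the abstract scheme generalizes reads as follows, where D_h is the Bregman distance of a reference function h and γ > 0 a step size; taking h = ½||·||² recovers the classical forward-backward (proximal-gradient) step. This is only the familiar special case, not the anisotropic or generalized-convexity versions of the talk:

```latex
x^{k+1} \in \operatorname*{arg\,min}_{x}\ \Big\{\, g(x) + \big\langle \nabla f(x^k),\, x - x^k \big\rangle
  + \tfrac{1}{\gamma}\, D_h(x, x^k) \Big\},
\qquad
D_h(x, y) = h(x) - h(y) - \big\langle \nabla h(y),\, x - y \big\rangle .
```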


Title:  Inertial and extrapolated block majorization minimization with application to NMF 

Abstract: Given a nonnegative matrix X and a factorization rank r, nonnegative matrix factorization (NMF) approximates the matrix X as the product of a nonnegative matrix W with r columns and a nonnegative matrix H with r rows. NMF has become a standard linear dimensionality reduction technique in data mining and machine learning. In this talk, we introduce NMF and show how it can be used as an interpretable unsupervised data analysis tool in various applications, including hyperspectral image unmixing. Motivated by NMF, we present two general block majorization-minimization (BMM) algorithms. First, a novel inerTIal block majorizaTion minimizAtioN (TITAN) framework for nonsmooth nonconvex optimization problems. To the best of our knowledge, TITAN is the first block-coordinate framework that relies on majorization-minimization while embedding an inertial force into each step of the block updates. The inertial force is obtained via an extrapolation operator that subsumes heavy-ball and Nesterov-type accelerations for block proximal gradient methods as special cases. We study subsequential as well as global convergence of the sequence generated by TITAN, and we illustrate its effectiveness on NMF and matrix completion problems. Second, we propose BMM with extrapolation (BMMe) for solving a class of multi-convex optimization problems. The extrapolation parameters of BMMe are updated using a novel adaptive update rule. By showing that BMMe can be reformulated as a block mirror descent method, with the Bregman divergence adaptively updated at each iteration, we establish subsequential convergence for BMMe. We use this method to design efficient algorithms to tackle NMF with beta-divergences. These algorithms, which are multiplicative updates with extrapolation, benefit from our novel results that offer convergence guarantees. We also empirically illustrate the significant acceleration of BMMe for beta-NMF through extensive experiments.

This is joint work with Le Hien, Valentin Leplat and Duy Nhat Phan.  
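As background for readers new to NMF, here is a minimal sketch of the classical multiplicative updates for the Frobenius-norm NMF model; the extrapolated variants discussed above (TITAN, BMMe) accelerate block updates of this kind but are not reproduced here. Names and defaults are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (nonnegative, m x n) as W @ H with W >= 0 (m x r) and
    H >= 0 (r x n), using the classical multiplicative updates for the
    Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H
```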


Peter Richtarik (KAUST, Saudi Arabia)

Title: On the resolution of a stubborn problem in federated learning and how it relates to nonsmooth optimization 

Abstract: I'll talk about the history of the "local training" trick, which is of key importance to the communication efficiency of federated learning algorithms, and explain how new insights related to nonsmooth optimization were needed to shed light on why this trick works.
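To indicate what the "local training" trick refers to, here is a minimal FedAvg-style sketch (illustrative names only; this is not the specific algorithm or analysis of the talk): each client runs several local gradient steps from the shared model before the server averages the results, so communication happens only once per round.

```python
import numpy as np

def local_training_round(client_grads, w, local_steps=5, lr=0.1):
    """One communication round with local training: every client starts from
    the shared model w, performs several local gradient steps without
    communicating, and the server then averages the locally updated models."""
    updated = []
    for grad in client_grads:            # grad(w) returns that client's gradient at w
        w_local = w.copy()
        for _ in range(local_steps):
            w_local = w_local - lr * grad(w_local)
        updated.append(w_local)
    return np.mean(updated, axis=0)      # server aggregation (simple averaging)
```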