This workshop is the culmination of the Associate Teams project FOAM (First-Order Accelerated Methods for Machine Learning). It is a two-day event with talks by some of the project's participants and ample time for research collaboration.
Location: INRIA Sierra Team. 48 Rue Barrault, 75013 Paris. Anita Borg room.
Schedule:
Wednesday Nov 27:
10:00-10:25 Clément Lezane
Title: Accelerated Mirror Descent for Non-Euclidean Star-convex Functions
Abstract: Acceleration for non-convex functions has been an important problem in optimisation. We revisit star-convex functions, which are strictly unimodal on all lines through a minimizer. The authors of [1] accelerate gradient descent for star-convex functions whose gradients are Lipschitz with respect to the Euclidean norm on an unconstrained domain. In our work, we introduce a new assumption on the regularity of the derivative of a general norm and accelerate mirror descent for this class of normed spaces. We show that, under this assumption, our algorithms achieve sharp convergence rates for star-convex functions with Hölder continuous gradients. We also prove that our convergence rate is near-optimal for p-norms.
[1] Oliver Hinder, Aaron Sidford, and Nimit Sohoni. Near-optimal methods for minimizing star-convex functions and beyond. In Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 1894–1938. PMLR, 09–12 Jul 2020
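To fix notation, the sketch below implements plain (unaccelerated) mirror descent with the squared p-norm mirror map $\psi(x) = \tfrac{1}{2}\|x\|_p^2$; the choice of p, step size, and test problem are illustrative assumptions and not taken from the talk.
```python
# Plain mirror descent with mirror map psi(x) = 0.5 * ||x||_p^2 (p > 1).
# Illustration only; the accelerated scheme from the talk is not reproduced.
import numpy as np

def grad_psi(x, p):
    """Gradient of 0.5 * ||x||_p^2."""
    norm = np.linalg.norm(x, ord=p)
    if norm == 0:
        return np.zeros_like(x)
    return norm ** (2 - p) * np.sign(x) * np.abs(x) ** (p - 1)

def mirror_descent(grad_f, x0, p=1.5, step=0.1, iters=500):
    q = p / (p - 1)              # dual exponent, 1/p + 1/q = 1
    x = x0.astype(float)
    for _ in range(iters):
        y = grad_psi(x, p) - step * grad_f(x)  # gradient step in the dual space
        x = grad_psi(y, q)       # map back: (0.5*||.||_p^2)^* = 0.5*||.||_q^2
    return x

# toy usage: minimize f(x) = 0.5 * ||x - 1||_2^2; iterates approach the all-ones vector
print(mirror_descent(lambda x: x - 1.0, np.zeros(5)))
```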
10:30-10:55 Juan Pablo Contreras
Title: Stochastic Halpern iteration in non-Euclidean spaces
Abstract: We analyze the overall oracle complexity of the stochastic Halpern iteration with minibatch, aiming to approximate fixed points of nonexpansive and contractive operators in a finite-dimensional space equipped with a non-Euclidean norm. For nonexpansive operators, we outline conditions for almost sure convergence of the iterates and demonstrate that, under mild assumptions such as uniformly bounded variance, the expected fixed-point residual decays at a rate of $\tilde{O}(1/n)$. When approximating a fixed point within a tolerance $\varepsilon > 0$, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$, surpassing the $\tilde{O}(\varepsilon^{-6})$ rate of the stochastic Krasnosel'skii-Mann iteration. Furthermore, we establish a lower bound of $\tilde{\Omega}(\varepsilon^{-3})$, which applies to a wide range of algorithms, including all averaged iterations like Krasnosel'skii-Mann and Halpern. For contractions, we establish the rate $\tilde{O}(\varepsilon^{-2})$. As an application, we propose new algorithms for synchronous Q-learning with average and discounted rewards. In particular, for average reward, our method improves upon the best-known sample complexity in the literature.
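For reference, a minimal sketch of the Halpern scheme, with the classical anchoring weights $\beta_n = 1/(n+2)$ and an optional minibatch oracle; these are illustrative choices, not the tuning analyzed in the talk.
```python
# Halpern iteration x_{n+1} = beta_n * x_0 + (1 - beta_n) * T(x_n) for a
# nonexpansive map T; `oracle`, if given, returns a minibatch estimate of T(x).
import numpy as np

def halpern(T, x0, iters=200, oracle=None, batch=1):
    x = x0.copy()
    for n in range(iters):
        beta = 1.0 / (n + 2)                     # anchoring weight, decays like 1/n
        Tx = oracle(x, batch) if oracle else T(x)
        x = beta * x0 + (1.0 - beta) * Tx        # pull the iterate toward the anchor x0
    return x

# toy example: T is a 0.9-contraction whose fixed point is the origin
print(halpern(lambda x: 0.9 * x, np.ones(3)))
```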
11:00-11:25 Cristóbal Guzmán
Title: Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective
Abstract: We introduce new functional classes that provide fine-grained measures of nonsmoothness of functions. These classes provide further understanding of the oracle complexity of nonsmooth optimization, as well as refined complexity results from specific techniques, including randomized smoothing. Based on joint work with Jelena Diakonikolas (arXiv:2403.16317).
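As a side note, randomized smoothing (one of the techniques mentioned above) replaces a nonsmooth $f$ by $f_u(x) = \mathbb{E}_{v \sim \mathrm{Ball}}[f(x + u v)]$; a generic two-point spherical gradient estimator for $f_u$ is sketched below, with textbook choices that are not specific to the paper.
```python
# Two-point randomized-smoothing gradient estimator: unbiased for the gradient
# of the smoothed function f_u(x) = E_{v ~ uniform ball}[f(x + u*v)].
import numpy as np

def smoothed_grad(f, x, u=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(x.shape)
    z /= np.linalg.norm(z)            # uniform direction on the unit sphere
    d = x.size
    return (d / (2.0 * u)) * (f(x + u * z) - f(x - u * z)) * z

# example on the nonsmooth function f(x) = ||x||_1
print(smoothed_grad(lambda x: np.abs(x).sum(), np.array([1.0, -2.0, 0.0])))
```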
11:30-11:55 Juan Pablo Flores
Title: Privacy Amplification by Iteration Beyond Nonexpansiveness: Continuity Moduli and Applications to Langevin Dynamics
Abstract: The convergence analysis of iterative algorithms for sampling from high-dimensional distributions requires bounds on the evolution of certain probability divergences. It has recently been observed that Privacy Amplification by Iteration (PABI), a technique used in Differential Privacy, offers such bounds for noisy nonexpansive iterations, which include the widely used Noisy Stochastic Gradient Descent method for smooth convex losses. In this work, we extend the Privacy Amplification by Iteration technique to iterative processes with a continuity modulus of the form $\sqrt{A\delta^2 + B}$. Using this extension, we compute mixing times for Projected Langevin iterations under convex and Lipschitz potentials. Additionally, we analyze the growth of the privacy curve for Noisy SGD when the objective function is convex and Lipschitz with Hölder-continuous gradients.
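A rough sketch of the kind of noisy projected iteration covered by such bounds; the Euclidean-ball constraint, step size, and quadratic potential are illustrative assumptions, not the setting of the paper.
```python
# Projected Langevin iteration x_{k+1} = Proj_K(x_k - eta * grad U(x_k) + sqrt(2*eta) * xi_k),
# here with K a Euclidean ball of given radius.
import numpy as np

def projected_langevin(grad_U, x0, radius=1.0, step=1e-2, iters=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise  # Langevin step
        nrm = np.linalg.norm(x)
        if nrm > radius:
            x = x * (radius / nrm)                              # project onto the ball
    return x

# toy potential U(x) = 0.5 * ||x||^2, so grad_U(x) = x
print(projected_langevin(lambda x: x, np.ones(2)))
```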
12:00-13:30 Lunch
13:30-17:00 Working session
Thursday Nov 28:
10:00-10:25 Roberto Cominetti
Title: Fixed-point error bounds for Q-learning
Abstract: In this talk we discuss error bounds for Krasnoselskii-Mann stochastic iterations for general non-expansive maps, and we show how they yield error bounds for the last iterate in the standard RVI-Q-learning iteration.
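For readers unfamiliar with RVI Q-learning, a hedged sketch of a synchronous average-reward variant on a small random MDP follows; the reference value Q[0, 0], the constant step size, and the toy MDP are illustrative choices, not those analyzed in the talk.
```python
# Synchronous RVI Q-learning sketch: every (s, a) pair is updated at each step
# using one generative-model sample, and a reference value Q[0, 0] is subtracted
# so that the iteration targets the relative action-value function.
import numpy as np

def rvi_q_learning(P, R, iters=5000, alpha=0.1, rng=None):
    # P: (S, A, S) transition probabilities, R: (S, A) expected rewards
    rng = rng or np.random.default_rng(0)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                s_next = rng.choice(S, p=P[s, a])             # one sampled transition
                target = R[s, a] + Q[s_next].max() - Q[0, 0]  # subtract the reference value
                Q[s, a] += alpha * (target - Q[s, a])
    return Q

rng = np.random.default_rng(1)
S, A = 3, 2
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
print(rvi_q_learning(P, R))
```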
10:30-10:55 Mario Bravo
Title: Error bounds for the Krasnoselskii-Mann iteration for contractions.
Abstract: We study fixed-point error bounds for the classical Krasnoselskii-Mann (KM) iteration in the case where the operator is a contraction. Explicit non-asymptotic bounds are derived by constructing a family of recursive bounds, employing a probabilistic interpretation, and applying some known counting arguments. We discuss how our estimates relate to the KM iteration in the nonexpansive case and to the standard Picard iteration for contracting maps.
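For concreteness, the KM iteration and its fixed-point residual can be sketched as follows; the constant relaxation parameter and toy contraction are chosen purely for illustration.
```python
# Krasnoselskii-Mann iteration x_{n+1} = (1 - alpha) * x_n + alpha * T(x_n),
# tracking the fixed-point residual ||x_n - T(x_n)|| along the way.
import numpy as np

def km_iteration(T, x0, alpha=0.5, iters=500):
    x = x0.copy()
    residuals = []
    for _ in range(iters):
        Tx = T(x)
        residuals.append(np.linalg.norm(x - Tx))  # fixed-point residual
        x = (1.0 - alpha) * x + alpha * Tx        # averaged (relaxed) step
    return x, residuals

# toy contraction with modulus 0.9 and fixed point 1
x, res = km_iteration(lambda x: 0.9 * x + 0.1, np.zeros(1))
print(x, res[-1])
```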
11:00-11:25 Aymeric Dieuleveut
Title: Counter-Examples in First-Order Optimization: A Constructive Approach.
Abstract: While many approaches have been developed in recent years for obtaining worst-case complexity bounds for first-order optimization methods, theoretical gaps remain in cases where no such bound can be found. In such cases, it is often unclear whether no such bound exists (e.g., because the algorithm might fail to systematically converge) or whether current techniques are simply unable to find one. In this talk, we propose an approach to automate the search for cyclic trajectories generated by first-order methods. This provides a constructive way to show that no appropriate complexity bound exists, thereby complementing approaches that provide sufficient conditions for convergence. Using this tool, we provide ranges of parameters for which the famous Polyak heavy-ball, Nesterov accelerated gradient, inexact gradient descent, and three-operator splitting algorithms fail to systematically converge, and show that it nicely complements existing tools searching for Lyapunov functions.
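The simple check below conveys the flavour of the approach without reproducing the paper's constructive search: run a first-order method and flag trajectories that return (numerically) to an earlier iterate, which rules out convergence from that initialization. Function names, tolerances, and the toy example are assumptions for illustration.
```python
# Detect (numerically) cyclic trajectories of a one-step method x -> step_fn(x).
import numpy as np

def find_cycle(step_fn, x0, iters=2000, tol=1e-8, burn_in=100):
    x = x0.copy()
    history = []
    for k in range(iters):
        x = step_fn(x)
        if k >= burn_in:
            for j, past in enumerate(history):
                if np.linalg.norm(x - past) < tol:
                    return k - (burn_in + j)   # cycle length
            history.append(x.copy())
    return None                                # no cycle found within the budget

# gradient descent on f(x) = 0.5 * x^2 with step 2.0 oscillates with period 2
print(find_cycle(lambda x: x - 2.0 * x, np.array([1.0])))
```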
11:30-11:55 Baptiste Goujaud
Title: Provable non-accelerations of the heavy-ball method.
Abstract: In this work, we show that the heavy-ball (HB) method provably does not reach an accelerated convergence rate on smooth strongly convex problems. More specifically, we show that for any condition number and any choice of algorithmic parameters, either the worst-case convergence rate of HB on the class of L-smooth and μ-strongly convex quadratic functions is not accelerated (that is, slower than 1 − O(κ)), or there exists an L-smooth μ-strongly convex function and an initialization such that the method does not converge. To the best of our knowledge, this result closes a simple yet open question on one of the most used and iconic first-order optimization techniques. Our approach builds on finding functions for which HB fails to converge and instead cycles over finitely many iterates. We analytically describe all parametrizations of HB that exhibit this cycling behavior on a particular cycle shape, whose choice is supported by a systematic and constructive approach to the study of cycling behaviors of first-order methods. We show the robustness of our results to perturbations of the cycle, and extend them to classes of functions that also satisfy higher-order regularity conditions.
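For reference, the HB update with the textbook tuning that is optimal on quadratics is sketched below; the specific quadratic and constants are illustrative and do not reflect the worst-case constructions of the talk.
```python
# Heavy-ball update x_{k+1} = x_k - gamma * grad f(x_k) + beta * (x_k - x_{k-1}),
# with Polyak's tuning for L-smooth, mu-strongly convex quadratics.
import numpy as np

def heavy_ball(grad, x0, L, mu, iters=200):
    kappa = L / mu
    gamma = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2              # step size
    beta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2  # momentum
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = x - gamma * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# quadratic f(x) = 0.5 * x^T diag(mu, L) x, minimized at the origin
mu, L = 1.0, 100.0
print(heavy_ball(lambda x: np.array([mu, L]) * x, np.array([1.0, 1.0]), L, mu))
```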
12:00-13:30 Lunch
13:30-13:55 Alexandre d'Aspremont
Title: Naive Feature Selection: Sparsity in Naive Bayes.
Abstract: Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, l1-penalized logistic regression and LASSO, while being orders of magnitude faster. On a large data set with more than 1.6 million training points and about 12 million features, and with a non-optimized CPU implementation, our sparse naive Bayes model can be trained in less than 15 seconds.
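As a very crude point of comparison (and emphatically not the sparse maximum-likelihood formulation solved in the paper), here is a Bernoulli naive Bayes ranking baseline that keeps the k features with the largest class log-probability gap; all names and choices are illustrative.
```python
# Bernoulli naive Bayes feature ranking: estimate smoothed P(x_j = 1 | class)
# for each class and keep the k features with the largest log-probability gap.
import numpy as np

def bernoulli_nb_topk(X, y, k, eps=1.0):
    # X: (n, d) binary matrix, y: (n,) labels in {0, 1}
    log_p = []
    for c in (0, 1):
        Xc = X[y == c]
        p = (Xc.sum(axis=0) + eps) / (Xc.shape[0] + 2 * eps)  # Laplace-smoothed estimates
        log_p.append(np.log(p))
    gap = np.abs(log_p[1] - log_p[0])   # per-feature discriminative score
    return np.argsort(gap)[-k:]         # indices of the k highest-scoring features

rng = np.random.default_rng(0)
X = (rng.random((100, 50)) < 0.3).astype(int)
y = (X[:, 0] | X[:, 1]).astype(int)     # labels driven by the first two features
print(bernoulli_nb_topk(X, y, k=5))
```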
14:00-14:25 Benjamin Dubois-Taine
Title: Frank-Wolfe meets Shapley-Folkman: a systematic approach for solving nonconvex separable problems with linear constraints.
Abstract: We consider separable nonconvex optimization problems under affine constraints. For these problems, the Shapley-Folkman theorem provides an upper bound on the duality gap as a function of the nonconvexity of the objective functions, but does not provide a systematic way to construct primal solutions satisfying that bound. In this work, we develop a two-stage approach to do so. The first stage approximates the optimal dual value with a large set of primal feasible solutions. In the second stage, this set is trimmed down to a primal solution by computing (approximate) Carathéodory representations. The main computational requirement of our method is tractability of the Fenchel conjugates of the component functions and their (sub)gradients. When the function domains are convex, the method recovers the classical duality gap bounds obtained via Shapley-Folkman. When the function domains are nonconvex, the method also recovers classical duality gap bounds from the literature, based on a more general notion of nonconvexity.
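For reference, the generic template behind this abstract is the separable program $\min_{x_1,\dots,x_n} \sum_{i=1}^{n} f_i(x_i)$ subject to $\sum_{i=1}^{n} A_i x_i \le b$, where each $f_i$ may be nonconvex; the Shapley-Folkman theorem bounds the gap between this problem and its dual in terms of how nonconvex the $f_i$ are. (Writing the affine constraints as inequalities is a notational assumption here, not taken from the paper.)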
15:00-17:00 Working session
This event is sponsored by INRIA