
### Abstracts

#### 12/3 Julien Mairal: Generic Acceleration Schemes for Gradient-Based Optimization

In this talk, we present generic techniques to accelerate gradient-based optimization algorithms. These approaches build upon the inexact proximal point algorithm for minimizing a convex objective function, and consist of approximately solving a sequence of well-chosen auxiliary subproblems, leading to faster convergence for the original problem. We introduce two variants based on Nesterov's acceleration and quasi-Newton principles, respectively. One of the keys to achieving acceleration in theory and in practice is to solve these subproblems with appropriate accuracy, using the right stopping criterion and the right warm-start strategy.

#### 12/3 Frédéric Meunier: Birthday Cake, Disappearing Player, Burnt Cake: Some Variations on Envy-Free Cake Division

Consider n players with preferences over the set of connected pieces of a cake, identified with the interval [0,1]. A classical theorem of Stromquist guarantees, under fairly weak conditions, the existence of an envy-free division of the cake into n (connected) pieces, i.e. a division in which no player strictly prefers a piece given to another player. A recent generalization of this theorem shows that even if the preferences of one of the players are unknown (it is his birthday and he is not consulted), an envy-free division of the cake still exists: one can cut the cake, let that player choose, and there will always be an envy-free assignment of the remaining n-1 pieces. We propose a new method for proving Stromquist's theorem and its generalization. This method makes it possible to prove new variations; one of them, for instance, guarantees the existence of a division of the cake into n-1 pieces admitting an envy-free assignment of these pieces, whichever player decides to leave the room. A central element of this method is the use of new extensions of Sperner's lemma (a discrete version of Brouwer's fixed-point theorem). This talk is based on recent joint work with Florian Frick, Kelsey Houston-Edwards, Francis Su and Shira Zerbib.

#### 12/2 Anatoli Juditsky: On the Near-Optimality of Linear Estimation

Given a noisy indirect observation $\omega = Ax + \xi$, we consider the problem of estimating a linear image $Bx$ of a signal $x$ assumed to belong to a given set $X$. Under certain assumptions on $X$ (satisfied, for example, when $X$ is the intersection of $K$ elliptic cylinders, or the unit ball of the spectral norm in the space of matrices) and on the norm $\| \cdot \|$ used to measure the estimation error (satisfied, for example, by the $p$-norms, $1 \leq p \leq 2$, on $R^m$ and by the nuclear norm on the space of matrices), and {\em without imposing any restriction on the mappings $A$ and $B$,} we construct a {\em linear} estimate of $Bx$ which is near-optimal among all estimates (linear or nonlinear) in terms of the expected worst-case $\| \cdot \|$-loss over $X$. These results substantially extend the classical ones (e.g., Pinsker 1980 and Donoho, Liu and MacGibbon, 1990), which treat only the case of the Euclidean norm $\| \cdot \|$ and diagonal matrices $A$ and $B$, and impose much more restrictive assumptions on the signal set $X$. All constructions and theoretical proofs rely heavily on tools from convex optimization. (Joint work with A.
Nemirovski, Georgia Tech)

#### 12/2 Anders Hansen: On Computational Barriers in Optimisation and the Paradoxes of Deep Learning

Linear and semidefinite programming (LP, SDP), regularisation with Basis Pursuit and Lasso, as well as neural networks in deep learning, have seen great success in mathematics, learning, statistics and data science. The success of LP is traditionally explained by the fact that it is in P for rational inputs. However, in his list of problems for the 21st century, Smale calls for "[Computational] models which process approximate inputs and which permit round-off computations". Hence, we will discuss the following paradox: it is impossible to design accurate algorithms for the above problems when given inaccurate, yet well-conditioned, input data with arbitrarily small inaccuracies. The paradox implies the following failure for any algorithm: for fixed dimensions and any small accuracy parameter epsilon > 0, one can choose any time T and find an input such that the runtime exceeds T without epsilon accuracy having been reached. Moreover, it is impossible to determine when the algorithm should halt in order to achieve an epsilon-accurate solution. The largest epsilon for which this failure happens is called the Breakdown-epsilon. Typically the Breakdown-epsilon is greater than 1/2, even for randomised algorithms, when the input is bounded by one and well-conditioned. Moreover, it is impossible to design an algorithm that checks whether a given algorithm will fail on a given input; this is in fact strictly harder than solving the original problem. Thus, for example, even given an oracle for solving LP accurately, one cannot determine whether a given algorithm for solving LP produces nonsense on its input.

The paradox implies two surprises: (1) in mathematics, statistics and learning, one computes non-computable problems on a daily basis; (2) in order to mathematically characterise this phenomenon, there is an intricate complexity theory for, paradoxically, non-computable problems. In particular, the key to explaining the success (and failure) of many algorithms in real-world scenarios lies in this rich classification theory, where sparsity turns out to be a key ingredient. The above result leads to the paradoxes of deep learning: (1) one cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima; moreover, (2) one can have a 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set.

#### 15/1 Luigi De Pascale: A Variational Limit in Density Functional Theory

Density functional theory offers a model for quantum mechanics that is an alternative to the Schrödinger equation. It is a variational model, based on the minimization of a certain energy functional over a function space. The main idea is to model everything in terms of the single-electron density. The choice of the energy functional to minimize needs to be adapted to the type of phenomenon one aims to describe. I will discuss how a particular choice of the energy term describing the electron-electron interaction affects the description of the so-called dissociation limit of a molecule (how the molecule dissociates when the nuclei are moved away from each other).

#### 15/1 Enric Meinhardt-Llopis: Fusion of Images

Image fusion is a very general problem that consists in synthesizing an ideal image of an object from several images that are incomplete, deformed, blurry, noisy and subject to unknown illumination changes. We focus on two concrete applications of image fusion: (1) the virtual periscope, which allows one to see through the turbulent surface of water, and (2) museum photography, where we obtain a clean, complete and well-lit image of a painting from several bad and partial images of it. In most cases, the best results are achieved by using a robust combinator function, e.g.
the minimizer of the p-error with p=1/2, which acts as a kind of "soft mode". Depending on the relative importance of noise with respect to blur, this combinator function can be applied either in the spatial domain (to combine pixel values or their derivatives) or in the frequency domain (to combine the amplitudes of each frequency). Creating a general-purpose dual-domain combinator is the subject of ongoing research.

#### 18/12 Marco Cuturi: Generative Models and Optimal Transport

A recent wave of contributions in machine learning centers on the concept of generative models for extremely complex data such as natural images. These approaches provide principled ways to use deep network architectures, large datasets and automatic differentiation to come up with algorithms that are able to synthesize realistic images. We will present in this talk how optimal transport is gradually establishing itself as a valuable tool to carry out this estimation procedure.

#### 18/12 Miquel Oliu Barton: Constant Payoff in Finite Stochastic Games

In any one-shot zero-sum game, the payoff induced by a pair of optimal strategies is equal to the value of the game. For dynamic games, a natural refinement is that the average payoff, after any fraction of the game, is equal to the value of the game. In this paper we prove that this is the case for patient players in any finite zero-sum stochastic game, as conjectured by Sorin, Venel and Vigeral (2010). The proof relies on the semi-algebraic structure of optimal strategies and on the asymptotic properties of Markov chains with rare transitions.

#### 20/11 Rajendra Bhatia: The Bures-Wasserstein Distance on Positive Definite Matrices

There is an interesting metric on the space of positive definite matrices, called the Bures distance in quantum information and the Wasserstein metric in optimal transport. We will discuss its properties and its connections with various subjects, with emphasis on the Fréchet mean of two or more matrices with respect to this distance. It is instructive to compare this with the Cartan mean (the Riemannian mean, the Karcher mean), much studied in recent years. The emphasis will be on the matrix analysis perspective, leading to inequalities and to comparisons between different means.

#### 20/11 Benedikt Wirth: (Discrete) Spline Interpolation on Riemannian Manifolds

Spline curves represent a simple and efficient tool for data interpolation in Euclidean space. During the past decades, however, more and more applications have emerged that require interpolation in (often high-dimensional) nonlinear spaces such as Riemannian manifolds. An example is the generation of motion sequences in computer graphics, where the animated figure represents a curve in a Riemannian space of shapes. Two particularly useful spline interpolation methods derive from an optimization principle: linear splines minimize the average squared velocity, and cubic splines minimize the average squared acceleration, among all interpolating curves. These variational principles and their discrete analogues can be used to define continuous and discretized spline curves on (possibly infinite-dimensional) Riemannian manifolds. However, it turns out that well-posedness of cubic splines is much more intricate on nonlinear and high-dimensional spaces and requires quite strong conditions on the underlying manifold. We will analyse and discuss linear and cubic splines, as well as their discrete counterparts, on Riemannian manifolds and show a few applications.

#### 16/10 Jean-David Benamou: Dynamic Formulations of Optimal Transportation and Variational Mean Field Games

I will review several relaxations of the classical Monge optimal transport problem using a dynamic "time" extension, and discuss the corresponding available numerical methods. They also apply to some instances of variational mean field games.
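The robust combinator from Meinhardt-Llopis's abstract above can be sketched numerically. The sample values and the brute-force grid search below are illustrative assumptions; the point is only that the minimizer of the p-error with p=1/2 sticks to the dominant cluster of observations ("soft mode"), where the mean is dragged by outliers.

```python
import numpy as np

# Toy "pixel stack": five consistent observations plus two outliers
# (e.g. glints or occlusions); values are made up for illustration.
samples = np.array([1.00, 1.02, 0.98, 1.01, 0.99, 10.0, 7.0])

def p_error_combinator(x, p=0.5, grid_size=10001):
    """Minimize sum_i |x_i - v|^p over v by brute-force grid search.
    For p = 1/2 the objective is non-convex and acts as a 'soft mode'."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    objective = np.abs(x[None, :] - grid[:, None]) ** p  # (grid, samples)
    return grid[objective.sum(axis=1).argmin()]

soft_mode = p_error_combinator(samples)
mean = samples.mean()
median = np.median(samples)
print(soft_mode, median, mean)
```

The mean lands far from the consistent observations, while the p=1/2 combinator (and, here, the median) stays on the cluster; in the applications of the talk the same combination is applied per pixel, per derivative, or per frequency amplitude.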
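Cuturi's abstract above concerns optimal transport as a loss for generative models; the entropic-regularized transport at the heart of many such pipelines can be sketched with plain Sinkhorn iterations. The point clouds, regularization level and iteration count below are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two small point clouds standing in for "data" and "model" samples.
x = rng.normal(0.0, 1.0, size=(5, 2))
y = rng.normal(0.5, 1.0, size=(7, 2))
a = np.full(5, 1 / 5)   # uniform source weights
b = np.full(7, 1 / 7)   # uniform target weights

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
eps = 0.5                                           # entropic regularization
K = np.exp(-C / eps)

# Sinkhorn fixed-point iterations on the scaling vectors u, v.
u = np.ones(5)
v = np.ones(7)
for _ in range(5000):
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]   # approximate optimal coupling
cost = float((P * C).sum())       # regularized transport cost
print(cost)
```

Differentiating such a cost with respect to the model samples is what makes entropic transport usable as a training loss.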
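The Bures-Wasserstein distance from Bhatia's abstract, d(A,B)^2 = tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2}, is straightforward to evaluate; a minimal sketch (the matrices are chosen for illustration) that also checks the commuting case, where the distance reduces to the Frobenius distance between the square roots:

```python
import numpy as np

def sqrtm_spd(M):
    """Principal square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def bures_wasserstein(A, B):
    """d(A,B) = sqrt(tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2})."""
    rA = sqrtm_spd(A)
    cross = sqrtm_spd(rA @ B @ rA)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))  # clip tiny negative round-off

# Commuting (diagonal) pair: distance equals ||sqrt(A) - sqrt(B)||_F.
A = np.diag([1.0, 4.0])
B = np.diag([9.0, 16.0])
d = bures_wasserstein(A, B)

# Symmetry check on a non-commuting pair.
C1 = np.array([[2.0, 0.5], [0.5, 1.0]])
C2 = np.array([[1.0, -0.3], [-0.3, 3.0]])
d_sym_gap = abs(bures_wasserstein(C1, C2) - bures_wasserstein(C2, C1))
print(d, d_sym_gap)
```

For the diagonal pair above, sqrt(A) = diag(1, 2) and sqrt(B) = diag(3, 4), so the distance is sqrt(8).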
#### 16/10 Filippo Santambrogio: Metric Methods for Heteroclinic Connections in Finite and Infinite Dimension

The curves, defined on the whole of R and valued in a space X (typically R^n), which minimize an action of the form \int_R (1/2|u'|^2 + W(u)), where W is a double-well potential W: X -> [0,+\infty] vanishing exactly at two points a^±, and where the boundary data are fixed as u(±\infty) = a^±, are called heteroclinic connections in the literature. Proving the existence of such curves is a classical problem in the calculus of variations, with a lack of compactness due to its translation invariance (see works by P. Sternberg, N. Alikakos, G. Fusco, P. Rabinowitz...). In the talk I will first explain how to transform the problem into the search for geodesics for a weighted metric, i.e. curves solving \min \int k(u)|u'|, where k = (2W)^{1/2}. In practice, we look for a geodesic in the space X endowed with a weighted distance d_k. The difficulty is that k can vanish, and indeed it does. I will then present the main results, obtained in collaboration with Antonin Monteil (currently at UC Louvain), concerning the existence of these geodesics at increasing levels of generality. The final goal is to study the case where X itself is a function space (typically L^2), with a potential W involving the Dirichlet energy. This allows one to study elliptic PDE problems in cylinders, giving one variable the role of time. In particular, we are able to recover via a purely variational argument (and to weaken some assumptions) some classical results by S. Alama, L. Bronsard and C. Gui, and by M. Shatzman on the problem of double connections: connecting in R^2 two different heteroclinic connections.

### 2016/17

#### 19/6 Kristian Bredies: Accelerated and Preconditioned Douglas-Rachford Algorithms for the Solution of Variational Imaging Problems

(Joint work with Hongpeng Sun.) We discuss basic, accelerated and preconditioned versions of the Douglas-Rachford (DR) splitting method for the solution of convex-concave saddle-point problems that arise in variational imaging. While the basic DR iteration admits weak and ergodic convergence with rate $O(1/k)$ for restricted primal-dual gaps, acceleration leads to convergence rates of $O(1/k^2)$ and $O(q^k)$ for some $0 < q < 1$. [...]

[...] a mapping f from E(m) to E(k) such that, for each x,y in E(m): (1-t)||x-y|| <= ||f(x)-f(y)|| <= (1+t)||x-y||. We apply the Johnson-Lindenstrauss lemma to the columns of a Linear Program (LP), and show that the resulting projected LP approximates the original LP quite well. We showcase the application of this methodology to solving large (and rather dense) LPs. Joint work with Pierre-Louis Poirion and Vu Khac Ky.

#### 27/3 Victor Chepoi: Every Hyperbolic Network Has a Core

We investigate the impact that negative curvature has on traffic congestion in large-scale networks. We prove that every Gromov hyperbolic network G admits a core, thus answering in the positive a conjecture by Jonckheere, Lou, Bonahon, and Baryshnikov (Internet Mathematics, 7 (2011)), which is based on the experimental observation by Narayan and Saniee (Physical Review E, 84 (2011)) that real-world networks with small hyperbolicity have a core congestion. Namely, we prove that for every subset X of n vertices of a graph G with delta-thin geodesic triangles, there exists a vertex m of G such that the ball B(m,4delta) of radius 4delta centered at m intercepts at least one half of the total flow between all pairs of vertices of X, where the flow between two vertices x,y of X is carried by geodesic (or quasi-geodesic) (x,y)-paths.
We show that the point m can be constructed in the following way: take a median point m* of X (a point minimizing the sum of distances to the points of X) in the injective hull of G, and then a point of G closest to m*. Moreover, we prove a primal-dual result showing that, for any commodity graph R on X and any r at least 8delta, the size sigma_r(R) of the least r-multi-core (i.e., the number of balls of radius r) intercepting all pairs of R is upper bounded by the maximum number of pairwise (2r-5delta)-apart pairs of R, and that an r-multi-core of size sigma_{r-5delta}(R) can be computed in polynomial time. Our result about multi-cores is based on a general Helly-type theorem for quasiconvex sets in delta-hyperbolic graphs (this is our second main result). Namely, we show that for any finite collection Q of pairwise intersecting epsilon-quasiconvex sets of a delta-hyperbolic graph G, there exists a single ball B(c,2epsilon+5delta) intersecting all sets of Q. Using the Helly theorem for quasiconvex sets and a primal-dual approach, we show algorithmically that the minimum number of balls of radius 2epsilon+5delta intersecting all sets of a family Q of epsilon-quasiconvex sets does not exceed the packing number of Q (the maximum number of pairwise disjoint sets of Q).

#### 27/3 Nicolas Le Roux: Optimize What You Care About (and Stop Using the Log-Loss)

Machine learning systems are increasingly complex and often comprise many interconnected components. When working on a single one of these components, one should make sure that any modification to that component improves the whole system. I will discuss how, when performing classification, the log-loss does not satisfy these guarantees. I will then present a simple modification which not only improves the classification error, but also allows the seamless integration of external constraints, as well as the possibility for the system to deal with its own uncertainty.

#### 27/2 Claudia d'Ambrosio: Classical Formulations and Strong Valid Inequalities for the Pooling Problem

The focus of this talk will be a classical continuous, non-convex optimization problem arising in petroleum engineering, namely the standard pooling problem. Three different classical formulations have been proposed in the literature; their characteristics will be discussed and analysed. Moreover, strong relaxations for the pooling problem will be presented. In particular, we studied a structured non-convex subset of some special cases to derive valid nonlinear convex inequalities, which we proved to define the convex hull of the non-convex subset. Preliminary computational results on instances from the literature are reported and demonstrate the utility of the inequalities. This is joint work with Jeff Linderoth (University of Wisconsin-Madison), James Luedtke (University of Wisconsin-Madison) and Jonas Schweiger (IBM).

#### 27/2 Joseph Salmon: Gap Safe Screening Rules for Sparsity-Enforcing Penalties

In the high-dimensional regression context, sparsity-enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules leverages the expected sparsity of the solutions by ignoring some variables in the optimization, hence leading to solver speed-ups. When the procedure is guaranteed not to discard features wrongly, the rules are said to be safe. We propose a unifying framework that can cope with generalized linear models regularized with standard sparsity-enforcing penalties such as l1 or l1/l2 norms. Our technique allows us to safely discard more variables than previously considered safe rules, particularly for low regularization parameters.
Our proposed Gap Safe rules (so called because they rely on duality-gap computation) can cope with any iterative solver, but are particularly well suited to block coordinate descent for many standard learning tasks: Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc. For all such tasks and on all tested datasets, we report significant speed-ups compared to previously proposed safe rules. This is joint work with Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort and Vincent Leclere.

#### 16/1 Emmanuel Trélat: Turnpike Property in Optimal Control

Turnpike refers to the fact that, over a long time horizon, the optimal solution of a given optimal control problem remains essentially close to a steady state, which is itself the optimal solution of an associated static optimization problem. I will review this property and give some results for problems in finite and infinite dimension, together with variants concerning the periodic turnpike and the shape turnpike (co-authors: Martin Gugat, Can Zhang, Enrique Zuazua).

#### 16/1 Francis Bach: Stochastic Variance Reduction Methods for Saddle-Point Problems

We consider convex-concave saddle-point problems where the objective functions may be split into many components, and extend recent stochastic variance reduction methods to provide the first large-scale linearly convergent algorithms for this class of problems, which is common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply, and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities; (b) the split does need to be done with convex-concave terms. (Joint work with P. Balamurugan.)

#### 12/12 Vladimir Gurvich: Metric and Ultrametric Spaces of Resistance

See here.

#### 12/12 Xavier Allamigeon: Log-Barrier Interior-Point Methods Are Not Strongly Polynomial

We identify a class of path-following interior-point methods which are not strongly polynomial. This class corresponds to primal-dual interior-point methods whose iterates stay in the so-called "wide" neighborhood of the central path arising from the logarithmic barrier. It includes short-step, long-step, as well as predictor-corrector types of interior-point methods. We establish a lower bound on the number of iterations of these methods when applied to some parametric families of linear programs. The lower bound is expressed in terms of the number of tropical segments in a piecewise linear curve arising from the tropicalization of the central path. In this way, we derive that the aforementioned interior-point methods require $\Omega(2^d)$ iterations to solve a linear program in dimension $O(d)$, which we studied in a previous work.
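The unattributed random-projection abstract above rests on the Johnson-Lindenstrauss lemma. A minimal numerical sketch of the distortion guarantee, with illustrative dimensions and a Gaussian projection (one standard choice of the map f; none of these values come from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 10, 2000, 500   # number of points, ambient dim, projected dim
t = 0.5                   # target distortion level

E = rng.normal(size=(n, m))               # point set E(m) in R^m
P = rng.normal(size=(k, m)) / np.sqrt(k)  # random linear map f: R^m -> R^k
F = E @ P.T                               # projected points in R^k

# Ratio ||f(x)-f(y)|| / ||x-y|| over all pairs; JL says it stays in
# (1-t, 1+t) with high probability for k = O(log n / t^2).
ratios = []
for i in range(n):
    for j in range(i + 1, n):
        ratios.append(np.linalg.norm(F[i] - F[j]) / np.linalg.norm(E[i] - E[j]))
ratios = np.array(ratios)
print(ratios.min(), ratios.max())
```

With k = 500 the pairwise distances are typically preserved to within a few percent, far inside the t = 0.5 band; applying the same projection to the columns of an LP is what the abstract exploits.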
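Le Roux's point that the log-loss and the classification error can disagree is easy to see on a toy example. The labels and predicted probabilities below are contrived for illustration: model_a makes no classification mistakes yet has a much worse log-loss than model_b, which misclassifies one example.

```python
import numpy as np

y = np.array([1, 1, 1, 0])                    # true binary labels
# Predicted P(y = 1); contrived illustrative values.
model_a = np.array([0.55, 0.55, 0.55, 0.45])  # barely right on every example
model_b = np.array([0.99, 0.99, 0.49, 0.01])  # very confident, one mistake

def log_loss(y, p):
    """Average negative log-likelihood of the labels."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def error_rate(y, p):
    """0-1 classification error with a 0.5 decision threshold."""
    return float(np.mean((p >= 0.5).astype(int) != y))

print(error_rate(y, model_a), log_loss(y, model_a))
print(error_rate(y, model_b), log_loss(y, model_b))
```

Minimizing the log-loss would prefer model_b even though model_a is strictly better on the metric one actually cares about, which is the mismatch the talk addresses.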
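Salmon's Gap Safe rule can be sketched for the Lasso, min_w 1/2 ||y - Xw||^2 + lambda ||w||_1: from any primal point one builds a feasible dual point theta and a safe sphere of radius sqrt(2 * gap) / lambda around it, and discards every feature j with |x_j^T theta| + r ||x_j|| < 1. The orthogonal design below is an assumption made so that the exact solution (soft-thresholding) is available for checking; the radius formula follows the Gap Safe construction as I recall it, not a detail quoted from the talk.

```python
import numpy as np

# Orthogonal design (X = I) so the exact Lasso solution is soft-thresholding.
y = np.array([3.0, 0.1, 0.05, 2.0, 0.2])
X = np.eye(5)
lam = 1.0

w_star = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)  # exact solution
w = 0.9 * w_star                                        # an approximate iterate

rho = y - X @ w                                   # residual
theta = rho / max(lam, np.abs(X.T @ rho).max())   # feasible dual point
primal = 0.5 * rho @ rho + lam * np.abs(w).sum()
dual = 0.5 * y @ y - 0.5 * lam**2 * ((theta - y / lam) @ (theta - y / lam))
gap = primal - dual
radius = np.sqrt(2.0 * gap) / lam

# Gap Safe sphere test: screen out feature j if |x_j^T theta| + r ||x_j|| < 1.
scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
screened = scores < 1.0
print(screened)
```

On this instance the rule discards exactly the three inactive features while keeping both active ones, and the radius shrinks with the duality gap, so more features are screened as the solver converges.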