12/3 Julien Mairal Generic Acceleration Schemes for Gradient-Based OptimizationIn this talk, we present generic techniques to accelerate gradient-based optimization algorithms. These approaches build upon the inexact proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary subproblems, leading to faster convergence for the original problem. We introduce two variants based on Nesterov's acceleration and Quasi-Newton principles, respectively. One of the key to achieve acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. 12/3 Frédéric Meunier Considérons n joueurs ayant des préférences sur l’ensemble des parties connexes d’un gâteau, identifié avec l’intervalle [0,1]. Un théorème classique de Stromquist assure, sous des conditions assez faibles, l’existence d’un partage sans-envie du gâteau en n parts (connexes), i.e. qui soit tel qu'aucun joueur ne préfère strictement une part donnée à un autre joueur.Une généralisation récente de ce théorème montre que même si les préférences d’un des joueurs ne sont pas connues (c’est son anniversaire et on ne le consulte pas), il existe encore un partage sans-envie du gâteau : on pourra découper le gâteau, laisser choisir ce joueur, et il existera toujours une affectation sans-envie des n-1 parts restantes. Nous proposons une nouvelle méthode pour démontrer le théorème de Stromquist et sa généralisation. Cette méthode permet de prouver de nouvelles variations, dont une par exemple assure l’existence d’un découpage du gâteau en n-1 parts permettant une affectation sans-envie de ces parts, quelque soit le joueur décidant de quitter la salle. Un élément central de cette méthode est l’utilisation de nouvelles extensions du lemme de Sperner (version discrète du théorème du point fixe de Brouwer). Cet exposé s’appuiera sur des travaux récents menés en collaboration avec Florian Frick, Kelsey Houston-Edwards, Francis Su et Shira Zerbib. 12/2 Anatoli Juditsky Sur la quasi-optimalité de l'estimation linéaireEtant donné une observation indirecte bruitée $\omega = Ax + \xi $, nous considérons le problème d'estimation d’une image linéaire $Bx$ d'un signal $x$ dont on suppose d’appartenir à un ensemble donné $X$. Sous certaines hypothèses sur $X$ (satisfait, par exemple, lorsque $X$ est l'intersection de $K$ cylindres elliptiques, ou la boule unité de la norme spectrale dans l'espace des matrices) et sur la norme $ \| \cdot \| $ utilisé pour mesurer l'erreur d’estimation (satisfait, par exemple, par les $p$-normes, $ 1 \leq p \leq 2$, sur $ R^m $ et par la norme nucléaire sur l'espace des matrices), et {\em sans imposer aucune restriction sur les applications $A$ et $B$,} nous construisons une estimée {\em linéaire} de $Bx$ qui est quasi-optimale parmi toutes les estimées (linéaires ou non linéaires) en termes d’espérance de la perte $\| \cdot \| $ pire-cas sur $ X$. (travail en collaboration avec A. Nemirovski, Georgia Tech) 12/2 Anders Hansen On computational barriers in optimisation and the paradoxes of deep learningLinear and semidefinite programming (LP, SDP), regularisation with Basis Pursuit and Lasso, as well as neural networks in deep learning have seen great success in mathematics, learning, statistics and data science. The success of LP is traditionally described by the fact that it is in P for rational inputs. However, in the list of problems for the 21st century Smale calls for "[Computational] models which process approximate inputs and which permit round-off computations". Hence, we will discuss the following paradox: It is impossible to design accurate algorithms for the above problems when given inaccurate, yet well-conditioned input data with arbitrarily small inaccuracies. The paradox implies the following failure for any algorithm: For fixed dimensions and any small accuracy parameter epsilon > 0, one can choose any time T and find an input such that the runtime > T without having reached epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt to achieve an epsilon accurate solution. The largest epsilon for which this failure happens is called the Breakdown-epsilon. Typically the Breakdown-epsilon > 1/2 even for randomised algorithms when the input is bounded by one and well-conditioned. Moreover, it is impossible to design an algorithm to check whether the algorithm will fail on a given input. This is in fact strictly harder than solving the original problem. Thus, e.g. even if given an oracle for solving LP accurately, one cannot determine if a given algorithm for solving LP produces nonsense on its input. The paradox implies two surprises: (1) In mathematics, statistics and learning, one computes non-computable problems on a daily basis. (2) In order to mathematically characterise this phenomenon, there is an intricate complexity theory for, paradoxically, non-computable problems. In particular, the key to explaining the success (and failure) of many algorithms in real-world scenarios lies in this rich classification theory, where sparsity turns out to be a key ingredient. The above result leads to the paradoxes of deep learning: (1) One cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima. Moreover, (2) one can have 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set. 15/1 Luigi De Pascale A variational limit in density functional theoryThe Density Functional Theory offers a model for quantum mechanic which is alternative to the Schrodinger equation. It is a Variational Model so it is based on the minimization of a certain energy functional on a functional space. The main idea is to model everything in terms of the single electron density. The choice of the energy functional to minimize need to be adapted to the type of phenomenon which one aim to describe. I will discuss how a particular choice of the energy term which describes the electron-electron interaction effects the description of the so called dissociating limit of a molecule (how the molecule dissociate when the nuclei are moved away from each other). 15/1 Enric Meinhardt-Llopis Fusion of Images Image fusion is a very general problem that consists in synthesizing an ideal image of an object from several images that are incomplete, deformed, blurry, noisy and with unknown illumination changes. We focus on two concrete applications of image fusion: (1) the virtual periscope, that allows to see through the turbulent surface of water and (2) museum photography, where we obtain a clean, complete and well-lit image of a painting from several bad and partial images of it. In most cases, the best results are achieved by using a robust combinator function: e.g. the minimizer of the p-error with p=1/2, which acts as a kind of ``soft mode''. Depending on the relative importance of noise w.r.t blur, this combinator function can be applied either in the spatial domain (to combine pixel values or their derivatives), or in the frequency domain (to combine the amplitudes of each frequency). Creating a general-purpose dual-domain combinator is the subject of ongoing research. 18/12 Marco Cuturi
18/12 Miquel Oliu Barton
20/11 Rajendra Bhatia
20/11 Benedikit Wirth (Discrete) spline interpolation on Riemannian manifoldsSpline curves represent a simple and efficient tool for data interpolation in Euclidean space. During the past decades, however, more and more applications have emerged that require interpolation in (often high-dimensional) nonlinear spaces such as Riemannian manifolds. An example is the generation of motion sequences in computer graphics, where the animated figure represents a curve in a Riemannian space of shapes. Two particularly useful spline interpolation methods derive from an optimization principle: linear splines minimize the average squared velocity and cubic splines minimize the average squared acceleration among all interpolating curves. Those variational principles and their discrete analogues can be used to define continuous and discretized spline curves on (possibly infinite-dimensional) Riemannian manifolds. However, it turns out that well-posedness of cubic splines is much more intricate on nonlinear and high-dimensional spaces and requires quite strong conditions on the underlying manifold. We will analyse and discuss linear and cubic splines as well as their discrete counterparts on Riemannian manifolds and show a few applications. 16/10 Jean-David Benamou Dynamic formulations of Optimal transportation and variational Mean Field GamesI will review several relaxations of the classical Monge Optimal Transport problem using a dynamic “time” extension and discuss the corresponding available numerical methods. They also apply to some instances of variational mean field games. 16/10 Filippo Santambrogio Metric methods for heteroclinc connections in finite and infinite dimensionThe curves, defined on the whole R and valued in a space X (typically, R^n), which minimize an action of the form \int_R(1/2|u'|^2+W(u)), where Wis a double-well potential W:X->[0,+\infty] vanishing exactly at two points a^±, and the boundary data are exactly fixed as u(±\infty)=a^±, are called heteroclinic connections in the literature. Proving existence of such curves is a classical problem in calculus of variations, with lack of compactness due to its translation-invariance (see works by P. Sternberg, N. Alikakos, G. Fusco, P. Rabinowitz...). In the talk I will first explain how to transform the problem into the search for geodesics for a weighted metric, i.e. curves solving \min \int k(u)|u'|, where k=(2W)^{1/2}... In practice, we look for a geodesic in the space X endowed with a weighted distance d_k. The difficulty is that k can vanish, and actually it does. Then, I will present the main results, obtained in collaboration with Antonin Monteil (currently at UC Louvain), concerning the existence of these geodesics with increasing level of generality. The final goal is to study the case where X itself is a functional space (typically, L^2), with a potential W involving the Dirichlet energy. This allows to study elliptic PDE problems in cylinders giving to one variable the role of time. In particular, we are able to recover via a purely variational argument (and to weaken some assumptions) some classical results by S. Alama, L. Bronsard and C. Gui, and by M. Shatzman on the problem of double connections: connecting in R^2 two different heteroclinic connections. -------- 2016/17 19/6 Kristian Bredies Accelerated and preconditioned Douglas-Rachford algorithms for the solution of variational imaging problems(joint work with Hongpeng Sun)We discuss basic, accelerated and preconditioned versions of the Douglas-Rachford (DR) splitting method for the solution of convex-concave saddle-point problems that arise in variational imaging. While the basic DR iteration admits weak and ergodic convergence with rate $O(1/k)$ for restricted primal-dual gaps, acceleration leads to convergence rates of $O(1/k^2)$ and $O(q^k)$ for some $0<q<1$ under appropriate strong convexity assumptions. Further, preconditioning allows to replace the potentially computationally expensive solution of a linear system in each iteration step in the corresponding DR iteration by fast approximate solvers without the need to control the error. The methods are applied to non-smooth and convex variational imaging problems. We discuss denoising and deconvolution with $L^2$ and $L^1$ discrepancy and total variation (TV) as well as total generalized variation (TGV) penalty. Preconditioners which are specific to these problems are presented, the results of numerical experiments are shown and the benefits of the respective accelerated and preconditioned iteration schemes are discussed. 19/6 Zheng Qu Restarting accelerated gradient methods with a rough strong convexity estimateProximal gradient method is known to converge linearly for convex composite minimization problems when the objective function satisfies a local quadratic growth condition, which greatly generalizes the strong convexity condition. Accelerated proximal gradient methods are known to converge with a worst-case complexity bound $O(1/\sqrt{\epsilon})$. If in addition the problem is strongly convex with condition number $\kappa$, an accelerated linear convergence rate (worst-case complexity bound $O(\sqrt{\kappa}\log(1/\epsilon))$) can be achieved. These classical methods have been successfully extended to randomized coordinate update to improve the efficiency for large scale problems. We first show that, by using an adaptive restart strategy, the accelerated linear convergence can also be achieved under the local quadratic growth condition. Our method can be combined with any accelerated proximal gradient method with complexity bound $O(1/\sqrt{\epsilon})$. We also propose new restarting strategies for accelerated gradient and accelerated coordinate descent methods. In particular, we show that by changing the restarting point, the restarted method has a geometric rate of convergence for any restarting frequency, and so it allows us to take profit of restarting even when we do not know the local quadratic growth condition number. 22/5 Fabio D'Andreagiovanni Multiband Robust Optimization: theory and applicationsIn this talk, I will provide an overview of my research about theory and applications of RO. Specifically, I will present Multiband Robustness (Büsing and D'Andreagiovanni 2012), a model for RO proposed to generalize and refine the classical Gamma-robustness model by Bertsimas and Sim. The main aim of the model is to provide an improved representation of arbitrary non-symmetric distributions of the uncertainty, in particular histograms, which are commonly present in real-world applications. Such refined representation grants a reduction in conservatism of robust solutions, while maintaining the accessibility and computational tractability that have been a key factor of success of Gamma-robustness. I will also provide an overview of applications of the Multiband model to real-world problems that I have considered in past and ongoing industrial research projects. 22/5 Nelson Maculan The Euclidean Steiner tree problem in n-space: Mixed-integer nonlinear optimization models This is a joint work with Marcia Fampa, Virginia Costa and Luiz-Felipe R. Ribeiro da Cruz. 24/4 András Sebő What should we buy for Traveling Salesmen, for Hypertournaments and for Reliable Networks? I would like to show the tool, the way it is applied and some ongoing research using it for improving the approximation ratio of some relevant optimization (or extremal) problems. 24/4 Leo Liberti
for each x,y in E(m) (1-t)||x-y|| <= ||f(x)-f(y)|| <= (1+t)||x-y||. We apply the Johnson-Lindenstrauss lemma to the columns of a Linear Program (LP), and show that the resulting projected LP approximates the original LP quite well. We showcase the application of this methodology to solving large (and rather dense) LPs. Joint work with Pierre-Louis Poirion and Vu Khac Ky. 27/3 Victor Chepoi Tout réseau hyperbolique a un coeur We investigate the impact the negative curvature has on the traffic congestion in large-scale networks. We prove that every Gromov hyperbolic network G admits a core, thus answering in the positive a conjecture by Jonckheere, Lou, Bonahon, and Baryshnikov, Internet Mathematics, 7 (2011) which is based on the experimental observation by Narayan and Saniee, Physical Review E, 84 (2011) that real-world networks with small hyperbolicity have a core congestion. Namely, we prove that for every subset X of n vertices of a graph G with delta-thin geodesic triangles there exists a vertex m of G such that the ball B(m,4delta) of radius 4delta centered at m intercepts at least one half of the total flow between all pairs of vertices of X, where the flow between two vertices x,y of X is carried by geodesic (or quasi-geodesic) (x,y)-paths. We show that the point m can be constructed in the following way: take a median point m* of X (a point minimizing the sum of distances to the points of X) in the injective hull of G and then a closest to m* point in G. Moreover, we prove a primal-dual result showing that, for any commodity graph R on X and any r at least 8delta, the size sigma_r(R) of the least r-multi-core (i.e., the number of balls of radius r) intercepting all pairs of R is upper bounded by the maximum number of pairwise (2r-5delta)-apart pairs of R and that an r-multi-core of size sigma_{r-5\delta}(R) can be computed in polynomial time. Our result about multi-cores is based on a general Helly-type theorem for quasiconvex sets in delta-hyperbolic graphs (this is our second main result). Namely, we show that for any finite collection Q of pairwise intersecting epsilon-quasiconvex sets of a delta-hyperbolic graph G there exists a single ball B(c,2epsilon+5delta) intersecting all sets of Q. Using the Helly theorem for quasiconvex sets and a primal-dual approach, we show algorithmically that the minimum number of balls of radius 2epsilon+5delta intersecting all sets of a family Q of epsilon-quasiconvex sets does not exceed the packing number of Q (maximum number of pairwise disjoint sets of Q). 27/3 Nicolas Le Roux Optimize what you care about (and stop using the log-loss)Machine Learning systems are increasingly complex and often comprise many
interconnected components. When working on a single of these components,
one should make sure that any modification to that component improves
the whole system. I will discuss how, when performing classification, the log-loss does
not satisfy these guarantees. I will then present a simple modification
which not only improves the classification error, but also allows the
seamless integration of external constraints as well as the possibility
for the system to deal with its own uncertainty. 27/2 Claudia d'Ambrosio The focus of this talk will be on a classical continuous, non-convex optimization problem arising in the petrolium engineering context, namely, the standard pooling problem. Three different classical formulations have been proposed in the literature and their characteristics will be discussed and analysed. Moreover, strong relaxations for the pooling problem will be presented. In particular, we studied a structured non-convex subset of some special cases to derive valid nonlinear convex inequalities that we proved to define the convex hull of the non-convex subset. Preliminary computational results on instances from the literature are reported and demonstrate the utility of the inequalities. This is a joint work with Jeff Linderoth (University of Wisconsin-Madison), James Luedtke (University of Wisconsin-Madison), Jonas Schweiger (IBM). 27/2 Joseph Salmon Gap Safe screening rules for sparsity enforcing penaltiesIn high dimensional regression context, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules, leverage the expected sparsity of the solutions by ignoring some variables in the optimization, hence leading to solver speed-ups. When the procedure is guaranteed not to discard features wrongly the rules are said to be safe. We propose a unifying framework that can cope with generalized linear models regularized with standard sparsity enforcing penalties such as l1 or l1/l2 norms. Our technique allows to discard safely more variables than previously considered safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality gap computation) can cope with any iterative solver but is particularly well suited to block coordinate descent for many standard learning tasks: Lasso, Sparse-Group Lasso, multi-task Lasso, binary and multinomial logistic regression, etc. For all such tasks and on all tested datasets, we report significant speed-ups compared to previously proposed safe rules.This is joint work with Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort and Vincent Leclere. 16/1 Emmanuel Trélat Turnpike property in optimal controlTurnpike refers to the fact that, over long time, the optimal solution of a given optimal control problem remains essentially close to a steady-state, itself being the optimal solution of an associated static optimization problem. I will review this property and give some results for problems in finite and infinite dimension, and variants concerning periodic turnpike or shape turnpike (co-authors: Martin Gugat, Can Zhang, Enrique Zuazua). Stochastic Variance Reduction Methods for Saddle-Point Problems We consider convex-concave saddle-point problems where the objective functions may be split in many components, and extend recent stochastic variance reduction methods to provide the first large-scale linearly convergent algorithms for this class of problems which is common in machine learning. While the algorithmic extension is straightforward, it comes with challenges and opportunities: (a) the convex minimization analysis does not apply and we use the notion of monotone operators to prove convergence, showing in particular that the same algorithm applies to a larger class of problems, such as variational inequalities, (b) the split does need to be done with convex-concave terms. (Joint work with P. Balamurugan). 12/12 Vladimir Gurvich Metric and ultrametric spaces of resistance 12/12 Xavier Allamigeon Log-barrier interior-point methods are not strongly polynomialWe identify a class of path-following interior-point methods which are not strongly polynomial. This class corresponds to primal-dual interior-point methods which iterates in the so-called ``wide'' neighborhood of the central path arising from the logarithmic barrier. It includes short step, long step as well as predictor-corrector types of interior-point methods. We establish a lower bound on the number of iterations of these methods when applied to some parametric families of linear programs. The lower bound is expressed in terms of the number of tropical segments in a piecewise linear curve arising from the tropicalization of the central path. In this way, we derive that the aforementioned interior point methods require $\Omega(2^d)$ iterations to solve a linear program in dimension $O(d)$ which we studied in a previous work. |

SPO >