Abstracts

3 juin 2024 : Roland Grappe : Totally equimodular matrices

Totally unimodular matrices play a fundamental role in combinatorial optimization, as they offer combinatorial min-max theorems such as the MaxFlow-MinCut theorem and Kőnig's theorem about matchings in bipartite graphs. Recently, a generalization called totally equimodular matrices has emerged within the context of box-totally dual integral polyhedra, which are the polyhedra hidden underneath "strong" min-max relations. This talk aims to explore some connections between totally unimodular and totally equimodular matrices, offering insights into promising avenues while also addressing one notable drawback.

3 juin 2024 : Christop Dürr : Randomized Binary Search under Pressure

We study a generalized binary search problem on the line and general trees. On the line (e.g., a sorted array), binary search finds a target node in $O(\log n)$ queries in the worst case, where $n$ is the number of nodes. In situations with limited budget or time, we might only be able to perform a few queries. In this case, it is impossible to guarantee that the target will be found regardless of its position. Our main result is constructing a randomized strategy that maximizes the minimum (over the target position) probability of finding the target. In scenarios where there is no apriori (stochastic) information of the target's position, this should yield the \textit{de facto} search strategy of choice. As with regular binary search, we can find and run the strategy in $O(\log n)$ time and with $O(\log n)$ random bits. Our construction is obtained by reinterpreting the problem as a two-player (\textit{seeker} and \textit{hider}) zero-sum game and exploiting a deep connection with an underlying number theoretical structure.

Furthermore, we generalize the setting to study a search game on trees. In this case, a query returns the edge's endpoint closest to the target. Again, when the number of queries is bounded by some given $k$, we quantify a \emph{the-less-queries-the-better} approach by defining a seeker's profit $p$ depending on the number of queries needed to locate the hider.

For the linear programming formulation of the corresponding zero-sum game, we show that computing the best response for the hider (i.e., the separation problem of the underlying LP) is NP-hard. On the other hand, we provide a separation oracle running in time $O(n^2 2^{2k})$. This result allows us to compute the seeker's best response in polynomial time whenever $k=O(\log n)$, where $n$ is the size of the tree.

Joint work with Agustín Caracci and José Verschae

6 mai 2024 : Sophie Huiberts : Complexity of the simplex method

The simplex method is perhaps the most important algorithm for solving linear programming problems in practice.

However, we do not understand why this algorithm is so efficient; it takes exponential time in the theoretical worst case but only linear time in practice! The talk will start with some social context and women's history. Then I will introduce some of the state-of-the-art theoretical models for explaining the simplex method's performance. Finally, we will discuss in what ways the state-of-the-art fails to be a proper explanation for anything, and what needs to happen before we can hope to achieve a proper scientific theory.

6 mai 2024 : Patrick Bernard : Orbites périodiques pour un potentiel générique

Il est bien connu que les orbites périodiques d'un système dynamique générique sont hyperboliques. La situation est un peu plus compliquée en ce qui concerne les systèmes Hamiltonien, mais les orbites périodiques sont malgré tout non-dégénérées. On décrira dans cet exposé le cas plus difficile où l'on considère un système mécanique naturel (énergies cinétique plus énergie potentielle), où l'énergie cinétique est fixée une fois pour toutes, et où le potentiel est générique. Dans ce contexte encore, les orbites périodiques sont non-dégénérées.

4 mars 2024 : Virginie Ehrlacher : Sparse approximation of classical and quantum optimal transport problems with moment constraints: application to quantum chemistry

The aim of this talk is to present new sparsity results about some relaxations of classical and quantum multi-marginal optimal transport problems using moment constraints. Thanks to the so-called Tchakhaloff's theorem, it is possible to prove the existence of optimizers to these approximate optimal transport problems either as discrete probability measures charging a low number of points in the classical case, or as low-rank trace-class operators in the quantum case. We present applications of these sparsity results to the numerical approximation of the so-called Lieb functional, which is a key quantity in Density Functional Theory for electronic structure calculations for molecules. We also prove under appropriate assumptions on the set of moment functions that the value of the approximations with moment constraints converge to the value of the exact value of the optimal transport problem as the number of moments go to infinity. We also prove some rates of convergence on the associated approximation of the ground state energy of the system of electrons. We finally present some algorithms exploiting this sparsity and illustrate their numerical behaviour on several test cases.

4 mars 2024: Yves Achdou: Jeux à champ moyen déterministes avec des contrôles sur l'accélération et des contraintes d'état

On considère des jeux à champ moyen déterministes dans lesquels les joueurs contrôlent leur accélération.

On décrira rapidement le cas où le jeu a lieu dans tout l'espace et on discutera le système d’EDPs associé. On considérera ensuite le cas où les agents sont contraints à rester dans un domaine (contraintes sur l'état). Comme rien n'empêche des phénomènes de concentration sur le bord de l’espace des états, il est difficile de donner une description du modèle par un système d'EDPs. On privilégie donc une formulation lagrangienne, dans l'esprit des travaux de Benamou-Carlier et Cardaliaguet-Meszaros-Santambrogio, Cannarsa-Capuani-Cardaliaguet. Le manque de controlabilité forte pose alors des difficultés près du bord de l’espace des états. En particulier, la fonction valeur est discontinue et localement non bornée, ce qui complique l'utilisation de théorèmes de point fixe.

On présentera des résultats d'existence d'équilibres dans la formulation lagrangienne.

Les résultats discutés ont été obtenus en collaboration avec P. Mannucci (U. Padova), C. Marchi (U. Padova) et N. Tchou (U. Rennes))

5 février 2024 : Olivier Ley : Construction of a smooth convex function whose gradient orbits spiral around its minimum and at infinity

I will explain the construction of a convex function in the plane with a unique minimum at the origin such that the following properties

hold: all non trivial gradient orbits spiral around the minimum of the function and at infinity, the function is C^k regular and even

real analytic outside the origin. Finally Lojasiewicz holds true at the origin. Such a function is a surprising counter-example

of Thom's gradient conjecture both at the origin and at infinity. This is a joint work with Aris Daniilidis (Vienna) and Mounir Haddou (Rennes).

5 février 2024 : Victor Magron: State polynomial optimization for nonlinear Bell inequalities

This talk focuses on optimization over state polynomials, i.e., polynomials in noncommuting variables and formal states of their products.
An archimedean Positivstellensatz in the spirit of Putinar and Helton-McCullough is presented leading to a hierarchy of semidefinite relaxations converging monotonically to the optimum of a state polynomial subject to state constraints. This hierarchy can be seen as a state analog of the Lasserre hierarchy for optimization of polynomials, and the Navascués-Pironio-Acín scheme for optimization of noncommutative polynomials.
The motivation behind this theory arises from the study of correlations in quantum networks. Determining the maximal quantum violation of a polynomial Bell inequality for an arbitrary network is reformulated as a state polynomial optimization problem. Several examples of quadratic Bell inequalities in the bipartite scenario are analyzed.
To reduce the size of the semidefinite programs, sparsity, sign symmetry and conditional expectation of the observables' group structure are exploited.
This is based on the joint work https://arxiv.org/abs/2301.12513 with Igor Klep, Jurij Volčič, and Jie Wang.

15 janvier 2024 : Sholom Scheichtman: The gradient’s limit of a definable family of functions is a conservative set-valued field

It is well-known that if a family of smooth functions converges to some other smooth function, then this does not necessarily implies the convergence of its gradients. In this talk we will show that if the family is definable in an o-minimal structure (for instance semialgebraic, subanalytic, or any composition of the previous with exp, log), then the gradient's limit is actually a conservative set-valued field in the sense introduced by Bolte and Pauwels. This result will be presented in a more general setting, where the functions in the original family might be merely locally Lipschitz continuous and the gradients are replaced by the Clarke subgradients (or an arbitrary conservative field).

In particular, while a conservative set-valued field of a function is not necessarily equal to its gradient, it is equal to its gradient almost everywhere, implying the convergence of the gradients on a set of full measure. Moreover, the general geometric structure of a conservative field and its links with the usual subgradients are well-understood and will be presented in the talk.

Finally, we will discuss some consequences of this result for smoothing methods and potential generalizations to the case where the functions are not locally Lipschitz continuous.

15 janvier 2024 : Laurent Pfeiffer: Multi-agent aggregative problems and their mean-field approximation

The first part of the talk will be dedicated to a class of non-convex optimization problems involving a large number of agents. These problems will involve a convex cost of some quantity referred to as the aggregate, itself defined as the sum of contributions of the agents. Such problems are in particular motivated by energy management problems involving many small consumers (for example, a fleet of electrical vehicles).

A convex relaxation of the problem, based on a mean-field approximation technique, will be introduced and investigated. A numerical method, which we called Stochastic Frank-Wolfe (SFW) algorithm, will allow to compute an approximate global solution of the original problem.

The second part of the talk will focus on a class of deterministic mean-field games with a potential structure. A numerical method, based on the SFW algorithm, will be analyzed.

Based on joint works with J.F. Bonnans (Inria and CentraleSupélec), Kang Liu (University of Erlangen-Nuremberg), Nadia Oudjane (EDF R&D), and Cheng Wan (EDF R&D).

4 décembre 2023 : Philippe Thieullen: Classification des solutions KAM faibles discrètes pour des potentiels quasi-périodiques

On considère un modèle de chaines d'atomes en interaction mutuelle de type élastique et en interaction avec un substrat constitué d'impuretés distribuées quasi-périodiquement. Une configuration correspondant à une chaine d'atomes à l'équilibre est une configuration minimisant une certaine fonctionnelle d'énergie dans un sens à préciser. Une solution KAM faible est une fonction énergie potentielle permettant de réaliser des configurations minimisantes au minimum de cette fonction. Ce problème de minimisatiom est bien connu lorsque le substrat est périodique ; il s'agit du modèle de Frenkel-Kontorova. Nous montrerons comment étendre certains des résultats de la théorie KAM faible discrète au cas d'un substrat quasi-périodique d'un type particulier comme ceux décrits par une suite de Fibonacci.

Ce travail est en collaboration avec Eduardo Garibaldi (CAMPINAS) et Samel Petite (AMIENS).

4 décembre 2023 : Flavien Léger : Gradient descent with a general cost

I will present a new class of gradient-type optimization methods that extend vanilla gradient descent, mirror descent, Riemannian gradient descent, natural gradient descent, and more. This approach constructs a surrogate for the objective function in a systematic manner, based on a chosen cost function. This surrogate is then minimized using an alternating minimization scheme.

Convergence rates can be established based on generalized notions of smoothness and convexity. Moreover local versions of these two notions can be given when the cost has “nonnegative cross-curvature”, a geometric condition used in optimal transport.

As a direct application of this framework, I will present the first global convergence rates for natural gradient descent and the standard Newton's method.

6 novembre 2023: Nelly Pustelnik: IML FISTA: A Multilevel Framework for Inexact and Inertial Forward-Backward. Application to Large Scale Image Restoration.

This paper introduces a multilevel framework for inertial and inexact proximal algorithms, referred as IML FISTA, which includes multilevel adaptations of classical algorithms like forward-backward (FB) and FISTA.
The proposed IML FISTA is supported by strong theoretical guarantees: we establish both the convergence rate and the convergence of the iterates, a critical outcome for addressing ill-posed problems. We propose a particular instance of IML FISTA, based on the use of the Moreau envelope to build efficient and useful coarse corrections, fully adapted to solve image restoration problems.
We evaluate our approach on several image reconstruction problems including hyperspectral image restoration. We show that it considerably accelerates the convergence of the corresponding one-level (i.e. FB or FISTA) version of the methods.
In the context of hyperspectral image restoration, two methods for approximating the objective function dedicated to this problem are proposed. In both cases, the associated convergence guarantees are equivalent to state-of-the-art approaches. These two methods are compared to FISTA, demonstrating the relevance of the proposed approach for very large volumes of data.

Ref : G. Lauga, E. Riccietti, N. Pustelnik, and P. Goncalves, IML FISTA: Inexact MuLtilevel FISTA for Image Restoration, hal-04075814, 2023.

6 novembre 2023: Samuel Vaiter: Garanties de convergence pour la différentiation automatique.

Dans cet exposé, je présenterai quelques résultats de convergence concernant la différentiation automatique d'algorithmes itératifs, qu'ils soient "lisses" ou non. Alors que les asymptotiques des problèmes lisses sont bien compris (Gilbert 1992, Beck 1994), principalement grâce à l'utilisation d'un théorème de point fixe, le cas non-lisse présente davantage de difficultés. Je montrerai comment le cadre des Jacobiens conservatifs permet d'obtenir de tels résultats. J'illustrerai également une stratégie à adopter lorsque l'opérateur n'est pas contractant, en prenant pour motivation la différentiation de l'algorithme de Sinkhorn-Knopp. Enfin, j'introduirai la différentiation en un coup, une méthode qui alliera la simplicité de la différentiation automatique à la performance de la différentiation implicite. Cette approche, particulièrement adéquate pour les algorithmes rapides, sera illustrée à travers des méthodes d'optimisation superlinéaires et des scénarios d'optimisation bi-niveaux. Travail en collaboration avec J. Bolte et E. Pauwels.

2 Octobre 2023: Marie Laclau: Long Information Design

We study competitive information design games between two designers with opposite preferences who want to influence the final action of a decision-maker. Each designer controls the public information on a private persistent state: in each period the designers can disclose information to the decision-maker about their own state. Using tools from zero-sum repeated games with incomplete information on both sides, we study equilibrium payoffs and strategies depending on the timing of the game and the possible deadline. Our analysis covers continuous environments as well as environments in which designers can only induce finite sets of posterior beliefs: in the latter framework, there may be no bound on the number of communication stages required at equilibrium.

This is a joint work with F. Koessler, T. Tomala and J. Renault

2 Octobre 2023: Julien Combe: Unpaired Kidney Exchange -- Overcoming Double Coincidence of Wants without Money

For an incompatible patient-donor pair, kidney exchanges often forbid receipt-before-donation (the patient receives a kidney before the donor donates) and donation-before-receipt, causing a double-coincidence-of-wants problem. We study an algorithm, the Unpairedkidney exchange algorithm, which eliminates this problem. In a dynamic matching model, we show that waiting time of patients under the Unpaired is close to optimal and substantially shorter than widely used algorithms. Using a rich administrative dataset from France, we show that Unpaired achieves a match rate of 63 percent and an average waiting time of 176 days for transplanted patients.The (infeasible) optimal algorithm is only slightly better (64 percent and 144 days); widely used algorithms deliver less than 40 percent and at least 232 days. We discuss a range of solutions that can address the potential practical incentive challenges of the Unpaired. In particular, we extendour analysis to an environment where a deceased donor wait list can be integrated to improve the performance of algorithms. We show that our theoretical and empirical comparisons continue tohold. Finally, based on these analyses, we propose a practical version of the Unpaired algorithm.

This is based on a joint work with Mohammad Akbarpour, YinHua He, Victor Hiller, Robert Shimer and Olivier Tercieux.

5 Juin 2023: Carola Doerr: Black-box Optimization Algorithms: Theory, Practice, and their Application to Generating Point Sets of Small Star Discrepancy

After a brief introduction to the fascinating world of black-box optimization algorithms and their theoretical analysis, we will embark on a journey to see how black-box algorithms can be used to obtain well-distributed point sets of small L_infty star discrepancy.

The L_infty star discrepancy of a point set P={p^1,...,p^n} \subset [0,1)^d measures the largest absolute deviation between the volume of an anchored box [0,y) and the fraction |P \cap [0,y) |/n of points that fall inside it. Point sets of low star discrepancy have numerous applications including numerical integration, financial mathematics, design of experiments, and optimization. While low-discrepancy sequences are intensively studied, the practically relevant question of low discrepancy point sets has been widely neglected. With this presentation we hope to spark interest in computational aspects of star discrepancy. We also hope to pique your interest for black-box optimization.

5 Juin 2023: Christine Fricker: Mean-field analysis for an incentive policy in large car-sharing systems

The talk explores through an example the mean-field method for the study of large stochastic networks. It deals with a load-balancing algorithm for a closed stochastic network with two zones with different demands. The algorithm is motivated by an incentive policy for redistribution of cars in a large-scale car-sharing system. The service area is divided into two zones. When cars stay too long in the low-demand zone, users are encouraged to pick them up and return them in the high-demand zone. The model is a closed network as zones are divided into cells, nodes of the network, while the cars are the customers. The mean-field limit solution of an ODE gives the large scale distribution of the station state in both clusters. An equilibrium point of this ODE is characterized via the invariant measure of a random walk in the quarter-plane. The proportion of empty and saturated stations measures how the system is balanced. Numerical experiments illustrate the impact of the incentive policy. Our study shows that the incentive policy helps when the high-demand zone observes a lack of cars but a saturation must be prevented especially when the high-demand zone is small.

Joint work with Bianca Marin Moreno, Hanene Mohamed, Martin Trépanier et Amaury Philippe.

15 Mai 2023: Claus Scheiderer: Convex hulls of monomial curves, and a sparse positivstellensatz

Let $C\subseteq{\mathbb R}^n$ be a semialgebraic curve, let $K$ be its closed convex hull. We show that $K$ admits a lifted semidefinite description by finitely many linear matrix inequalities (LMIs), each of size $k:=\left\lfloor\frac n2\right\rfloor+1$. In contrast, closed convex hulls of (semi-) algebraic surfaces need not be spectrahedral shadows. Although the proof of our result is non-constructive in general, it becomes entirely constructive when $C$ is a monomial curve, given parametrically as $(t^{m_1},\dots,t^{m_n})$ with $t$ varying in an interval. In this case, $K$ is described by $\mathcal{O}(d)$ many LMIs of size $k$, where $d=\max\{m_1,\dots,m_n\}$ is the degree of the curve. On the dual side we prove that if a univariate polynomial $p(t)$ of degree $d$ with at most $2k+1$ monomials is non-negative on ${\mathbb R}_+$, then $p$ admits a representation $p=t^0\sigma_0+\cdots+t^{d-k}\sigma_{d-k}$ where the polynomials $\sigma_i$ are sums of squares and $\deg(\sigma_i)\le2k$. The latter is a sparse positivstellensatz for univariate polynomials. In part this is joint work with Gennadiy Averkov.

15 Mai 2023: Mateusz Skomra: Signed tropicalizations of convex semialgebraic sets

Convex semialgebraic sets arise naturally in convex optimization problems such as the semidefinite programming or the hyperbolic programming. As a result, questions motivated by optimization, such as the study of the expressivity of semidefinite programming, lead to questions in real algebraic geometry (e.g., study of classes of sets that are representable by linear matrix inequalities). In this talk, we discuss the non-Archimedean convex semialgebraic sets, obtained by replacing the field of real numbers by the field of Puiseux series. Such sets can be analyzed using tools from tropical geometry. We give a full characterization of regular sets that are obtained as signed tropicalizations of convex semialgebraic sets and we prove that the signed tropicalizations of hyperbolicity cones, spectrahedra, and polyhedra have a more restrictive structure. To obtain our results, we combine two recent advances in the area of tropical geometry: the study of signed valuations of general semialgebraic sets and the separation theorems for signed tropical convexities. To finish the talk, we will discuss the link between non-Archimedean semidefinite programming and stochastic mean payoff games.

3 avril 2023: Gersende Fort: "Stochastic Variable Metric Proximal-Gradient algorithm with Variance Reduction."

Computational Machine Learning are often faced with finite-sum optimization on a subset of R^d, when the function F is the sum of a smooth term and a non-smooth convex one. Typically, the objective function is the sum of a penalized empirical loss computed over a very large number of training examples.
In this talk, we consider an iterative algorithm for the computation of a minimizer, which relies on the gradient operator of the smooth part of F and on the proximal operator of the non-smooth part. We propose a novel algorithm which adresses the following original setting.
First, the gradient is available up to a positive-definite preconditioning matrix, and possibly up to an error (deterministic or stochastic).
Second, the algorithm addresses the finite-sum structure by a stochastic subsampling.
Finally, the algorithm introduces a variance reduction scheme in order to reduce the errors inherited from the naive Monte Carlo subsampling.
Proximal-gradient based algorithms, combined with a variable metric, are a first example of this general setting. The talk will show that some stochastic Expectation-Maximization algorithms are a second family of examples. We will then introduce the novel stochastic algorithm, with an emphasis on the variance reduction technique. Finally, we will provide a non-asymptotic convergence analysis of this stochastic algorithm and will discuss how to choose some design parameters in order to satisfy the epsilon-approximate stationarity condition.

This is a joint work with E. Moulines. The talk is based on the paper arXiv:2301.00631

3 avril 2023: Bruno Gaujal: "Higher Order Reinforcement Learning"

In this talk, I will present our two forays into higher order reinforcement learning in Markov decision processes (MDPs).

1. Regret of exploration.

The first part focuses on our attempt to go further than the regret (Cezaro sum of rewards) as a measure of performance for a learning algorithm (RL) in MDPs. I will introduce a new performance measure of a RL algorithm that is more discriminating than the regret, that we call the ``regret of exploration''.

I will also present a new episodic reinforcement learning algorithm whose episodes are based on the performance of the current policy with respect to the best policy over the current confidence set. This is in contrast with all existing RL algorithms whose episode lengths are only based on the number of visits to the states. This new algorithm has T^(1/2) regret as most RL algorithms but enjoys a unique additional property: We show that all current episodic RL algorithms have an asymptotically linear regret of exploration for all MDPs while ours has a (log T)^(1/2) asymptotic regret of exploration for generic deterministic MDPs.

2. Identification of Blackwell optimal policies.

Most RL identification algorithms aim at identifying gain optimal policies or discount optimal policies with a fixed discount factor. Here, I will present under which conditions it is possibe to go further and identify Blackwell optimal policies in deterministic MDPs. Then, I will focus on the analysis of identification algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performance. The performance of our method compares to the lower bound up to a factor S^2 in the worst case (where S is the number of states), and up to a constant in certain classes of DMDPs.

This is a joint work with Victor Boone.

6/03/2023 François-Xavier Vialard

On the existence of Monge maps for the Gromov–Wasserstein problem

For the L2-Gromov–Wasserstein distance, we study the structure of minimizers in Euclidean spaces for two different costs. The first cost is the scalar product for which we prove that it is always possible to find optimizers as Monge maps and we detail the structure of such optimal maps. The second cost is the squared Euclidean distance for which we show that the worst case scenario is the existence of a map-anti-map structure. Both results are direct and indirect consequences of an existence result on costs that are defined by submersions. In dimension one for the squared distance, we show that a monotone map is optimal in some non-symmetric situations, thereby giving insight on why such a map is often found optimal in numerical experiments. In addition, we show numerical evidence for a negative answer to the existence of a Monge map under the conditions of Brenier s theorem, suggesting that our result cannot be improved in general.

6/03/2023 Pierre-Louis Poirion

Newton method for unconstrained non-convex optimization

In this talk, we present a randomized subspace regularized Newton method for a non-convex function. We show that our method has global convergence under appropriate assumptions, and its convergence rate is the same as that of the full regularized Newton method. Furthermore, we can obtain a local linear convergence rate, under some additional assumptions, and prove that this rate is the best we can hope when using random subspace.

6/02/2023 Juan Peypouquet

We present an overview of the dynamical aspects of old and new first-order methods used in optimization and variational analysis, and how inertial features and relaxation can help improve their performance. Special attention will be paid to inertial and overrelaxed primal-dual methods, as an illustration.

6/02/2023 Sorin-Mihai Grad

A fresh look at algorithms for solving smooth multiobjective optimization problems

We propose a new approach for constructing practical algorithms for solving smooth multiobjective optimization problems based on determining decreasing directions via suitable linear programming problems. The presented iterative method is specialized for unconstrained, sign constrained and linearly constrained multiobjective optimization problems. In all cases, the objective function values sequence is decreasing with respect to the corresponding nonnegative orthant, and every accumulation point of the sequence generated by the algorithm is a substationary point to the considered multiobjective optimization problem, becoming, under convexity assumptions, a weakly Pareto efficient solution. Different to similar algorithms from the literature, the ones proposed in this work involve decreasing directions that are easily computable in polynomial time.

The talk is based on joint work with Tibor Illés and Petra Renáta Rigó (Corvinus Center for Operations Research at Corvinus Institute for Advanced Studies, Corvinus University of Budapest).

9/01/2023 Manuel Radons

Degree Theory for the Linear Complementarity Problem and equivalent Piecewise Linear Systems

Let $A$ be a $n\times n$ real matrix. The piecewise linear equation system $z-A\vert z\vert =b$ is called an absolute value equation (AVE). It is well-known to be equivalent to the linear complementarity problem. Unique solvability of the AVE is known to be characterized in terms of a generalized Perron root called the sign-real spectral radius of $A$. For mere, possibly non-unique, solvability no such characterization exists. We define the concept of the aligned spectrum of $A$ and prove, under some mild genericity assumptions on $A$, that the mapping degree of the piecewise linear function $F_A:\mathbb{R}^n\to\mathbb{R}^n\,, z\mapsto z-A\lvert z\rvert$ is congruent to $(k+1)\mod 2$, where $k$ is the number of aligned values of $A$ which are larger than $1$. We also derive an exact---but more technical---formula for the degree of $F_A$ in terms of the aligned spectrum. Finally, we derive the analogous quantities and results for the LCP.

9/01/2023 Gleb Koshevoy

Gale-Shapley algorithm revisited: semistability

We revisit the Gale-Shapley algorithm. For that we consider a broader problem of the existence of stable systems of contracts (we do not limit ourselves with assuming finiteness of sets of contracts). We show that stable sets of contracts exist if (and almost only if) choices of agents satisfy path-independence.

For the proof, we invent a notion of semi-stable pairs and show that they form a chain complete poset. Then applying the Zorn lemma to this poset we get the result.

Moreover, we generalize the Gale-Shapley algorithm and construct a dynamic process on the poset of semi-stable pairs such that stable sets are steady states of such a process. This is a joint work with Vladimir Danilov.

5/12/2022 Pierre Ablin

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. Numerical experiments validate the usefulness of our method. The beginning of the talk will be a tutorial on bilevel optimization and applications in machine learning.

Reference: M.Dagréou, P.Ablin, S.Vaiter and T.Moreau. "A framework for bilevel optimization that enables stochastic and global variance reduction algorithms". https://arxiv.org/abs/2201.13409

7/11/2022 Warren Hare

Approximations in Model-based Blackbox Optimization

A blackbox is a function that provides output without explanation of how the output was constructed. One common strategy for optimization involving blackbox functions is to numerically approximate gradients, sub-gradients, and/or Hessians and use these approximations in a classical method. In this talk, we discuss classical and novel approximation techniques for blackbox functions. We further illustrate the use of such techniques within blackbox optimization algorithms.

Based on joint work with Yiwen Chen, Gabriel Jarry-Bolduc, Chayne Planiden, and Kashvi Sravastava.

7/11/2022: Marc Lambert

From Kalman filtering to Langevin based estimation: the variational Gaussian approximation in continuous time.

We are interested in the propagation of a Gaussian distribution through a nonlinear stochastic differential equation (SDE). This problem has applications in Kalman filtering where the SDE models the dynamics of a system under uncertainties but also in density estimation where the SDE is a Langevin dynamics which converges in distribution to the Gibbs density. The density of the stochastic random variable follows the Fokker-Planck equation, this equation is not tractable in the general case but can be solved in closed-form if we consider its variational Gaussian approximation regularized by a Wasserstein distance. We show that the optimal flow of Gaussians is driven by two fundamental mean and covariance differential equations which are well known in the Kalman filtering community. Using Wasserstein's gradient flow geometry, we can exhibit convergence guarantees if the target distribution is log-concave. Otherwise, we show that we can approximate it empirically with a set of ODEs derived from a Gaussian mixture. This set of ODEs define a flow of interacting Gaussian particles that evolves to minimize a free energy. This presentation will be based on the following two papers: https://hal.inria.fr/hal-03665666 and https://arxiv.org/abs/2205.15902.

10/10/2022: Andrea Simonetto

Learning to personalise for time-varying optimisation

We look at optimisation problems that change over time and have a component that capture user's satisfaction to the decisions. In the talk, I will propose several ways to learn satisfaction incorporating user's feedback on the decision in a bandit-like setting, learning optimisation objectives online, and with functional requirements. The talk will be a blend of optimisation algorithms and regression methods to learn user's satisfaction, with theoretical results and an applicative outlook.

10/10/2022: Konstantin Mishchenko

Global Convergence of Regularized Newton Method

We will discuss some recent advances in global convergence of regularized Newton method. We will discuss a regularization strategy based on the gradient norm that converges globally at O(1/k^2) rate for convex functions with Lipschitz Hessians. We will then look at a modified regularization that results in even faster O(1/k^3) convergence rate if the objective has more smoothness. Finally, we will briefly cover how the method can be extended for composite nonsmooth problems. The talk will be based on the following two papers: https://arxiv.org/abs/2112.02089 and https://arxiv.org/abs/2208.05888

2021-22

09/05/2022 Lionel Pournin

Algorithmic, combinatorial, and geometric aspects of linear optimization

The simplex and interior point methods are currently the most computationally successful algorithms for linear optimization. While simplex methods follow an edge path, interior point methods follow the central path. The complexity of these methods is closely related to the combinatorial and geometric structures of the feasible region. Focusing on the analysis of worst-case constructions leading to computationally challenging instances, recently obtained lower bounds on the largest possible diameter of lattice polytopes will be presented. These bounds, via lattice zonotopes, are established using tools from combinatorics and number theory. This talk is based on joint work with Antoine Deza.

09/05/2022 Bruno Ziliotto

Mertens conjectures in absorbing games with incomplete information

We consider zero-sum absorbing games where the payoff depends on two fixed parameters drawn randomly at the outset of the game, one that is announced to Player 1, the other one that is announced to Player 2. We solve two open problems, that correspond to the Mertens conjectures (1986) for this model, that is: (1) the limit value exists, (2) when Player 1 knows both parameters (incomplete information on one side), Player 1 can guarantee uniformly the limit value. The proof builds on a new approximation technique of the belief martingale based on an appropriate simplex triangulation. Unlike this abstract, a major part of the talk will be understandable by a broad mathematical audience.

11/04/2022 Nabil Mustafa

A Primal-Dual Epsilon-Nets Rounding Technique, and its Applications

In 1991, Noga Alon and Daniel Kleitman used an elegant rounding technique, using LP duality and epsilon-nets, to solve the long-standing Hadwiger-Debrunner (p,q) problem in convex geometry. Since then, many applications of similar techniques have been used in combinatorics, algorithms and optimisation. In my talk, I will present three examples of this idea: the original Alon-Kleitman result, then a random-sampling based algorithm for hitting and separating convex sets in high dimensions (joint work with Saurabh Ray and Imre Barany), and finally, time permitting, a beautiful proof by Bousquet, Lochet, and Thomassé of the Erdős-Sands-Sauer-Woodrow conjecture.

11/04/2022 Arnau Padrol

Counting polytopes

This talk will be an introduction to the enumeration of combinatorial types of convex polytopes. While in dimensions up to 3 we have a very good understanding on the asymptotic growth of the number of polytopes with respect to the number of vertices, in higher dimensions we only have coarse estimates. Upper bounds arise from results of Milnor and Thom from real algebraic geometry, whereas lower bounds are obtained with explicit constructions. I will survey the best bounds up to date.

28/03/2022 Alessandro Rudi

Representing non-negative functions, with applications in non-convex optimization, probability representation and beyond

Many problems in applied mathematics are expressed naturally in terms of non-negative functions. While linear models are well suited to represent functions with output in R, being at the same time very expressive and flexible, the situation is different for the case of non-negative functions where the existing models lack one of the good properties.

In this talk we present a rather flexible and expressive model for non-negative functions. We will show direct applications in probability representation and non-convex optimization. In particular, the model allows to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally, we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.

Ulysse Marteau-Ferey , Francis Bach, Alessandro Rudi. Non-parametric Models for Non-negative Functions. https://arxiv.org/abs/2007.03926

Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach. Finding Global Minima via Kernel Approximations. https://arxiv.org/abs/2012.11978

Alessandro Rudi, Carlo Ciliberto. PSD Representations for Effective Probability Models. https://arxiv.org/pdf/2106.16116.pdf

28/03/2022 Julien Grand-Clément

Solving Optimization Problems with Blackwell Approachability

In this work, we use the framework of Blackwell approachability to develop novel parameter-free algorithms for solving convex-concave saddle-point problems. Despite numerous theoretical applications, ranging from submodular maximization to stochastic games and calibration, few implementations of Blackwell approachability have been practically used to solve optimization problems. To this end, we introduce the Conic Blackwell Algorithm+ (CBA+) regret minimizer, a new parameter- and scale-free regret minimizer for convex sets, which is based on Blackwell approachability and attains O(sqrt(T)) regret. We show how to efficiently instantiate CBA+ for many decision problems of interest, including matrix games, extensive-form games, distributionally robust logistic regression, Markov decision processes, and Fisher markets. Based on CBA+ as a regret minimizer, we introduce SP-CBA+, a new parameter-free algorithm for solving convex-concave saddle-point problems, which achieves a O(1/sqrt(T)) ergodic rate of convergence. A specific thresholding operation and an alternation scheme are needed to improve the convergence rate of our algorithm. SP-CBA+ is widely applicable and in our simulations, we observe that SP-CBA+ achieves state-of-the-art numerical performance on several synthetic and real instances of saddle-point problems, without the need for any choice of step sizes or other algorithmic parameters.

This is joint work with Christian Kroer.

07/02/2022 Charles Bertucci

Solutions monotones de master equations dans les jeux à champ moyen

Dans les jeux à champ moyen, la master equation est une équation aux dérivées partielles qui permet de caractériser (dans certains régimes) la valeur du jeu. Après avoir introduit ce type d'équations, j'expliquerai comment caractériser des solutions qui sont simplement continues, notamment en utilisant leur monotonie. Cette approche géométrique est relativement peu technique et mène à de nombreuses propriétés pour les solutions : existence, unicité, stabilité.

07/02/2022 Pierre-Cyril Aubin-Frankowski

Reproducing kernels to the rescue: state-constrained Linear-Quadratic Optimal Control and problems with infinitely many inequality constraints

Many optimization problems over function spaces (density estimation, optimal transport, state-constrained optimal control,…) involve an infinite number of pointwise inequality constraints. This side information can originate from both physical and theoretical constraints on the model such as ``stay within boundaries`` in path-planning or ``be nonnegative and sum to one`` in density estimation. On the other hand, reproducing kernels are propitious for pointwise evaluations and some kernels encode very rich classes of functions. Linearly controlled trajectories even define a vector-valued reproducing Hilbert kernel space when equipped with the scalar product corresponding to the quadratic cost. The associated LQ kernel is related to the inverse of the Riccati matrix and to the controllability Gramian, and allows tackling affine state constraints, providing continuous-time numerical solutions.

10/01/2022 Laurent Massoulié

Fast Distributed Optimization with Asynchrony and Time Delays

The training of models over distributed data calls for distributed optimization schemes. This has motivated research on distributed convex optimization, leading to the identification of lower bounds on convergence speed, and distributed optimization algorithms with convergence speed potentially matching these lower bounds, for a variety of settings.

In this talk we focus on the important setting of asynchronous operation, for which we propose optimization algorithms with optimal speed. We next consider systems with heterogeneous communication and computation delays, for which we propose fast asynchronous algorithms adapted to these heterogeneous delays.

10/01/2022 Paul Pegon

Stabilité en transport branché

Le transport branché est une variante du transport optimal où le coût est strictement sous-additif en la masse $m$ à transporter (typiquement $m^\alpha$ pour $\alpha$ entre 0 et 1), ce qui favorise le transport groupé et donne lieu à un déplacement le long d'un réseau. On s'intéresse naturellement aux structures de coût minimal pour des mesures source et destination $(\mu^-,\mu_+)$ fixées. Dans cet exposé, je présenterai les deux principaux modèles du transport branché (Lagrangien et Eulérien), et je m'intéresserai plus particulièrement à la question de la stabilité : une limite de suite de transports optimaux pour $(\mu_n^-,\mu_n^+)$ est-elle optimale pour les mesures source et destination limites $(\mu^-,\mu^+)$ ? Cette question est non triviale dans le cas des exposants sous-critiques $\alpha \leq 1-1/d$ où $d$ est la dimension de l'espace ambiant. Je tâcherai de donner les idées de la preuve du résultat (positif) de stabilité dans le cas Eulérien, et de son extension au cas Lagrangien, démontrée dans un travail avec Maria Colombo, Antoine Prouff et coauteurs.

13/12/2021: Matthieu Fradelizi

Volume de sommes de convexes, volumes mixtes, déterminants et zonoïdes.

Poursuivant, d’une part, des analogies entre théorie de l’information, déterminants et volumes mixtes et d’autre part la relation entre géométrie algébrique et géométrie des convexes, de nombreuses inégalités entre volumes mixtes de la somme de convexes ont été conjecturées ces dernières années sous le nom d’inégalités de type Bézout et de type Alexandrov-Fenchel. Des formes faibles de ces conjectures se traduisent par des comparaisons des volumes des projections de convexes. Nous montrerons ces relations, analyserons ces conjectures, en donnerons des contre-exemples et des preuves partielles, en particulier dans le cas des zonoïdes.

Travaux en collaboration avec Mokshay Madiman, Mathieu Meyer et Artem Zvavitch.

13/12/2021: Jean-François Delmas

Computing Optimal Vaccination in Infinite Dimension (for SIS model)

This is a joint work with Dylan Dronnier and Pierre-André Zitt. See references: https://arxiv.org/abs/2006.08241, https://arxiv.org/abs/2103.10330, https://arxiv.org/abs/2110.12693.

08/11/2021: Benoit Bonnet

Set-valued and non-smooth analysis in Wasserstein spaces with applications in mean-field control theory

In this talk, I will expose recent results obtained in collaboration with Hélène Frankowska as part of an ongoing research program on the topic of mean-field optimal control.

I will first present a new definition of differential inclusion in the space of measures, and show that it produces a meaningful and readily applicable set-valued generalisation of the notion of continuity equations in Wasserstein spaces. Afterwards, I will briefly discuss how the latter can be used in several areas of mean-field control, and in particular for establishing first-order necessary optimality conditions of Pontryagin type.

I will then move on to the second object of this presentation, which are the fine properties of the value function associated with mean-field optimal control problems. In this context, I will state sufficient conditions ensuring that the latter is semiconcave with respect to the intrinsic geodesic structure of Wasserstein spaces, which provides rich information about the structure of its superdifferentials. I will then show a generalisation of the so-called sensitivity relations, which link the costate variables stemming from the Pontryagin Maximum Principle to the supergradients of the value function, and finally discuss applications of these results in defining set-valued optimal feedbacks for mean-field optimal control problems.

08/11/2021: Jimmy Lamboley

Régularité des formes convexes minimisant le périmètre

Résumé : dans cet exposé nous discuterons de la théorie de régularité en optimisation de forme sous contrainte de convexité. Plus précisément, on s’intéressera aux domaines qui minimisent -parmi les domaines convexes, ou seulement parmi ceux de volume donné- une fonctionnelle de la forme $P(\Omega)+R(\Omega)$ où $P$ désigné le périmètre du domaine $\Omega$, et $R$ est vu comme un terme perturbatif. Cela nous amène à une notion de quasi-minimiseur du périmètre sous contrainte de convexité, et nous montrons que de tels domaines sont de régularité $C^{1,1}$, et que cette régularité est optimale.

Le terme perturbatif peut inclure des fonctionnelles dépendant d’EDP, par exemple les valeurs propres du Laplacien sur le domaine considéré. En perspective, on précisera les applications espérées de ces résultats dans le contexte de la question de la stabilité en optimisation de forme.

Il s’agit d’un travail en commun avec Raphaël Prunier

04/10/2021: Rémi Flamary

Optimal transport for graph data : barycenters and dictionary learning.

In recent years the Optimal Transport (OT) based Gromov-Wasserstein (GW) divergence has been investigated as a similarity measure between structured data expressed as distributions typically lying in different metric spaces, such as graphs with arbitrary sizes. In this talk, we will address the optimization problem inherent in the computation of GW and some of its recent extensions, namely the Entropic and the Fused GW divergences. Next we will illustrate how these OT problems can be used to model graph data in learning scenarios such as graph compression, clustering and classification. Finally we will present a novel approach performing linear dictionary learning on graphs datasets using GW as data fitting term which simultaneously provides convenient graphs modeling for the aforementioned applications and efficient approximations to the GW divergence.

04/10/2021 Yann Brenier

"La relativité générale dans le vide revue comme un problème de transport optimal quadratique matriciel"

On revoit les équations d'Einstein de la relativité générale dans le vide comme équations d'optimalité d'une sorte de problème de transport optimal quadratique à valeurs matricielles.

05/10/2020 Anna Korba

Sampling as optimization of the relative entropy over the space of measures : a non asymptotic analysis of SVGD and the Forward-Backward scheme.

We consider the problem of sampling from a log-concave probability distribution π ∝ exp(−V ) on ℝ^d. This target distribution π can be seen as a minimizer of the relative entropy functional with respect to π, defined on the space of probability distributions. A general strategy to minimize a function is to run the gradient flow dynamics. On the space of probability measures, Wasserstein gradient flows define such curves of steepest descent for the objective functional. In this talk, I will discuss recent works [1,2] on two different algorithms that result from different time-space discretizations of this gradient flow. The first one [1] provides a novel finite time analysis for the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate π. It implements a forward discretization of the Wasserstein gradient flow of the relative entropy, where the gradient is smoothed through a kernel integral operator. The second one [2] proposes a Forward-Backward discretization scheme for this gradient flow. Using techniques from convex optimization and optimal transport, we show that it has convergence guarantees similar to the proximal gradient algorithm in Euclidean spaces.

[1] A Non-Asymptotic Analysis of Stein Variational Gradient Descent. A. Korba, A. Salim, M. Arbel, G. Luise, A. Gretton. to appear at Neurips 2020.

[2] The Wasserstein Proximal Gradient Algorithm. A. Salim, A. Korba, G. Luise. to appear at Neurips 2020.

05/10/2020 Sylvain Sorin

No-regret algorithms in on-line learning, games and convex optimization.

The purpose of this talk is to underline links between no-regret algorithms used in learning, games and convex optimization. In particular we will study continuous and discrete time versions and their connections. We will comment on recent advances on (1) Euclidean and non-euclidean approaches (2) Speed of convergence of the evaluation (3) Convergence of the trajectories.

04/05/2020 Jérôme Malick [CANCELLED]

Proximal identification and applications

Nonsmoothness typically arises in a highly structured way which yields nice properties for optimization. For example, optimal solutions of nonsmooth optimization problems are often trapped in "active sets": solutions are located on low-dimensional smooth manifolds and do not move out of it under small perturbations. Moreover proximal algorithms identify these active sets — exactly in non-degenerate cases and approximately in degenerate cases. In this talk, I discuss this (enlarged) identification property and give two applications: (i) model consistency in statistical learning, and (ii) communication reduction in distributed learning.

02/03/2020 Aymeric Dieuleveult

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Abstract: Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which iterates make fast progress towards the optimum, followed by a stationary phase during which iterates oscillate around the optimal point. In this paper, we show that efficiently detecting this transition and appropriately decreasing the step size can lead to fast convergence rates. We analyse the classical statistical test proposed by Pflug (1983), based on the inner product between consecutive stochastic gradients. Even in the simple case where the objective function is quadratic, we show that this test cannot lead to an adequate convergence diagnostic. We propose then a novel and simple statistical procedure that accurately detects stationarity and we provide experimental results showing state-of-the-art performance on synthetic and real-world datasets. (Joint work with Scott Pesme and Nicolas Flammarion.)

02/03/2020 Beniamin Bogosel

Parametric shape optimization using the support function

The optimization of functionals depending on shapes which have convexity, diameter or constant width constraints poses difficulties from a numerical point of view since not all arbitrarily small perturbations are admissible. This talk presents how the support function of a convex body can be used in order to find approximate solutions to such problems by solving finite dimensional optimization problems under various constraints. After constructing the numerical framework, some numerical examples from convex geometry are presented. The functionals to be optimized range from simple examples related to the volume and the perimeter of the convex shape to more complex functionals depending on the solution of a partial differential equation. In particular, Meissner's conjecture regarding three dimensional bodies of constant width is confirmed numerically by solving an optimization problem. This is a joint work with Pedro Antunes.

03/02/2020 Elias Tsigaridas

Bounds on the degree of the central curve of semidefinite programming

We present bounds on the algebraic degree of the central curve of semidefinite programming (SDP). We derive the bounds based on the degree of the underlying polynomial systems that exploit the primal and dual formulation of SDP.

03/02/2020 Simone Naldi

Hyperbolic polynomials and optimization

In the talk we will discuss a generalization of semidefinite programming (SDP), called hyperbolic programming (HP). This is a convex optimization problem that can be constructed from a multivariate hyperbolic polynomial (and a few other data), and that has a strong algebraic structure, such as SDP, but less explicit. Indeed, when the polynomial has a definite determinantal representation, then the hyperbolic program reduces to a semidefinite program. The fact that not every hyperbolic polynomial has such a representation seems to suggest that this is not always the case, but the Generalized Lax Conjecture (GLC) claims the opposite. I will also discuss recent results and also a weak version of the GLC, that is a motivation of addressing the problem from a computational point of view.

13/01/2020 Claire Boyer

On representer theorems and convex regularization

We establish a general principle which states that regularizing an inverse problem with a convex function yields solutions which are convex combinations of a small number of atoms. These atoms are identified with the extreme points and elements of the extreme rays of the regularizer level sets. As a side result, we characterize the minimizers of the total gradient variation. This is a joint work with Antonin Chambolle, Yohann De Castro, Vincent Duval, Frédéric de Gournay, and Pierre Weiss.

13/01/2020 Thomas Moreau

Learning step sizes for unfolded sparse coding

Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. However, the reason why learning the weights of such a network would accelerate sparse coding are not clear. In this talk, we look at this problem from the point of view of selecting adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate if the learned algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that learning step sizes can effectively accelerate the convergence when the solutions are sparse enough

09/12/2019 Yurii Nesterov

Relative Smoothness: New Paradigm in Convex Optimization

Development and computational abilities of optimization methods crucially depend on the auxiliary tools provided to them by the method’s designers. During the first decades of Convex Optimization, the methods were based either on the proximal setup, allowing Euclidean projections onto the basic feasible sets, or on the linear minimization framework, which assumes a possibility to minimize a linear function over the feasible set. However, recently it was realized that any possibility of simple minimization of an auxiliary convex function leads to the efficient minimization methods for some family of more general convex functions, which are compatible with the first one. This compatibility condition, called relative smoothness, was firstly exploited for smooth convex functions (Bauschke, Bolt and Teboulle, 2016) and smooth strongly convex functions (Lu, Freund and Nesterov, 2018). In this talk we make the final step and show how to extend this framework onto the class of nonsmooth functions. We also discuss possible consequences and applications.

09/12/2019 Jérôme Malick : reporté au 4 mai 2020.

04/11/2019 Nguyễn Kim Thắng

Online Non-monotone Submodular Maximization: Approximation and Regret Guarantees

Submodular and diminishing-returns (DR) submodular optimization are important optimization problems with many real-world applications in machine learning, economics and communication systems. Moreover, DR-submodular optimization captures a subclass of non-convex optimization that provides both practical and theoretical guarantees.

In this talk, we present algorithms with performance guarantees for the fundamental problems of maximizing non-monotone submodular and DR-submodular functions over convex sets in the online environment. In several settings, these guarantees match the currently best-known ones in the offline environment.

04/11/2019 Irène Waldspurger

Rank optimality for the Burer-Monteiro factorization

The Burer-Monteiro factorization is a classical heuristic used to speed up the solving of large scale semidefinite programs when the solution is expected to be low rank: One writes the solution as the product of thinner matrices, and optimizes over the (low-dimensional) factors instead of over the full matrix. Even though the factorized problem is non-convex, one observes that standard first-order algorithms can often solve it to global optimality. This has been rigorously proved by Boumal, Voroninski and Bandeira, but only under the assumption that the factorization rank is large enough, larger than what numerical experiments suggest. We will describe this result, and investigate its optimality. More specifically, we will show that, up to a minor improvement, it is optimal: without additional hypotheses on the semidefinite problem at hand, first-order algorithms can fail if the factorization rank is smaller than predicted by current theory.

14/10/2019 Pascal Bianchi

Convergence and Dynamical Behavior of the ADAM Algorithm for Non Convex Stochastic Optimization

ADAM is a popular variant of the stochastic gradient descent for finding a local minimizer of a function. The objective function is unknown but a random estimate of the current gradient vector is observed at each round of the algorithm. Assuming that the objective function is differentiable and non-convex, we establish the convergence in the long run of the iterates to a stationary point. The key ingredient is the introduction of a continuous-time version of ADAM, under the form of a non-autonomous ordinary differential equation.

14/10/2019 Vincent Leclère

Decomposition methods for convex multistage stochastic optimization problem

Multistage stochastic optimization problem have usually deterministic equivalent of huge size. In this talk we quickly review some ideas that allow to decompose such problem, with a focus on Dynamic Programming methods. We present the Stochastic Dual Dynamic Programming (SDDP) algorithm, which is an algorithm approximating the Bellman cost-to-go function by polyhedral functions. Equivalently it can be seen as a nested Bender's decomposition method. Finally, leveraging Fenchel transform we present a recent dual approach to SDDP.

03/06/2019 Sebstian Stich

Gradient Compression with Error Feedback for Distributed Optimization

The communication overhead is a key bottleneck that hinders perfect scalability of distributed optimization algorithms. Various recent works propose to use quantization or sparsification techniques to reduce the amount of data that needs to be communicated between workers.

We analyze Stochastic Gradient Descent (SGD) with compression (like e.g. sparsification or quantization) of exchanged messages and show that these schemes converge at the same rate as vanilla SGD when equipped with error compensation technique (i.e. keeping track of accumulated errors in memory). We discuss both centralized and decentralized implementations, the latter based on modification of the gossip protocol.

03/06/2019 Dan Garber

Highly-efficient Algorithms for Online Learning with Semi-adversarial Sequences

Online Learning is an appealing and successful paradigm for sequential prediction under uncertainty. In particular, efficient online algorithms for regret minimization usually require only convexity and boundness assumptions on the model (i.e., the feasible set and loss functions), and thus could be applied, with provable performance guarantees, to a great variety of prediction scenarios, which go far beyond more traditional models of assuming stochastic i.i.d. data.

On the downside, for several problems of interest, all current online leaning algorithms are not scalable to large instances. For instance, all current algorithms for Online Principal Component Analysis require quadratic memory and at least quadratic runtime per iteration. This is in stark contrast to the offline or stochastic i.i.d variants of the problem, for which there exist algorithms that require only linear memory and linear runtime per iteration. Another example, is when the loss functions are exp-concave (e.g., linear regression). In this case, the only current algorithm to achieve (optimal) logarithmic regret, requires quadratic memory and to solve a linear system of equations on each iteration, whereas the offline problem could be solved via highly-efficient gradient methods which require only linear memory.

In this talk I will show that imposing some more structure on the data can result in much more efficient regret minimization algorithms. In particular, I will consider cases in which the data follows a (unknown) stationary i.i.d. distribution combined with adversarial perturbations. I will show how such models allow to 1) obtain a linear-memory and linear-runtime algorithm for Online PCA with a nearly optimal regret bound, and 2) obtain a logarithmic regret bound for the vanila Online Gradient Descent method for several problems of interest with exp-concave losses such as Online LASSO, and Online Portfolio Selection.

06/05/2019 Kevin Scaman

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

The training of machine learning methods on large-scale datasets often requires the use of distributed computing platforms, both to parallelize computation and allow the storage of the whole dataset across multiple machines. In such a context, a slow communication between machines may downgrade the performance of inference or optimization algorithms. In this presentation, I will show how the ideas of optimization theory can be extended to understand the fundamental limitations of distributed algorithms when machines are bound to communicate across a network of limited bandwidth, and how to derive optimal distributed optimization algorithms. More specifically, I will provide optimal convergence rates for the distributed optimization of convex functions in two settings (global and local regularity assumptions), along with algorithms meeting these optimal rates.

06/05/2019 Jean Feydy

Discrete Optimal Transport: progress and perspectives

Computing distances between weighted point clouds is a fundamental problem in applied mathematics, with applications ranging from Computer Vision (SLAM, reconstruction from LIDAR...) to Medical Imaging (image, mesh, fiber track registration) and Machine Learning (GANs, VAEs, geometric deep learning...).

In practice, researchers strive to endow spaces of measures with distances that must:

- Lift a ground metric defined on Rd as faithfully as possible;

- Define informative gradients with respect to the points' weights and positions;

- Be affordable, from a computational point of view.

In this talk, I will first present the three families of distances that have shaped the literature since the 50's:

- Soft-Hausdorff (aka. ICP, GMM-loglikelihood) losses, which rely on (soft) nearest-neighbour projections.

- Kernel (aka. Sobolev, Electrostatic, MMD, blurred SSD) norms, which rely on (discrete) convolutions.

- Optimal Transport (OT, aka. Earth-Mover, Wasserstein, Linear Assignment) costs, which often rely on linear or convex optimizers.

Focusing on OT, which is most appealing from a geometric perspective, I will then explain how to compute Wasserstein distances *efficiently*. Leveraging recent works (2016+) on entropic regularization, we will see that fast multiscale strategies now allow us to compute smooth, approximate solutions of the OT problem with strong theoretical guarantees. As evidenced by a live demonstration of the GeomLoss python library: https://www.kernel-operations.io/geomloss/ on modern gaming hardware, Wasserstein distances (and gradients!) can now be computed between millions of samples in a matter of seconds.

Available through efficient (~ log-linear), robust and well-packaged GPU routines, Wasserstein distances can now be used as a drop-in replacement for the cheaper Hausdorff and Kernel distances. But how useful can this tool be in applied settings?

To conclude this talk, I will present open problems related to OT theory and, more generally, to *geometric data analysis*:

- the link between entropic regularization, de-biasing and low-frequency OT;

- the convergence of annealing schemes for transport problems;

- the gap between Kernel/OT losses and generic dual norms, known as Adversarial Networks in the recent machine learning literature;

- the distinction between "geometric" and "pointwise" learning problems.

References:

- "The invisible hand algorithm: Solving the assignment problem with statistical physics", Kosowsky and Yuille, Neural Networks, 1994.

- "Computational Optimal Transport", Peyré and Cuturi, optimaltransport.github.io/book , 2018.

- "Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems", Schmitzer, 2016.

- "Global divergences between measures: from Hausdorff distance to Optimal Transport", Feydy and Trouvé, ShapeMI 2018.

- "Interpolating between Optimal Transport and MMD using Sinkhorn Divergences", Feydy, Séjourné, Vialard, Amari, Trouvé, Peyré, AiStats 2019.

11/03/2019 Luce Brotcorne

How to integrate customer’s behaviour within pricing?

Increased competition in a globalized economy, real-time access to a wealth of transparent information, the rise of a more knowledgeable and pragmatic generation of consumers is currently changing the perception and nature of optimal pricing. Modeling the customer reaction to price is nowadays a central issue for most companies.

Bilevel models are well suited to capture such situations where two decision makers act non cooperatively and in a sequential way. More precisely, bilevel programs allow modeling situations in which a main agent, the leader, strives to optimize a given quantity but controls only a subset of the decision variables. The remaining variables fall under the control of a second agent, the follower, who solves its own problem by taking into account the decisions taken by the leader. In other words, bilevel programs can be considered as demand-offer equilibrium models where the demand is the result of another mathematical problem.

In this talk we first present a brief theoretical study of bilevel problems. Next we focus on two special cases : a price setting problem on a network and a demand side and revenue management problem in the energy field.For each of these two applications, we define the models, study their properties and sketch solution methods based on the structures of the problems.

11/03/2019 Antoine Désir

A markovian approach for choice modeling and assortment optimization

Which set of products should be offered to arriving consumers to maximize expected revenue? This is a core revenue management problem known as assortment optimization which applies to a wide variety of settings (from retail to e-commerces). Discrete choice model theory offers a way to mathematically model the substitution behavior of consumers and provide a key ingredient for this problem. Many choice models have been proposed in the literature, introducing a fundamental tradeoff between model expressiveness and computational complexity. In particular, the assortment optimization problem is notoriously hard for general choice models. In this talk, we look at a new framework which tries to strike a good balance between expressiveness and tractability. In particular, the substitution behavior of consumers is modeled as transitions in a Markov chain. By doing so, this new model helps alleviate the Independence of Irrelevant Alternatives (IIA) property, a well-known limitation of the popular multinomial logit model. Moreover, it provides a good approximation to the class of random utility models. We show that not only this model has a great predictive power, it is also tractable from a computational perspective. In particular, we give an algorithm framework to derive efficient algorithm for different variants of the assortment optimization problem.

8/04/2019 Lénaïc Chizat

Solving Convex Problems over Measures with Over-parameterized Gradient-based Algorithms

Minimizing a convex function of a signed measure is a typical problem in signal processing or machine learning arising e.g., in gridless spikes deconvolution or two layers neural networks training. This is a difficult problem: convex algorithms that are traditionally studied often exhibit undesirable behavior (high complexity or low precision). In this talk, I will present an alternative non-convex approach: discretize the measure into "particles" and run gradient descent on their positions and weights. This is simple to implement and often efficient in practice, but the theoretical analysis is challenging due to non-convexity. We will first discuss a general consistency result: when properly initialized, this method converges to global minimizers as the number of particles goes to infinity. We will then give a local convergence analysis, which leads to complexity bounds when the solution is assumed to be sparse. Our analysis involves tools and ideas from (unbalanced) optimal transport and Wasserstein gradient flows theory. This talk is based on joint work with Francis Bach.

8/04/2019 Luca Nenna

Unequal Dimensional Optimal Transport, Monge-Ampere equations and beyond

This talk is devoted to variational problems on the set of probability measures which involve optimal transport between unequal dimensional spaces. In particular, we study the minimization of a functional consisting of the sum of a term reflecting the cost of (unequal dimensional) optimal transport between one fixed and one free marginal, and another functional of the free marginal (of various forms). Motivating applications include Cournot-Nash equilibria where the strategy space is lower dimensional than the space of agent types. For a variety of different forms of the term described above, we show that a nestedness condition, which is known to yield much improved tractability of the optimal transport problem, holds for any minimizer. Depending on the exact form of the functional, we exploit this to find Monge-Ampere type equations characterising solutions, prove convergence of an iterative scheme to compute the solution, and prove regularity results. This a joint work with Brendan Pass.

11/02/2019 Georgina Hall

LP, SOCP, and optimization-free approaches to sum of squares optimization

The problem of optimizing over the cone of nonnegative polynomials is a fundamental problem in computational mathematics, with applications to polynomial optimization, control, machine learning, game theory, and combinatorics, among others. A number of breakthrough papers in the early 2000s showed that this problem, long thought to be intractable, could be solved by using sum of squares programming. This technique however has proved to be expensive for large-scale problems, as it involves solving large semidefinite programs (SDPs).

In the first part of this talk, we present two methods for approximately solving large-scale sum of squares programs that dispense altogether with semidefinite programming and only involve solving a sequence of linear or second order cone programs generated in an adaptive fashion. In the second part of the talk, we focus on the problem of finding tight lower bounds on polynomial optimization problems (POPs), a fundamental task in this area that is most commonly handled through the use of SDP-based sum of squares hierarchies (e.g., due to Lasserre and Parrilo). In contrast to previous approaches, we provide the first theoretical framework for efficiently constructing converging hierarchies of lower bounds on POPs whose computation does not require any optimization, but simply the ability to multiply certain fixed polynomials together and check nonnegativity of the coefficients of the product.

11/02/2019 Sourour Elloumi

Equivalent reformulations of Polynomial Optimization Problems with mixed-integer variables

Quadratic programs in binary variables, or in mixed-integer variables have a very wide scope of application but their resolution remains difficult. We review the types of methods used. Then, we focus on their resolution through linear reformulation, of through quadratic reformulation with some convexity properties. We consider in particular the case of compact linearization for quadratic programming in binary variables. We show that an infinity of such linearizations are possible and that an "optimal" linearization in a certain sense can be computed. We then present the convex quadratic reformulation approach and show, in a similar way, how to obtain an optimal convex quadratic reformulation. Finally, we highlight the generalization of this approach towards quadratic programming in integer or continuous variables, and towards polynomial optimization of higher degree.

14/01/2019 Yohann De Castro

Programmation Semi-Définie pour résoudre le problème des Plans d'Expérience Optimaux Approchés Nous introduisons une nouvelle approche visant à calculer les plans optimaux pour la régression polynomiale multivariée sur des espaces compacts semi-algébriques.

Nous utilisons les hiérachies ‘‘moment-sum-of-squares'’ (moment-SoS) des problèmes de programmation semi-définie pour résoudre ce problème.

Le plan optimal est récupéré via la théorie de la dualité de la programmation semi-définie. Cet exposé montre que la hiérarchie converge vers le plan optimal à mesure que l'ordre de la hiérarchie augmente.

En outre, nous fournissons un certificat dual assurant la convergence finie de la hiérarchie et nous montrons que le plan optimal peut être calculé numériquement en temps polynomial avec notre méthode.

En tant que sous-produit, nous revisitons le ‘‘théorème d'équivalence'' de la théorie des plans optimaux: il est lié au polynôme Christoffel et il caractérise la convergence finie des hiérarchies SoS.

14/01/2019 Clarice Poon

The Fisher metric and support stability for off-the-grid compressed sensing

Sparse regularization is a central technique for both machine learning (to achieve supervised features selection or unsupervised mixture learning) and imaging sciences (to achieve super-resolution).

Existing performance guaranties assume a separation of the spikes based on an ad-hoc (usually Euclidean) minimum distance condition, which ignores the geometry of the problem.

We study the BLASSO (i.e. the off-the-grid version of l^1 LASSO regularization) and show that the Fisher-Rao distance is the natural way to ensure and quantify support recovery,

since it preserves the invariance of the problem under reparameterization.

We prove that under mild regularity and curvature conditions, stable support identification is achieved even in the presence of randomized sub-sampled observations (which is the case in compressed sensing or learning scenario).

On deconvolution problems, which are translation invariant, this generalizes to the multi-dimensional setting existing results of the literature.

For more complex translation-varying problems, such as Laplace transform inversion, this gives the first geometry-aware guarantees for sparse recovery.

10/12/2018 François Clautiaux

State-space aggregation for large dynamic programs and network flow formulations.

Dynamic programming (DP) is an ubiquitous tool in discrete and continuous optimization/control. We study mathematical formulations generated by a set of DPs and integer linear programs. They have typically a very large number of variables/constraints (both pseudo-polynomial and exponential). We consider techniques based on iterative relaxations to produce efficient algorithms for these formulations.

10/12/2018 Axel Parmentier

Stochastic and non-linear resource constrained shortest path problems: algorithms and applications.

Resource constrained shortest path (RCSP) problems arise as pricing subproblems of column generation approaches to many real-life routing and scheduling problems. It is therefore practically important to solve them efficiently. State of the art algorithms for RCSP problems are based on a smart enumeration of all the non-dominated paths. Recent improvements of these enumeration algorithms rely on the use of bounds on path resources to discard partial solutions. The quality of the bounds determines the performance of the algorithm. However, there is no generic algorithm to compute such bounds for non-linear and stochastic RCSP problems. Leveraging lattice theory, we introduce algebraic procedures to generate bounds on paths resources and exploit them in enumeration algorithms. We illustrate the relevance of the approach on a wide range of non-linear, robust, and stochastic applications, including crew pairing for airlines, stochastic manpower scheduling, unit commitment for energy producers, and electrical vehicle scheduling problems.

05/11 Adrien Taylor

Automated analysis and design of first-order methods using performance estimation.

We present the performance estimation approach. This methodology aims at automatically analyzing convergence properties of first-order methods for solving (composite convex) optimization problems. In particular, it allows obtaining tight guarantees for fixed-step first-order methods involving a variety of oracles - that includes explicit, projected, proximal, conditional and inexact (sub)gradient steps - and a variety of convergence measures. In this presentation, we present and illustrate the performance estimation methodology and how it can be used for developing new algorithms and convergence proofs.

This talk is based on joint works with François Glineur, Julien Hendrickx, Etienne de Klerk, Yoel Drori, Pontus Giselsson, Carolina Bergeling,

Ernest Ryu and Francis Bach.

05/11 Aris Daniilidis

Étude asymptotique du processus de rafle.

Dans cet exposé nous nous intéressons au processus de rafle définie par un opérateur multivoque définissable dans une structure o-minimale.

Nous établissons une inégalité de type KL adaptée au problème et nous l'utilisons pour montrer que les trajectoires bornées sont de longueur finie.

Cette méthode permet d'obtenir le cas classique de la dynamique du gradient comme cas particulier, en considérant le processus de rafle correspondant aux sous-niveaux d'une fonction de classe C^1 o-minimale.

2017 - 2018

11/6 Jon Lee

More Virtuous Smoothing and Embracing the Ghost of Rolle

In the context of global optimization of mixed-integer nonlinear optimization formulations,

we consider smoothing univariate concave increasing functions that have poorly behaved derivative at 0

(for example, root functions). Extending earlier work of Lee and Skipper, we give general conditions under which our smoothing is concave and increasing, is an underestimator, and dominates a

simple "shift smoothing". This is joint work with Luze Xu and Daphne Skipper.

11/6 Ivana Ljubic

New Branch-and-Cut Algorithms for Mixed-Integer Bilevel Linear Programs

In this talk we focus on new branch-and-cut (B&C) algorithms for dealing with mixed-integer bilevel linear programs (MIBLPs). MIBLPs constitute a significant family of bilevel optimization problems, where all objective functions and constraints are linear, and some/all variables are required to take integer values.

We first consider a general case in which the proposed B&C scheme relies on intersection cuts derived from feasible-free convex sets, see [1,2]. Our B&C scheme is finitely-convergent in case both the leader and the follower problems are pure integer. In addition, it is capable of dealing with continuous variables both in the leader and in follower problems—provided that the leader variables influencing follower’s decisions are integer and bounded. Our new algorithm consistently outperforms (often by a large margin) all alternative state-of-the-art methods from the literature, including methods which exploit problem specific information for special instance classes. In particular, it allows to solve to optimality more than 300 previously unsolved instances from literature.

We then focus on a subfamily of MIBLPs in which the leader and follower typically share a set of items, and the leader can select some items and interdict their usage by the follower. The only constraints in the follower subproblem involving leader decision variables impose that, if an interdiction variable is selected by the leader, then certain actions of the follower are inhibited. Interdiction Problems, Blocker Problems, Critical Node/Edge Detection Problems are some examples satisfying the later condition. We show that, in case the follower subproblem satisfies monotonicity property, a family of "interdiction-cuts" can be derived resulting in a more efficient B&C scheme. Computational results are reported for the (multidimensional) knapsack interdiction and the maximum clique interdiction problems, see [3,4].

Literature.

[1] M. Fischetti, I. Ljubic, M. Monaci, M. Sinnl, On the Use of Intersection Cuts for Bilevel Optimization, Mathematical Programming, online first, 2017, DOI 10.1007/s10107-017-1189-5

[2] M. Fischetti, I. Ljubic, M. Monaci, M. Sinnl, A new general-purpose algorithm for mixed-integer bilevel linear programs, Operations Research 65(6): 1615-1637, 2017

[3] M. Fischetti, I. Ljubic, M. Monaci, M. Sinnl: Interdiction Games and Monotonicity, with Application to Knapsack Problems,INFORMS Journal on Computing, to appear, 2018

[4] F. Furini, I. Ljubic. P. San Segundo, S. Martin: The Maximum Clique Interdiction Game, submitted, 2018

14/5 Timo de Wolff

A New Approach to Nonnegativity and Polynomial Optimization

Deciding nonnegativity of real polynomials is a fundamental problem in real algebraic geometry and polynomial optimization. Since the 19th century, sums of squares (SOS) are a standard certificates for nonnegativity, which can be detected via semidefinite programming (SDP) in practice. This approach, however, does not always work well, especially if the optimization problem has many variables or high degree. In 2014, Sadik Iliman and I introduced a new nonnegativity certificate based on sums of nonnegative circuit polynomials (SONC). These certificates are independent of sums of squares. In the same year, we successfully applied SONCs to global nonnegativity problems using geometric programming (GP). In 2016, Mareike Dressler, Sadik Iliman, and I provided a converging hierarchy of lower bounds based on SONCs for constrained polynomial optimization problems. We showed that these bounds can be computed via relative entropy programming. Recently, my student Henning Seidler and I finished a prototype of a new software for optimization via SONCs and GP. So far, it covers unconstrained polynomial optimization where it already yields very promising results for several cases, which are problematic for SOS based SDP methods. In this talk, I will give an overview about polynomial optimization via sums of nonnegative circuit polynomials and show some computational results

14/5 Robert Gower

Stochastic Quasi-Gradient Methods: Variance Reduction via Jacobian Sketching

We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method---JacSketch---is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient, followed by a stochastic gradient descent step. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a {\em stochastic quasi-gradient method}. Indeed, quasi-Newton methods project the current Hessian estimate onto a solution space of a linear equation consistent with a certain linear (but non-random) measurement of the true Hessian. Our method can also be seen as stochastic gradient descent applied to a {\em controlled stochastic optimization reformulation} of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a {\em stochastic Lyapunov function}. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al (2015). The rates we obtain for minibatch SAGA are also superior to existing rates. Moreover, we obtain the first minibatch SAGA method with importance sampling.

9/4 Alberto Bressan

On the optimal shape of tree roots and branches

Living organisms come in an immense variety of shapes, such as roots, branches, leaves, and flowers in plants, or bones in animals. In many cases it is expected that, through natural selection, these organisms have evolved into a ``best possible" shape. From a mathematical perspective, it is thus of interest to study functionals whose minimizers correspond to some of the many shapes found in the biological world.

As a step in this direction, we consider two functionals that may be used to describe the optimal configurations of roots and branches in a tree. The first one, which we call the ``sunlight functional", models the total amount of sunlight captured by the leaves of a tree.

The second one, which we call the ``harvest functional", models the amount of nutrients collected by the roots.

The above functionals will be combined with a ``ramified transportation cost", for transporting nutrients from the roots to the base of the trunk, or from the base of the trunk to the leaves. The talk will address the semicontinuity of these functionals, and the existence and properties of optimal solutions, in a space of measures.

9/4 Rasmus Ibsen-Jensen

How to play the Big Match

Odd and Even is a very simple game that has been played at least since ancient Greece and the Roman Empire. First, Player 1 hides a number of small objects and then Player 2 guesses whether the number of items was odd or even. If Player 2 is correct, he wins a point from Player 1 and otherwise loses one. The Big Match is a variant of playing Odd and Even repeatedly. The variant is such that the players must play the same as in the last round if the number of items hidden in that round was even. It is one of the simplest games illustrating the difficulty in balancing short and long term gains. It is possible for each player to ensure that the average per round increase in their number of points is above -epsilon, for any epsilon>0, if the number of repetitions of the game is sufficiently large or infinite, even if the duration of the game is unknown to the players. However, the classical strategies for Player 1 that ensures such requires using log T bits of memory for the first T rounds against some strategies for Player 2. The talk will give strategies ensuring such for which log log T bits of memory for the first T rounds suffice whp. and even some where 1 bit suffice if the current round number is known to Player 1. Both results generalizes to various classes of games (stochastic and absorbing games respectively). Also, both results requires randomizing the update of memory and as will be argued in the talk, this is necessary. The talk is based on joint work with Kristoffer Arnsfelt Hansen, Michal Koucký and Abraham Neyman.

12/3 Julien Mairal

Generic Acceleration Schemes for Gradient-Based Optimization

In this talk, we present generic techniques to accelerate gradient-based optimization algorithms. These approaches build upon the inexact proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary subproblems, leading to faster convergence for the original problem. We introduce two variants based on Nesterov's acceleration and Quasi-Newton principles, respectively. One of the key to achieve acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy.

12/3 Frédéric Meunier

Gâteau d’anniversaire, joueur qui disparaît, gâteau brûlé : quelques variations autour du partage de gâteau sans-envie

Considérons n joueurs ayant des préférences sur l’ensemble des parties connexes d’un gâteau, identifié avec l’intervalle [0,1]. Un théorème classique de Stromquist assure, sous des conditions assez faibles, l’existence d’un partage sans-envie du gâteau en n parts (connexes), i.e. qui soit tel qu'aucun joueur ne préfère strictement une part donnée à un autre joueur.

Une généralisation récente de ce théorème montre que même si les préférences d’un des joueurs ne sont pas connues (c’est son anniversaire et on ne le consulte pas), il existe encore un partage sans-envie du gâteau : on pourra découper le gâteau, laisser choisir ce joueur, et il existera toujours une affectation sans-envie des n-1 parts restantes.

Nous proposons une nouvelle méthode pour démontrer le théorème de Stromquist et sa généralisation. Cette méthode permet de prouver de nouvelles variations, dont une par exemple assure l’existence d’un découpage du gâteau en n-1 parts permettant une affectation sans-envie de ces parts, quelque soit le joueur décidant de quitter la salle.

Un élément central de cette méthode est l’utilisation de nouvelles extensions du lemme de Sperner (version discrète du théorème du point fixe de Brouwer).

Cet exposé s’appuiera sur des travaux récents menés en collaboration avec Florian Frick, Kelsey Houston-Edwards, Francis Su et Shira Zerbib.

12/2 Anatoli Juditsky

Sur la quasi-optimalité de l'estimation linéaire

Etant donné une observation indirecte bruitée $\omega = Ax + \xi $, nous considérons le problème d'estimation d’une image linéaire $Bx$ d'un signal $x$ dont on suppose d’appartenir à un ensemble donné $X$. Sous certaines hypothèses sur $X$ (satisfait, par exemple, lorsque $X$ est l'intersection de $K$ cylindres elliptiques, ou la boule unité de la norme spectrale dans l'espace des matrices) et sur la norme $ \| \cdot \| $ utilisé pour mesurer l'erreur d’estimation (satisfait, par exemple, par les $p$-normes, $ 1 \leq p \leq 2$, sur $ R^m $ et par la norme nucléaire sur l'espace des matrices), et {\em sans imposer aucune restriction sur les applications $A$ et $B$,} nous construisons une estimée {\em linéaire} de $Bx$ qui est quasi-optimale parmi toutes les estimées (linéaires ou non linéaires) en termes d’espérance de la perte $\| \cdot \| $ pire-cas sur $ X$.

Ces résultats forment une extension essentielle des résultats classiques (e.g., Pinsker 1980 et Donoho, Liu et MacGibbon, 1990), qui ne traitent que le cas de la norme euclidienne $ \| \cdot \| $ et des matrices diagonales $A$ et $B$, et impose des hypothèses beaucoup plus restrictives sur l'ensemble $X$ de signaux. Toutes les constructions et preuves théoriques s'appuient fortement sur les outils d'optimisation convexe.

(travail en collaboration avec A. Nemirovski, Georgia Tech)

12/2 Anders Hansen

On computational barriers in optimisation and the paradoxes of deep learning

Linear and semidefinite programming (LP, SDP), regularisation with Basis Pursuit and Lasso, as well as neural networks in deep learning have seen great success in mathematics, learning, statistics and data science. The success of LP is traditionally described by the fact that it is in P for rational inputs. However, in the list of problems for the 21st century Smale calls for "[Computational] models which process approximate inputs and which permit round-off computations". Hence, we will discuss the following paradox: It is impossible to design accurate algorithms for the above problems when given inaccurate, yet well-conditioned input data with arbitrarily small inaccuracies. The paradox implies the following failure for any algorithm: For fixed dimensions and any small accuracy parameter epsilon > 0, one can choose any time T and find an input such that the runtime > T without having reached epsilon accuracy. Moreover, it is impossible to determine when the algorithm should halt to achieve an epsilon accurate solution. The largest epsilon for which this failure happens is called the Breakdown-epsilon. Typically the Breakdown-epsilon > 1/2 even for randomised algorithms when the input is bounded by one and well-conditioned. Moreover, it is impossible to design an algorithm to check whether the algorithm will fail on a given input. This is in fact strictly harder than solving the original problem. Thus, e.g. even if given an oracle for solving LP accurately, one cannot determine if a given algorithm for solving LP produces nonsense on its input.

The paradox implies two surprises: (1) In mathematics, statistics and learning, one computes non-computable problems on a daily basis. (2) In order to mathematically characterise this phenomenon, there is an intricate complexity theory for, paradoxically, non-computable problems. In particular, the key to explaining the success (and failure) of many algorithms in real-world scenarios lies in this rich classification theory, where sparsity turns out to be a key ingredient.

The above result leads to the paradoxes of deep learning: (1) One cannot guarantee the existence of algorithms for accurately training the neural network, even if there is one minimum and no local minima. Moreover, (2) one can have 100% success rate on arbitrarily many test cases, yet uncountably many misclassifications on elements that are arbitrarily close to the training set.

15/1 Luigi De Pascale

A variational limit in density functional theory

The Density Functional Theory offers a model for quantum mechanic which is alternative to the Schrodinger equation. It is a Variational Model so it is based on the minimization of a certain energy functional on a functional space. The main idea is to model everything in terms of the single electron density. The choice of the energy functional to minimize need to be adapted to the type of phenomenon which one aim to describe.

I will discuss how a particular choice of the energy term which describes the electron-electron interaction effects the description of the so called dissociating limit of a molecule (how the molecule dissociate when the nuclei are moved away from each other).

15/1 Enric Meinhardt-Llopis

Fusion of Images

Image fusion is a very general problem that consists in synthesizing an ideal image of an object from several images that are incomplete, deformed, blurry, noisy and with unknown illumination changes. We focus on two concrete applications of image fusion: (1) the virtual periscope, that allows to see through the turbulent surface of water and (2) museum photography, where we obtain a clean, complete and well-lit image of a painting from several bad and partial images of it. In most cases, the best results are achieved by using a robust combinator function: e.g. the minimizer of the p-error with p=1/2, which acts as a kind of ``soft mode''. Depending on the relative importance of noise w.r.t blur, this combinator function can be applied either in the spatial domain (to combine pixel values or their derivatives), or in the frequency domain (to combine the amplitudes of each frequency). Creating a general-purpose dual-domain combinator is the subject of ongoing research.

18/12 Marco Cuturi

Generative Models and Optimal Transport

A recent wave of contributions in machine learning center on the concept of generative models for extremely complex data such as natural images. These approaches provide principled ways to use deep network architectures, large datasets and automatic differentiation to come up with algorithms that are able to synthesize realistic images. We will present in this talk how optimal transport is gradually establishing itself as a valuable tool to carry out this estimation procedure.

18/12 Miquel Oliu Barton

Constant payoff in finite stochastic games

In any one-shot zero-sum game, the payoff induced by a couple of optimal strategies is equal to the value of the game. For dynamic games, a natural refinement is that the average payoff, after any fraction of the game, is equal to the value of the game. In this paper we prove that this is the case for patient players in any finite zero-sum stochastic games, as conjectured by Sorin, Venel and Vigeral 2010. The proof relies on the semi-algebraic structure of optimal strategies and on the asymptotic properties of Markov chains with rare transitions.

20/11 Rajendra Bhatia

The Bures- Wasserstein distance on positive definite matrices

There is an interesting metric on the space of positive definite matrices called the Bures distance in quantum information, and the Wasserstein metric in optimal transport. We will discuss its properties, and its connections with various subjects, with emphasis on the Frechet mean of two or more matrices with respect to this distance. It is instructive to compare this with the Cartan mean (the Riemannian mean, the Karcher mean) much studied in recent years. The emphasis will be on the matrix analysis perspective, leading to inequalities, and comparisons between different means.

20/11 Benedikit Wirth

(Discrete) spline interpolation on Riemannian manifolds

Spline curves represent a simple and efficient tool for data interpolation in Euclidean space. During the past decades, however, more and more applications have emerged that require interpolation in (often high-dimensional) nonlinear spaces such as Riemannian manifolds. An example is the generation of motion sequences in computer graphics, where the animated figure represents a curve in a Riemannian space of shapes. Two particularly useful spline interpolation methods derive from an optimization principle: linear splines minimize the average squared velocity and cubic splines minimize the average squared acceleration among all interpolating curves. Those variational principles and their discrete analogues can be used to define continuous and discretized spline curves on (possibly infinite-dimensional) Riemannian manifolds. However, it turns out that well-posedness of cubic splines is much more intricate on nonlinear and high-dimensional spaces and requires quite strong conditions on the underlying manifold. We will analyse and discuss linear and cubic splines as well as their discrete counterparts on Riemannian manifolds and show a few applications.

16/10 Jean-David Benamou

Dynamic formulations of Optimal transportation and variational Mean Field Games

I will review several relaxations of the classical Monge Optimal Transport problem using a dynamic “time” extension and discuss the corresponding available numerical methods. They also apply to some instances of variational mean field games.

16/10 Filippo Santambrogio

Metric methods for heteroclinc connections in finite and infinite dimension

The curves, defined on the whole R and valued in a space X (typically, R^n), which minimize an action of the form \int_R(1/2|u'|^2+W(u)), where Wis a double-well potential W:X->[0,+\infty] vanishing exactly at two points a^±, and the boundary data are exactly fixed as u(±\infty)=a^±, are called heteroclinic connections in the literature. Proving existence of such curves is a classical problem in calculus of variations, with lack of compactness due to its translation-invariance (see works by P. Sternberg, N. Alikakos, G. Fusco, P. Rabinowitz...).

In the talk I will first explain how to transform the problem into the search for geodesics for a weighted metric, i.e. curves solving \min \int k(u)|u'|, where k=(2W)^{1/2}... In practice, we look for a geodesic in the space X endowed with a weighted distance d_k. The difficulty is that k can vanish, and actually it does.

Then, I will present the main results, obtained in collaboration with Antonin Monteil (currently at UC Louvain), concerning the existence of these geodesics with increasing level of generality. The final goal is to study the case where X itself is a functional space (typically, L^2), with a potential W involving the Dirichlet energy. This allows to study elliptic PDE problems in cylinders giving to one variable the role of time. In particular, we are able to recover via a purely variational argument (and to weaken some assumptions) some classical results by S. Alama, L. Bronsard and C. Gui, and by M. Shatzman on the problem of double connections: connecting in R^2 two different heteroclinic connections.

--------

2016/17

19/6 Kristian Bredies

Accelerated and preconditioned Douglas-Rachford algorithms for the solution of variational imaging problems

(joint work with Hongpeng Sun)

We discuss basic, accelerated and preconditioned versions of the Douglas-Rachford (DR) splitting method for the solution of convex-concave saddle-point problems that arise in variational imaging. While the basic DR iteration admits weak and ergodic convergence with rate $O(1/k)$ for restricted primal-dual gaps, acceleration leads to convergence rates of $O(1/k^2)$ and $O(q^k)$ for some $0<q<1$ under appropriate strong convexity assumptions. Further, preconditioning allows to replace the potentially computationally expensive solution of a linear system in each iteration step in the corresponding DR iteration by fast approximate solvers without the need to control the error.

The methods are applied to non-smooth and convex variational imaging problems. We discuss denoising and deconvolution with $L^2$ and $L^1$ discrepancy and total variation (TV) as well as total generalized variation (TGV) penalty. Preconditioners which are specific to these problems are presented, the results of numerical experiments are shown and the benefits of the respective accelerated and preconditioned iteration schemes are discussed.

19/6 Zheng Qu

Restarting accelerated gradient methods with a rough strong convexity estimate

Proximal gradient method is known to converge linearly for convex composite minimization problems when the objective function satisfies a local quadratic growth condition, which greatly generalizes the strong convexity condition. Accelerated proximal gradient methods are known to converge with a worst-case complexity bound $O(1/\sqrt{\epsilon})$. If in addition the problem is strongly convex with condition number $\kappa$, an accelerated linear convergence rate (worst-case complexity bound $O(\sqrt{\kappa}\log(1/\epsilon))$) can be achieved. These classical methods have been successfully extended to randomized coordinate update to improve the efficiency for large scale problems.

We first show that, by using an adaptive restart strategy, the accelerated linear convergence can also be achieved under the local quadratic growth condition. Our method can be combined with any accelerated proximal gradient method with complexity bound $O(1/\sqrt{\epsilon})$. We also propose new restarting strategies for accelerated gradient and accelerated coordinate descent methods. In particular, we show that by changing the restarting point, the restarted method has a geometric rate of convergence for any restarting frequency, and so it allows us to take profit of restarting even when we do not know the local quadratic growth condition number.

22/5 Fabio D'Andreagiovanni

Multiband Robust Optimization: theory and applications

Over the last years, Robust Optimization (RO) has emerged as an effective and efficient methodology to tackle data uncertainty in real-world optimization problems. RO takes into account data uncertainty in the shape of hard constraints that restrict the feasible set and maintain only robust solutions, i.e. solutions that remain feasible even when the values of the input data change.

In this talk, I will provide an overview of my research about theory and applications of RO. Specifically, I will present Multiband Robustness (Büsing and D'Andreagiovanni 2012), a model for RO proposed to generalize and refine the classical Gamma-robustness model by Bertsimas and Sim. The main aim of the model is to provide an improved representation of arbitrary non-symmetric distributions of the uncertainty, in particular histograms, which are commonly present in real-world applications. Such refined representation grants a reduction in conservatism of robust solutions, while maintaining the accessibility and computational tractability that have been a key factor of success of Gamma-robustness. I will also provide an overview of applications of the Multiband model to real-world problems that I have considered in past and ongoing industrial research projects.

22/5 Nelson Maculan

The Euclidean Steiner tree problem in n-space: Mixed-integer nonlinear optimization models

The Euclidean Steiner tree problem (ESTP) is to find a shortest network interconnecting p given points in n-dimensional Euclidean space. The problem was first described in the plane and an algorithm with very good practical efficiency has been proposed to solve this particular case. The generalization for higher dimensions was proposed in the 19th century, however the numerical solution of the problem remains very challenging when n ≥ 3. We present a way for formulating this problem using mixed-integer nonlinear optimization models, and some computational results. A conjecture for given points associated with the vertices of a n-cube (n=2,3,4,...) is also discussed.

This is a joint work with Marcia Fampa, Virginia Costa and Luiz-Felipe R. Ribeiro da Cruz.

24/4 András Sebő

What should we buy for Traveling Salesmen, for Hypertournaments and for Reliable Networks?

Starting with a beautiful theorem of Graham and Pollak (1972) about partitioning the edge-set of a graph to complete bipartite graphs (with Tverberg's elegant short proof (1982)) applied to ''hypertournaments'', continuing with recent results on reliable networks (minimum 2-edge-connected graphs 2014-2017), and finishing with the best bound for traveling salesman paths (March 2017), we exhibit a classical structural tool occurring in novel and very useful ways.

I would like to show the tool, the way it is applied and some ongoing research using it for improving the approximation ratio of some relevant optimization (or extremal) problems.

24/4 Leo Liberti

On the approximate solution of large dense Linear Programs

A well known lemma by Johnson and Lindenstrauss states that given a finite set X of n points in a Euclidean space E(m) in m dimensions and a small real tolerance 0<t<1, there is a k ~ O(1/t^2 log(n)) and a mapping f:E(m)-->E(k) such that:

for each x,y in E(m) (1-t)||x-y|| <= ||f(x)-f(y)|| <= (1+t)||x-y||.

We apply the Johnson-Lindenstrauss lemma to the columns of a Linear Program (LP), and show that the resulting projected LP approximates the original LP quite well. We showcase the application of this methodology to solving large (and rather dense) LPs. Joint work with Pierre-Louis Poirion and Vu Khac Ky.

27/3 Victor Chepoi

Tout réseau hyperbolique a un coeur

We investigate the impact the negative curvature has on the traffic congestion in large-scale networks. We prove that every Gromov hyperbolic network G admits a core, thus answering in the positive a conjecture by

Jonckheere, Lou, Bonahon, and Baryshnikov, Internet Mathematics, 7 (2011)

which is based on the experimental observation by

Narayan and Saniee, Physical Review E, 84 (2011)

that real-world networks with small hyperbolicity have a core congestion. Namely, we prove that for every subset X of n vertices of a graph G with delta-thin geodesic triangles there exists a vertex m of G such that the ball B(m,4delta) of radius 4delta centered at m intercepts at least one half of the total flow between all pairs of vertices of X, where the flow between two vertices x,y of X is carried by geodesic (or quasi-geodesic) (x,y)-paths. We show that the point m can be constructed in the following way: take a median point m* of X (a point minimizing the sum of distances to the points of X) in the injective hull of G and then a closest to m* point in G.

Moreover, we prove a primal-dual result showing that, for any commodity graph R on X and any r at least 8delta, the size sigma_r(R) of the least r-multi-core (i.e., the number of balls of radius r) intercepting all pairs of R is upper bounded by the maximum number of pairwise (2r-5delta)-apart pairs of R and that an r-multi-core of size sigma_{r-5\delta}(R) can be computed in polynomial time. Our result about multi-cores is based on a general Helly-type theorem for quasiconvex sets in delta-hyperbolic graphs (this is our second main result). Namely, we show that for any finite collection Q of pairwise intersecting epsilon-quasiconvex sets of a delta-hyperbolic graph G there exists a single ball B(c,2epsilon+5delta) intersecting all sets of Q. Using the Helly theorem for quasiconvex sets and a primal-dual approach, we show algorithmically that the minimum number of balls of radius 2epsilon+5delta intersecting all sets of a family Q of epsilon-quasiconvex sets does not exceed the packing number of Q (maximum number of pairwise disjoint sets of Q).

27/3 Nicolas Le Roux

Optimize what you care about (and stop using the log-loss)

Machine Learning systems are increasingly complex and often comprise many interconnected components. When working on a single of these components, one should make sure that any modification to that component improves the whole system.

I will discuss how, when performing classification, the log-loss does not satisfy these guarantees. I will then present a simple modification which not only improves the classification error, but also allows the seamless integration of external constraints as well as the possibility for the system to deal with its own uncertainty.

27/2 Claudia d'Ambrosio