Schedule

The talks take place in amphitheater 2E07 on the town center campus of Avignon University.


Wednesday, June 1

Arrival and welcome coffee: 8:30–9:15

9:15–9:30

JSS Opening

9:30–10:30

Talk 1: Gersende Fort

Title: Stochastic Majorization-Minimization algorithms for large scale learning

Abstract: Optimization of an intractable objective function is a computational problem that large-scale learning has to face: for example, the objective function may be an empirical loss computed on a very large number of examples, or an expected loss in the context of online learning. The numerical method cannot evaluate the function or its gradient (or can do so only very rarely in the first case), even when they are regular enough.

In this talk, I will consider a class of problems that can be solved by a Majorization-Minimization (MM) approach; these methods include well-known algorithms in computational learning such as gradient-based algorithms and the Expectation-Maximization algorithm. Novel stochastic versions of MM will be introduced, which allow online processing of the data and/or an acceptable computational cost in the case of a large batch of examples.

We will also derive new stochastic MM algorithms for the Federated Learning setting, where the data are owned by many local agents and never centralized, the learning task is solved by a central server, and the communication cost between the local agents and the central server has to be small.

Finally, explicit convergence bounds will be given; these bounds provide a complexity analysis of the algorithms and insights on how to set some design parameters of the methods.

This talk is based on joint works with Aymeric Dieuleveut (CMAP, Ecole Polytechnique), Eric Moulines (CMAP, Ecole Polytechnique), Geneviève Robin (LaMME, CNRS) and Hoi-To Wai (Chinese University of Hong Kong).
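To make the MM principle concrete, here is a minimal stochastic MM sketch for logistic regression (an illustrative toy, not the algorithms of the talk; the model, curvature bound and mini-batch scheme are my assumptions). It also shows the abstract's point that gradient methods are MM instances: minimizing a quadratic surrogate yields a gradient step.

```python
# Minimal stochastic Majorization-Minimization sketch (illustrative toy,
# not the talk's algorithms): logistic regression where each iteration
# minimizes a quadratic surrogate of a mini-batch loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad(w, Xb, yb):
    """Gradient of the average logistic loss on a mini-batch."""
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return Xb.T @ (p - yb) / len(yb)

# Logistic-loss curvature is bounded by ||x||^2 / 4, so the quadratic
# function f(w_t) + <g_t, w - w_t> + (L/2)||w - w_t||^2 majorizes the
# loss; its minimizer is the gradient step w_t - g_t / L.
L = 0.25 * np.mean(np.sum(X ** 2, axis=1))
w = np.zeros(d)
for _ in range(2_000):
    batch = rng.integers(0, n, size=64)   # online processing of the data
    w -= grad(w, X[batch], y[batch]) / L  # minimize the surrogate

print("parameter error:", np.linalg.norm(w - w_true))
```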


Coffee break: 10:30–11:00

11:00–12:30

Mini-course by Patricia Reynaud-Bouret (Part 1)

Title: Biological neural networks: coding ability, connectivity and simulation

Abstract: Biological neural networks are an amazing structure. They exchange information as very sparse point processes and their energy consumption is minimal. Nevertheless, they can encode so many stimuli and behaviors! They can also learn and keep memories for decades... Statistics and simulations can help us understand a bit more of what's going on in this amazing system, even if we just barely scratch the surface.

Patricia_Reynaud_Bouret_cours.pdf

Lunch: 12:30–14:00

14:00–15:00

Talk 2: Céline Duval

Title: Interacting Hawkes processes with multiplicative inhibition


Abstract: After a short introduction to Hawkes processes, we introduce a general class of mean-field interacting nonlinear Hawkes processes modelling the reciprocal interactions between two neuronal populations, one excitatory and one inhibitory. The model incorporates two features: inhibition, which acts as a multiplicative factor on the intensity of the excitatory population, and additive feedback from the excitatory neurons onto the inhibitory ones. We detail the well-posedness of this interacting system as well as its dynamics in the large-population limit. The long-time behavior of the mean-field limit process can be analyzed explicitly. We illustrate numerically that inhibition and feedback may be responsible for the emergence of limit cycles. (Joint work with E. Luçon and C. Pouzat.)

Celine_Duval.pdf
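For readers new to Hawkes processes, here is a minimal simulation sketch of a plain univariate linear Hawkes process by Ogata's thinning algorithm (all parameters illustrative; the interacting excitatory/inhibitory model of the talk is considerably richer).

```python
# Minimal sketch: simulate a univariate linear Hawkes process with
# exponential kernel via Ogata's thinning (illustrative toy, not the
# talk's model). Intensity: lambda(t) = mu + sum_{t_i<t} a*exp(-b(t-t_i)).
import numpy as np

rng = np.random.default_rng(1)
mu, a, b, T = 1.0, 0.8, 1.5, 100.0        # a/b < 1 ensures stability

def intensity(t, events):
    past = events[events < t]
    return mu + a * np.exp(-b * (t - past)).sum()

events, t = np.array([]), 0.0
while t < T:
    lam_bar = intensity(t, events) + a    # upper bound valid just after t
    t += rng.exponential(1.0 / lam_bar)   # candidate next point
    if t < T and rng.random() * lam_bar <= intensity(t, events):
        events = np.append(events, t)     # accept by thinning

print(f"{len(events)} events on [0, {T:.0f}]; "
      f"stationary rate mu/(1 - a/b) = {mu / (1 - a / b):.2f}")
```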

Coffee break: 15:00–15:30

15:30–16:30

Talk 3: Matthieu Lerasle

Title: Some phase transition phenomena in graphical data analysis

Abstract: I’ll present two problems where the data naturally have a graph structure: the analysis of champions in a tournament and the problem of matching. For each problem, I’ll present intuitive results in toy models and discuss the various mathematical tools involved in proving them. I’ll also present many open problems, hopefully convincing people to jump into this growing area.

16:30–17:30

Talk 4: Antoine Usseglio-Carleve

Title: Extreme expectiles estimation: regression, bias reduction and inference

Abstract: In this talk, I first introduce the notion of expectile, considered in the literature as an alternative to the classical quantile (which suffers from some theoretical pitfalls). I then propose some methods to estimate extreme quantiles and expectiles, i.e. in the challenging case where the quantile/expectile level is close to 1. Indeed, while estimating the median (quantile of level 0.5) or a quartile (quantile of level 0.25 or 0.75) of a random variable Y is straightforward given a sample of size n, what happens if the quantile level exceeds 1-1/n? In such a case, the classical order statistic systematically returns the maximum of the sample, and thus leads to an inconsistent estimation of the quantile (the same problem occurs for expectiles). I thus introduce extrapolated quantile/expectile estimators to overcome this issue, and establish their asymptotic normality.

Through a simulation study, we observe that these extreme expectile estimators suffer from a large bias and yield very poor inference. I thus propose bias-corrected estimators, together with accurate confidence intervals which greatly improve the coverage probabilities.


Finally, I consider the regression case, i.e. when a covariate vector X is recorded alongside Y, both in the i.i.d. setting and under mixing conditions. Some applications in insurance, finance and environmental science are also proposed.

Antoine_UC.pdf
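To see why extrapolation helps beyond the level 1-1/n, here is a minimal sketch of a Hill/Weissman-type extrapolated quantile on a heavy-tailed toy sample (my illustrative example, not the estimators of the talk).

```python
# Minimal sketch of extreme quantile extrapolation (illustrative toy):
# estimate the tail index by the Hill estimator on the k largest points,
# then extrapolate a quantile at a level beyond 1 - 1/n (Weissman-type).
import numpy as np

rng = np.random.default_rng(2)
n, k, gamma = 5_000, 200, 0.5
X = np.sort(rng.pareto(1.0 / gamma, size=n))  # heavy tail, index gamma

# Hill estimator: average log-spacing above the (k+1)-th largest point.
gamma_hat = np.mean(np.log(X[n - k:]) - np.log(X[n - k - 1]))

p = 1.0 - 1.0 / (2 * n)                       # level beyond 1 - 1/n
q_hat = X[n - k - 1] * (k / (n * (1.0 - p))) ** gamma_hat
q_true = (1.0 - p) ** (-gamma) - 1.0          # exact Pareto II quantile

print(f"gamma_hat = {gamma_hat:.2f} (true {gamma})")
print(f"extrapolated quantile = {q_hat:.1f}, true = {q_true:.1f}, "
      f"sample maximum = {X[-1]:.1f}")
```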

Thursday, June 2

9:00–10:00

Talk 5: Adeline Leclercq-Samson

Title: Estimation methods for biological stochastic differential equations


Abstract: I will present an overview of a class of stochastic differential equations used in biology and ecology. These models can be multidimensional, hypoelliptic (with degenerate noise) and partially observed. I will discuss the question of parameter estimation when only discrete observations are available. Many estimation methods are based on discretization schemes; I will present the advantages of some of them (Euler, Local Linearization, Splitting) and develop the corresponding estimation methods.
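As a minimal illustration of estimation built on a discretization scheme, here is an Euler-based sketch for an Ornstein-Uhlenbeck toy model (my assumption; the biological models of the talk are multidimensional and possibly hypoelliptic, which is where the other schemes matter).

```python
# Minimal sketch (illustrative toy): Euler-Maruyama simulation of the
# Ornstein-Uhlenbeck SDE dX = -theta*X dt + sigma dW, then estimation of
# (theta, sigma^2) by the Euler contrast (Gaussian pseudo-likelihood).
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, dt, n = 2.0, 1.0, 0.01, 100_000

X = np.empty(n + 1); X[0] = 0.0
for i in range(n):                            # Euler-Maruyama scheme
    X[i + 1] = X[i] - theta * X[i] * dt + sigma * np.sqrt(dt) * rng.normal()

# Euler contrast: least squares of increments against -X_i * dt.
dX = np.diff(X)
theta_hat = -np.sum(X[:-1] * dX) / (dt * np.sum(X[:-1] ** 2))
sigma2_hat = np.mean((dX + theta_hat * X[:-1] * dt) ** 2) / dt

print(f"theta_hat = {theta_hat:.2f} (true {theta}); "
      f"sigma2_hat = {sigma2_hat:.2f} (true {sigma ** 2})")
```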

Coffee break: 10:00–10:30

10:30–12:00

Mini-course by Patricia Reynaud-Bouret (Part 2)

Title: Biological neural networks: coding ability, connectivity and simulation

Abstract: Biological neural networks are an amazing structure. They exchange information as very sparse point processes and their energy consumption is minimal. Nevertheless, they can encode so many stimuli and behaviors! They can also learn and keep memories for decades... Statistics and simulations can help us understand a bit more of what's going on in this amazing system, even if we just barely scratch the surface.

Lunch: 12:00–14:00

14:00–15:30

Mini-course by Olivier Wintenberger (Part 1)

Title: Extreme values for time series

Abstract: Modern extreme value theory is intimately connected to the concept of regular variation: univariate regularly varying functions describe the Fréchet and Weibull max-domains of attraction in the Fisher-Tippett theorem, multivariate regular variation describes the extremal dependence that we see in random vectors, and regularly varying time series provide a convenient framework for the analysis of extremal dependence over time. This course provides an introduction to the basics of regular variation and to more recent developments, the latter in particular with a view towards models for (financial) time series. Its main objective is to analyse the clustering of extremes due to temporal dependence. After defining the fundamental concepts of spectral tail and cluster processes, we describe simple statistical tools such as the extremogram and the extremal index.

Olivier_Winterberger.pdf
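To give a flavour of the two statistical tools named at the end of the abstract, here is a minimal sketch on a toy ARMAX process (my choice of model, whose extremal index is known to be 1 - a): the sample extremogram and the blocks estimator of the extremal index.

```python
# Minimal sketch (illustrative toy): sample extremogram and blocks
# estimator of the extremal index for the ARMAX recursion
# X_t = max(a * X_{t-1}, (1-a) * Z_t), Z_t i.i.d. unit Frechet,
# which is stationary unit Frechet with extremal index 1 - a.
import numpy as np

rng = np.random.default_rng(4)
n, a = 200_000, 0.7
Z = -1.0 / np.log(rng.random(n))      # unit Frechet innovations
X = np.empty(n); X[0] = Z[0]
for t in range(1, n):
    X[t] = max(a * X[t - 1], (1 - a) * Z[t])

u = np.quantile(X, 0.99)              # high threshold
exc = X > u
for h in (1, 2, 5):                   # extremogram: P(X_h > u | X_0 > u)
    rho = np.mean(exc[h:] & exc[:-h]) / np.mean(exc[:-h])
    print(f"extremogram at lag {h}: {rho:.2f}")

b = 100                               # block length
blocks = exc[: n // b * b].reshape(-1, b)
theta_hat = blocks.any(axis=1).sum() / exc.sum()   # blocks estimator
print(f"extremal index estimate: {theta_hat:.2f} (true {1 - a:.2f})")
```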

Coffee break: 15:30–16:00

16:00–17:00

Talk 6: Sylvain Arlot

Title: Precise analysis of some Purely Random Forests

Abstract: Random forests (Breiman, 2001) are a very effective and commonly used statistical method, but their full theoretical analysis is still an open problem. As a first step, simplified models such as purely random forests have been introduced, in order to shed light on the good performance of Breiman's random forests.

In the regression framework, the quadratic risk of a purely random forest can be written (approximately) as the sum of two terms, which can be understood as an approximation error and an estimation error. In this talk, we study how each of these terms depends on the size of each tree and on the number of trees in the forest.

Precise theoretical results are obtained for a toy model. On the one hand, if the regression function is smooth enough, the approximation error of an infinite forest decreases at a faster rate (with respect to the size of each tree) than that of a single tree. On the other hand, the estimation error is of the same order of magnitude for a single tree and for an infinite forest. As a consequence, infinite forests attain a strictly better risk rate (with respect to the sample size) than single trees, because of their better approximation properties.

These results on the approximation error and on the risk can be generalized to other purely random forests. Numerical experiments on some purely random forests close to Breiman's algorithm (hold-out random forests) suggest that they behave similarly, which sheds light on how Breiman's random forests work.

This talk is based on joint works with Robin Genuer.

References: http://arxiv.org/abs/1407.3939v2, http://arxiv.org/abs/1604.01515

Sylvain_Arlot.pdf
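For intuition on these results, here is a minimal one-dimensional purely random forest sketch (my construction, close in spirit to but not identical with the models of the papers): partitions are drawn independently of the data, and averaging many trees improves the risk, mainly through the approximation error.

```python
# Minimal sketch (illustrative toy): a purely random forest in dimension
# one. Each tree cuts [0,1] at k uniform random points, independently of
# the data, and fits a constant per cell; the forest averages M trees.
import numpy as np

rng = np.random.default_rng(5)
n, k = 2_000, 30
X = rng.random(n)
Y = np.sin(4 * np.pi * X) + 0.3 * rng.normal(size=n)
x_test = np.linspace(0.0, 1.0, 500)
f_test = np.sin(4 * np.pi * x_test)

def tree_predict(x):
    cuts = np.sort(rng.random(k))             # data-independent partition
    cell_test = np.searchsorted(cuts, x)
    cell_train = np.searchsorted(cuts, X)
    means = np.full(k + 1, Y.mean())          # fallback for empty cells
    for c in np.unique(cell_train):
        means[c] = Y[cell_train == c].mean()  # constant fit per cell
    return means[cell_test]

for M in (1, 10, 100):
    forest = np.mean([tree_predict(x_test) for _ in range(M)], axis=0)
    print(f"M = {M:3d} trees: quadratic risk = "
          f"{np.mean((forest - f_test) ** 2):.4f}")
```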

Friday, June 3

9:00–10:30

Mini-course by Olivier Wintenberger (Part 2)

Title: Extreme values for time series

Abstract: Modern extreme value theory is intimately connected to the concept of regular variation: univariate regularly varying functions describe the Fréchet and Weibull max-domains of attraction in the Fisher-Tippett theorem, multivariate regular variation describes the extremal dependence that we see in random vectors, and regularly varying time series provide a convenient framework for the analysis of extremal dependence over time. This course provides an introduction to the basics of regular variation and to more recent developments, the latter in particular with a view towards models for (financial) time series. Its main objective is to analyse the clustering of extremes due to temporal dependence. After defining the fundamental concepts of spectral tail and cluster processes, we describe simple statistical tools such as the extremogram and the extremal index.

Coffee break: 10:30–11:00

11:00–12:00

Talk 7: Thibaut Le Gouic

Title: Sampler for the Wasserstein barycenter

Abstract: Wasserstein barycenters have become a central object in applied optimal transport as a tool to summarize complex objects that can be represented as distributions. Such objects include posterior distributions in Bayesian statistics, functions in functional data analysis and images in graphics. In a nutshell, a Wasserstein barycenter is a probability distribution that provides a compelling summary of a finite set of input distributions. While the question of computing Wasserstein barycenters has received significant attention, this talk focuses on a new and important question: sampling from a barycenter given natural query access to the input distributions. We describe a new methodology built on the theory of gradient flows over the Wasserstein space.

This is joint work with Chiheb Daaloul, Magali Tournus and Jacques Liandrat.
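As a concrete special case, here is a minimal sketch of exact barycenter sampling in dimension one (a standard one-dimensional fact, not the methodology of the talk): on the real line, the quantile function of the W2 barycenter is the weighted average of the input quantile functions.

```python
# Minimal sketch (illustrative, dimension one only): sample the W2
# barycenter by averaging the input quantile functions at a common
# uniform draw; inputs are given through samples (query access).
import numpy as np

rng = np.random.default_rng(6)
samples = [rng.normal(-2.0, 1.0, size=5_000),  # two input distributions
           rng.normal(3.0, 0.5, size=5_000)]
weights = np.array([0.5, 0.5])

def sample_barycenter(m):
    U = rng.random(m)                          # one uniform per draw
    Q = np.array([np.quantile(s, U) for s in samples])
    return weights @ Q                         # averaged quantiles

bar = sample_barycenter(10_000)
print(f"barycenter mean = {bar.mean():.2f} (expected 0.50), "
      f"std = {bar.std():.2f} (expected 0.75)")
```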

Lunch: 12:00–14:00

End of JSS