# Statistics and Probability seminar

*There are two types of seminars organised at the math department. Everybody is most welcome!*

*"General": with UR funding received from the University, Céline Esser, Gentiane Haesbroeck, Catherine Timmermans and I have decided to organise a regular seminar at ULiège. All topics in probability and statistics are welcome, although we aim to keep the talks accessible even to non-specialists.*

*"Specialised": with funding I received from a Welcome Grant and FNRS, I organise a regular probability/statistics seminar at ULg. The topics are more advanced and/or specialised.*

__2018-2019__

**Thursday 06/12/2018, 2:30 PM room TBC (General)**

Catherine Dehon (ULB) TBA

*Abstract:* TBA

**Thursday 29/11/2018, 2:30 PM room TBC (General)**

Anne Ruiz Gazen (Toulouse) "Invariant Coordinate Selection for outlier detection with application to quality control"

*Abstract:* Detecting outliers in multivariate data sets is of particular interest in various contexts, including quality control in high-standard fields such as automotive or avionics. Some classical detection methods are based on the Mahalanobis distance or on robust Principal Component Analysis (PCA). One advantage of the Mahalanobis distance is its affine invariance, while PCA is only invariant under orthogonal transformations. For its part, PCA allows component selection and facilitates the interpretation of the detected outliers. For casewise contamination, and when the number of observations is larger than the number of variables, we propose an alternative called Invariant Coordinate Selection (ICS). Its principle is quite similar in spirit to PCA, with invariant components derived from an eigendecomposition followed by a projection of the data on some selected eigenvectors. The decomposition is based on two scatter matrix estimators instead of one for PCA. While principal components are scale dependent, the invariant components are affine invariant for affine equivariant scatter matrices. Moreover, under some elliptical mixture models, Fisher's linear discriminant subspace coincides with a subset of invariant components in the case where group identifications are unknown. The method, which is implemented in the ICSOutlier and ICSShiny R packages, will be illustrated on data sets from the quality control field.

**Tuesday 23/10/2018, 4:15 PM room S36 (General)**

Robert Gaunt (Manchester) "Compound Poisson Approximation of subgraph counts in Stochastic block models"

*Abstract:* Small connected subgraphs (such as edges and triangles) are important network summary statistics and play a role in many parts of network science, including analysis of biological networks and network comparison methodologies. Finding the distribution of the number of copies of such subgraphs in certain random graph models is therefore of interest. In this talk, we review classical results that concern the normal and Poisson approximation (depending on the parameter regime) of the number of copies of subgraphs in the Erdős-Rényi G(n,p) model. We then consider a generalisation from the G(n,p) model to the stochastic block model (with possibly multiple edges), and obtain Poisson and compound Poisson approximations (in different regimes) for subgraph counts in these random graph models. This is joint work with Matthew Coulson and Gesine Reinert.

__2017-2018__

**Friday 18/05/2018, 11 AM room S33 (Petit Déjeuner)**

Germain Van Bever (ULB) "Functional independent component analysis: joint diagonalization of scatter operators"

*Abstract:* With the increase in measurement precision, functional data are becoming common in practice. Relatively few techniques for analysing such data have been developed, however, and a first step often consists in reducing the dimension via Functional PCA, which amounts to diagonalising the covariance operator. Joint diagonalisation of a *pair* of scatter functionals has proved useful in many different setups, such as Independent Component Analysis (ICA), Invariant Coordinate Selection (ICS), etc. After an introduction to classical ICA techniques, the main part of this talk will consist in extending the Fourth Order Blind Identification procedure to the case of data on a separable Hilbert space (with the classical FDA setting being the go-to example). In the finite-dimensional setup, this procedure provides a matrix G such that GX has independent components, if one assumes that the random vector X satisfies X = ΨZ, where Z has independent marginals and Ψ is an invertible mixing matrix. When dealing with distributions on Hilbert spaces, two major problems arise: (i) the notion of marginals is not naturally defined and (ii) the covariance operator is, in general, non-invertible. These limitations are tackled by reformulating the problem in a coordinate-free manner and by imposing natural restrictions on the mixing model. The proposed procedure is shown to be Fisher consistent and affine invariant. A sample estimator is provided and its convergence rates are derived. The procedure is amply illustrated on simulated and real datasets. Joint work with B. Li, H. Oja, R. Sabolova and F. Critchley.

**Monday 19/03/2018, 11 AM room S33 (Petit Déjeuner)**

Pierre Artoisenet (Deloitte Belgium) "Modélisation des fuites dans un réseau de distribution d'eau" (Modelling leaks in a water distribution network)

*Abstract:* This seminar addresses the estimation and prediction of leak risks in the pipes of a water distribution network. Based on a 10-year leak history, the proposed approach models the leak sequences observed on the monitored pipes as non-homogeneous Poisson processes. Particular attention is paid to how well the approximations inherent in the model match the collected data.
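
As an illustration of the modelling idea only, a non-homogeneous Poisson process can be simulated by thinning a homogeneous one. The sketch below is not the speaker's model: the linearly increasing intensity `ageing_intensity`, its coefficients and the 10-year horizon used in the call are hypothetical choices made for the example.

```python
import random


def simulate_nhpp(intensity, horizon, intensity_max, rng):
    """Simulate event times on [0, horizon] by thinning.

    Candidate times come from a homogeneous Poisson process of rate
    intensity_max; each candidate t is kept with probability
    intensity(t) / intensity_max. Requires intensity(t) <= intensity_max.
    """
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(intensity_max)  # next candidate event
        if t > horizon:
            return times
        if rng.random() * intensity_max <= intensity(t):
            times.append(t)


def ageing_intensity(t):
    # Hypothetical leak rate (leaks per year) that grows as a pipe ages.
    return 0.2 + 0.05 * t


rng = random.Random(0)
leaks = simulate_nhpp(ageing_intensity, horizon=10.0, intensity_max=0.7, rng=rng)
print(len(leaks), "simulated leaks at times", [round(t, 2) for t in leaks])
```

Under these assumed values the expected number of leaks over the horizon is the integral of the intensity, 0.2 × 10 + 0.05 × 10² / 2 = 4.5.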

**Wednesday 07/02/2018, 11 AM room S39 (Petit Déjeuner)**

Elvezio Ronchetti (Research Center for Statistics and Geneva School of Economics and Management, University of Geneva, Switzerland) "Robust Filtering"

*Abstract:* Robust statistics deals with deviations from ideal models and develops statistical procedures which are still reliable and reasonably efficient in a small neighborhood of the model. We first review some fundamental ideas developed in robust statistics which can be used to construct robust statistical procedures in fairly general settings. We then adapt these ideas to filtering methods, which are powerful tools to estimate the hidden state of a state-space model, by defining a concept of robustness for a filter and by proposing robust filters which provide accurate state and parameter inference in the presence of model misspecifications. Joint work with L. Calvet and V. Czellar.

**Thursday 21/12/2017 at 11AM (Petit Déjeuner)**

Christophe Croux (EDHEC Business School, Lille) "Sparse Multi-Class Estimators"

*Abstract:* The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Our interest lies in joint, multi-class estimation of several VAR models. Assume we have K VAR models for K distinct but related classes. We jointly estimate these K VAR models to borrow strength across classes and to estimate multiple models that share certain characteristics. Our methodology encourages corresponding effects to be similar across classes, while still allowing for small differences between them. Moreover, we focus on multi-class estimation of high-dimensional VAR models, i.e. models with a large number of parameters relative to the time series length. Therefore, our estimate is sparse: unimportant effects are estimated as exactly zero, which facilitates the interpretation of the results. We consider a marketing application and a commodity application of the proposed methodology. This talk is based on joint work with Luca Barbaglia (KU Leuven) and Ines Wilms (Cornell University).

**Wednesday 20/12/2017 at 11AM room 0.39 (Specialised)**

Kevin Tanguy (Angers) "Superconcentration et Transport Optimal" (Superconcentration and Optimal Transport)

*Abstract:* This talk will begin with an introduction to the superconcentration phenomenon. Then, after recalling some basic results on optimal transport in dimension 1, we will show how transporting functional inequalities yields relevant superconcentration results.

The emphasis will be on obtaining weighted Poincaré inequalities for log-concave measures, which yield (non-asymptotic) bounds for various functionals (maximum, median, $\ell^p$ norms, largest particle of a Coulomb gas, ...). The standard Gaussian distribution on $\mathbb{R}^n$ will serve as the main example. We will also discuss the arguments needed to obtain relevant deviation inequalities in the setting of extreme value theory (domain of attraction of the Gumbel distribution). Time permitting, we will sketch how transporting isoperimetric inequalities yields further deviation inequalities.

**Thursday 30/11/2017 at 11AM (Petit Déjeuner)**

Davy Paindaveine (ULB) "BadmintonMeetsProbability®" - slides

*Abstract:* In this talk, we show how probability theory can help in modelling badminton and similar two-person sports. Throughout, we focus on the classical "i.i.d." model, yet we consider both cases where serving does or does not affect the probability of winning a rally. We discuss how realistic the model assumptions are. We show how probability theory allows us (a) to compute the probability of match outcomes and (b) to describe match durations. Our main application is on evaluating how much such sports can be impacted by a change in the scoring system. This work gives us the opportunity to illustrate the usefulness - but also the limitations! - of computer simulations. Hopefully, it also shows that two-person sports may help in teaching simple concepts including independence and conditional probability, as well as more involved ones such as conditional expectations and Markov chains.
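
To give a flavour of the "i.i.d." model in the simplest case where serving does not matter, the sketch below computes the probability of winning a single game exactly, by recursion over the score. The scoring rules coded here (first to 21 with two clear points, capped at 30) and the function name `game_win_probability` are assumptions made for this example, not details taken from the talk.

```python
from functools import lru_cache


def game_win_probability(p, target=21, cap=30):
    """Probability that player A wins one game.

    Assumes A wins every rally independently with probability p,
    a rally won scores a point, and the game ends when a player has
    at least `target` points with a two-point lead, or reaches `cap`.
    """

    @lru_cache(maxsize=None)
    def win(a, b):
        # Terminal scores: A has won, or B has won.
        if a == cap or (a >= target and a - b >= 2):
            return 1.0
        if b == cap or (b >= target and b - a >= 2):
            return 0.0
        # Otherwise condition on the outcome of the next rally.
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)

    return win(0, 0)


for p in (0.50, 0.52, 0.55, 0.60):
    print(f"rally p = {p:.2f} -> game win probability = {game_win_probability(p):.3f}")
```

Changing `target` and `cap` gives a quick way to compare scoring systems under the same rally-level assumption.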

**Thursday 09/11/2017 at 11AM (Petit Déjeuner)**

Emilie Devijver (CNRS, Grenoble) "Block-diagonal covariance selection for high-dimensional Gaussian graphical models" - slides

*Abstract:* This talk is about network inference in high dimension. To reduce the number of parameters to estimate in the model, we propose a non-asymptotic model selection procedure. The covariance matrix of the model is approximated by a block-diagonal matrix (up to permutations), the structure being detected by thresholding the sample covariance matrix. The threshold is selected using the slope heuristic. The method is supported by theoretical guarantees based on an oracle type inequality and a minimax lower bound. The performance of the procedure is illustrated on a real gene expression dataset with a limited sample size: the dimension reduction leads to a more parsimonious and interpretable modular network. The stability of the method is discussed, as well as an extension for predicting a quantitative trait from transcriptomic data. This is joint work with Mélina Gallopin, based on http://www.tandfonline.com/doi/full/10.1080/01621459.2016.1247002

**Thursday 12/10/2017 at 11AM (Specialised)**

Mikolaj Kasprzak (Oxford) "Diffusion approximations via Stein's method and time changes" - slides

*Abstract:* We use Stein's method to obtain a bound on the distance between a scaled time-changed random walk and a time-changed Brownian Motion. We then apply this result to bound the distance between a time-changed compensated scaled Poisson process and a time-changed Brownian Motion. This allows us to bound the distance between a process whose dynamics resemble those of the Moran model with mutation and a process whose dynamics resemble those of the Wright-Fisher diffusion with mutation, upon noting that the former may be expressed as a difference of two time-changed Poisson processes and the diffusive part of the latter may be expressed as a time-changed Brownian Motion. The method is applicable to a much wider class of examples satisfying the Stroock-Varadhan theory of diffusion approximation.

__2016-2017__

**Monday 20/03/2017 at 11AM room S.33 (Petit Déjeuner)**

Nathan Uyttendaele (ISBA, UCL) "High-dimensional dependence modeling using copulas"

*Abstract*: Copulas were introduced more than half a century ago and represent a significant breakthrough in the study of dependencies between random variables, as they allow one to study these dependencies free of any concern for the univariate margins, which, by definition, have nothing to do with the way the random variables interact with one another. However, while the framework for copulas is well established, the problem of finding actual copulas remains. The development of bivariate copulas (d=2) took off during the last decades, but satisfactory multidimensional copulas (d>2) are still lacking, with the possible exception of vine copulas. In this gentle presentation designed for non-specialists, I will explain what a copula is and showcase some of the results from my thesis. The presentation will be given in French and concluded by a short, 5 min long, animated movie from the YouTube channel "La statistique expliquée à mon chat".
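
The point that dependence can be studied independently of the margins can be checked numerically. The sketch below is only an illustration and is not taken from the talk: it samples from a bivariate Clayton copula (chosen here as one standard example) by conditional inversion, then verifies that Kendall's tau is unchanged when the margins are transformed by increasing functions. The parameter value, sample size and the two transformations are arbitrary choices.

```python
import math
import random


def sample_clayton(theta, n, rng):
    """Draw n pairs (u, v) from a Clayton copula with parameter theta > 0,
    by inverting the conditional distribution of v given u."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def kendall_tau(pairs):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    n = len(pairs)
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


rng = random.Random(1)
pairs = sample_clayton(theta=2.0, n=500, rng=rng)
tau = kendall_tau(pairs)

# Change both margins with increasing maps; the copula is unchanged.
transformed = [(u ** 3, math.exp(v)) for u, v in pairs]
tau_transformed = kendall_tau(transformed)

print(f"tau = {tau:.3f}, after changing the margins = {tau_transformed:.3f}")
```

For the Clayton family the population value of Kendall's tau is θ/(θ+2), so with θ = 2 the sample value should be near 0.5.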

**Monday 06/03/2017 at 11AM room S.33 (Petit Déjeuner)**

Ines Wilms (KU Leuven) "Sparse cointegration"

*Abstract*: Cointegration analysis is used to estimate the long-run equilibrium relations between several time series. The coefficients of these long-run equilibrium relations are the cointegrating vectors. We provide a sparse estimator of the cointegrating vectors. Sparsity means that some elements of the cointegrating vectors are estimated as exactly zero. The sparse estimator is applicable in high-dimensional settings, where the time series length is short relative to the number of time series. Our method achieves better estimation accuracy than the traditional Johansen method in sparse and/or high-dimensional settings. We use the sparse method for interest rate growth forecasting and consumption growth forecasting. We show that forecast performance can be improved by sparsely estimating the cointegrating vectors. Joint work with Christophe Croux (KU Leuven).

**Monday 06/02/2017 at 11AM room S.33 (Petit Déjeuner)**

Christophe Ley (UGent) "It's all about that Bayes: a novel approach to build (non-)informative priors for shape parameters"

*Abstract:* In several families of distributions, shape parameters control more than one aspect of the distribution. A good example is the shape/skewness parameter of skew-symmetric distributions: besides the skewness, it impacts the mode, spread and tail behavior of the densities. Consequently, building meaningful (non-)informative priors on that parameter is a hard task. In this talk, I shall present a new, natural way of building such priors, by invoking the very nature of a "shape" parameter: modifying the shape of a given base distribution (in the case of a skew-symmetric distribution, the base distribution is the symmetric density that is skewed by introducing the skewness parameter). We shall measure this shape-modification by means of the Total Variation distance between the base distribution and the distribution with shape parameter, and set a prior density on this understandable distance. This is joint work with Holger Dette (Ruhr-Universität Bochum) and Javier F. Rubio (London School of Hygiene and Tropical Medicine).

**Tuesday 22/11/2016 at 4PM room 1/64 (Specialised)**

Guillaume Carlier (Université Paris Dauphine) "Towards a central limit theorem in the Wasserstein space"

*Abstract*: In this talk, I will first recall the notion of Wasserstein barycenter between probability measures and some of its properties. Then I will formulate a CLT for these objects and prove it in some particular cases as well as for some entropy regularized version of the problem.

**Tuesday 25/10/2016 at 4PM room 0/55 (Specialised)**

Marc Hallin (ULB) "Optimal transport for multivariate quantiles"

*Abstract*: We propose new concepts of statistical depth, multivariate quantiles, ranks and signs, based on canonical transportation maps between a distribution of interest on $\mathbb{R}^d$ and a reference distribution on the d-dimensional unit ball. The new depth concept, called Monge-Kantorovich depth, specializes to halfspace depth in the case of spherical distributions, but, for more general distributions, differs from the latter in the ability for its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge-Kantorovich depth contours, quantiles, ranks and signs, and show their consistency by establishing a uniform convergence property for empirical transport maps, which is of independent interest.

**Thursday 20/10/2016 at 12PM room 0/55 (Specialised)**

Wen Yue (TU Wien) "Discrete Beckner inequalities via the Bochner-Bakry-Emery approach for Markov chains"

*Abstract*: Discrete convex Sobolev inequalities and Beckner inequalities are derived for time-continuous Markov chains on finite state spaces. Beckner inequalities interpolate between the modified logarithmic Sobolev inequality and the Poincaré inequality. Their proof is based on the Bakry-Emery approach and on discrete Bochner-type inequalities established by Caputo, Dai Pra, and Posta and recently extended by Fathi and Maas for logarithmic entropies. The abstract result for convex entropies is applied to several Markov chains, including birth-death processes, zero-range processes, Bernoulli-Laplace models, and random transposition models, and to a finite-volume discretization of a one-dimensional Fokker-Planck equation, applying results by Mielke.

**Thursday 29/09/2016 at 12PM room S.36 (Specialised)**

Lester Mackey (Stanford) "Measuring Sample Quality with Stein's Method"

*Abstract*: To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.

__2015-2016__

**Wednesday 2/03/2016 at 3PM room S.36 (Specialised)**

Guillaume Poly (Rennes) "Développement d'Edgeworth dans le TCL pour des lois et métriques singulières" (Edgeworth expansion in the CLT for singular distributions and metrics)

*Abstract*: The Edgeworth expansion is a tool frequently used in statistics because it refines certain estimates arising from the central limit theorem. Unfortunately, the assumptions used in the literature do not allow this tool to be applied to discrete distributions, since they rely on Cramér's "non-arithmeticity" condition, which does not hold for discrete distributions. In this talk, we remedy this major shortcoming. We introduce a new Cramér-type condition that holds for many discrete distributions and nevertheless allows Edgeworth expansions to be used. We will study some applications to problems on roots of random polynomials.

**Thursday 4/02/2016 (Specialised)**

Robert Gaunt (Oxford) "Rates of convergence for multivariate normal approximations by Stein's method"

*Abstract:* Stein's method is a powerful technique for obtaining distributional approximations in probability theory. We begin by reviewing Stein’s method for normal approximation. We then consider how this approach can be adapted to limits other than the normal. In particular, we see how Stein’s method for normal approximation can be extended relatively easily to the approximation of statistics that are asymptotically distributed as functions of multivariate normal random variables. We obtain some general bounds and a surprising result regarding the rate of convergence. We end with an application to the rate of convergence of Pearson's chi-square statistic. Part of this talk is based on joint work with Alastair Pickett and Gesine Reinert.

**Tuesday 1/12/2015 (Specialised)**

Thomas Bruss (ULB) "Too much versus too little information in problems of Optimal stopping"

*Abstract*: Many important decisions in life are sequential, simply because life itself is sequential. Optimal Stopping is that domain in Probability Theory which deals with models and solutions for sequential decision processes. It seems intuitive that the more solid information one would have in decision problems the easier it should be to make optimal decisions for a given objective. However, there is no such easy ordering as we will exemplify by two extreme cases, one being the celebrated Robbins’ Problem as e.g. in [2], and the other one the so-called Last-arrival problem [3]. Our excursion over several problems in between these two extremes, including [1] and [4], will not only display mathematical subtleties and challenging open problems but also produce several take-home messages which have a certain appeal for real-life situations.

References:

[1] F. Thomas Bruss and L.C.G. Rogers (1991), Embedding optimal selection problems in a Poisson process, Stoch. Proc. and Their Applic., Vol. 38, Issue 2, 267-278.

[2] F. Thomas Bruss and Yvik Swan (2009), A continuous-time approach to Robbins' problem of minimizing the expected rank, J. Appl. Probab., Vol. 46, Nr. 1, 1-18.

[3] F. Thomas Bruss and Marc Yor (2012), Stochastic processes with proportional increments and the last-arrival problem, Stoch. Proc. and Their Applic., Vol. 122, Issue 9, 3239-3261.

[4] Rémi Dendievel (2015), Weber’s optimal stopping problem and generalizations, Statistics & Probability Letters, Vol. 97, 176–184.

**Tuesday 3/11/2015, 2PM (Specialised)**

Thomas Bonis (INRIA) "Stable measure and Stein's method"

*Abstract*: In this talk I present a new way to apply Stein's method in order to derive bounds on the Wasserstein distance between two (possibly discrete) measures under appropriate conditions. Using this approach one can provide rates of convergence in the Central Limit Theorem and show the convergence of invariant measures in the diffusion approximation framework. I will then briefly present two possible applications of this last result: the first concerns guarantees for an approximate Monte Carlo sampling algorithm for log-concave distributions, while the second deals with density estimation on k-nearest neighbor random graphs.

**Tuesday 6/10/2015, 2PM (Specialised)**

Hermann Thorisson (University of Iceland) "Mass-Stationarity and Brownian Motion" - [pdf]

*Abstract*: Mass-stationarity means that the origin is at a typical location in the mass of a random measure. For a simple example, consider the stationary Poisson process on the line conditioned on having a point at the origin. The origin is then at a typical point (at a typical location in the mass) because shifting the origin to the n-th point on the right (or on the left) does not alter the fact that the inter-point distances are i.i.d. exponential. Another (less obvious) example is the local time at zero of a two-sided standard Brownian motion. In this talk we will concentrate on mass-stationarity on the line with Brownian motion as the main example. If time allows, we will briefly extend the view beyond the line, moving through the Poisson process in the plane towards general random measures on groups.