Abstract: The talk will describe a novel approach to mixture modelling for asymptotically dependent sample extremes, based on a tilting transformation suggested by Coles and Tawn, the value of which seems to have been under-appreciated. The approach is Bayesian, allows the soft clustering of extreme events, and is fitted by a reversible jump Markov chain Monte Carlo algorithm. The method is applied to sectorial data from the S&P 500 index.
Abstract: For random vectors with light-tailed densities whose level sets asymptotically have the same shape, we derive explicit asymptotic expressions for tail probabilities and moments of conditional excesses over a limiting threshold for homogeneous functionals of the coordinates, which we refer to as risk variables. These expressions depend on the shape of the level sets at an extreme point as well as that of the risk region induced by a given risk variable. On the basis of these asymptotic results, we construct estimators of probabilities of risk regions and conditional excesses and assess their performance in finite sample situations through a simulation study.
Abstract: The last decade has seen numerous record-shattering heatwaves in all corners of the globe. In the aftermath of these devastating events, there is interest in identifying worst-case thresholds or upper bounds that quantify just how hot temperatures can become. Generalized Extreme Value theory provides a data-driven estimate of extreme thresholds; however, upper bounds may be exceeded by future events, which undermines attribution and planning for heatwave impacts. Here, we show how the occurrence and relative probability of observed yet unprecedented events that exceed a priori upper bound estimates, so-called “impossible” temperatures, has changed over time. We find that many unprecedented events are actually within data-driven upper bounds, but only when using modern spatial statistical methods. Furthermore, there are clear connections between anthropogenic forcing and the “impossibility” of the most extreme temperatures. Robust understanding of heatwave thresholds provides critical information about future record-breaking events and how their extremity relates to historical measurements.
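The endpoint calculation underlying such data-driven upper bounds can be sketched in a few lines. This is a minimal univariate illustration, not the spatial methodology of the talk; the simulated data and parameter values are hypothetical. For a fitted GEV with negative shape ξ, the upper endpoint is μ − σ/ξ (scipy uses c = −ξ, so c > 0 means a finite endpoint).

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Hypothetical annual maxima from a GEV with a bounded upper tail
# (scipy parameterization: c = -xi, so c > 0 gives a finite endpoint).
maxima = genextreme.rvs(c=0.2, loc=30.0, scale=2.0, size=200, random_state=rng)

c, loc, scale = genextreme.fit(maxima)
if c > 0:  # shape xi = -c < 0: finite upper endpoint mu - sigma/xi = loc + scale/c
    upper_bound = loc + scale / c
    print(f"estimated upper bound: {upper_bound:.2f}")
else:
    print("fitted shape implies no finite upper bound")
```

As the abstract notes, such bounds can still be exceeded by future events; the point estimate ignores the substantial sampling uncertainty in the shape parameter.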
Abstract: We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models, which defines population-level clusters based on the independence of the maxima of a multivariate stationary mixing random process among clusters. We also present an algorithm depending on a tuning parameter that recovers the clusters of variables without specifying the number of clusters a priori. Our work provides some theoretical insights into the consistency of our algorithm, demonstrating that under certain conditions it can effectively identify clusters in the data with a computational complexity that is polynomial in the dimension. To further illustrate the significance of our work, we apply our method to real environmental datasets. In this study, we aim to identify subregions that display asymptotic independence with respect to compound precipitation and wind speed extremes. To this end, we use daily precipitation sums and daily maximum wind speed data derived from the ERA5 reanalysis dataset spanning 1979 to 2022. Our approach hinges on a tuning parameter and the application of a divergence measure to spotlight disparities in extremal dependence structures without relying on specific parametric assumptions. This enables us to generate spatially concentrated clusters, which can provide more insightful information about the regional distribution of compound precipitation and wind speed extremes.
Related papers (available online):
A. Boulin, E. Di Bernardino, T. Laloe, G. Toulemonde "High-dimensional clustering of sub-asymptotic maxima of a weakly dependent process" (2025), Journal of the American Statistical Association (JASA), to appear.
A. Boulin, E. Di Bernardino, T. Laloe, G. Toulemonde "Identifying regions of concomitant compound precipitation and wind speed extremes over Europe" (2025), Journal of the Royal Statistical Society: Series C, to appear.
Abstract: Geometric representations for multivariate extremes, derived from the shapes of scaled sample clouds and their so-called limit sets, are becoming an increasingly popular modelling tool. Recent work has shown that limit sets connect several existing extremal dependence concepts and offer a high degree of practical utility for inference of multivariate extremes. However, existing geometric approaches are limited to low-dimensional settings, and some of these techniques make strong assumptions about the form of the limit set.
In this talk, we introduce DeepGauge, the first deep learning approach for limit set estimation. By leveraging the predictive power and computational scalability of neural networks, we construct asymptotically justified yet highly flexible semi-parametric models for extremal dependence. Unlike existing techniques, DeepGauge can be applied in high-dimensional settings and requires few assumptions. Moreover, we introduce a range of novel theoretical results pertaining to the geometric framework and our limit set estimator. We showcase the efficacy of our deep approach by modelling the complex extremal dependence between metocean variables sampled from the North Sea.
Abstract: The study of the statistical and dynamical characteristics of extreme and very extreme events in the climate system is impaired by a strong under-sampling issue. Because extreme events are rare, answering questions about the physical mechanisms from which they arise usually depends on the investigation of just a few cases, either in observations or in models. Here I use a rare events algorithm to massively increase the number of extremely hot, dry and anticyclonic summers in Western Europe simulated in the state-of-the-art IPSL-CM6A-LR climate model under pre-industrial anthropogenic forcings. This makes it possible to reach precise climatological results on the dynamics leading to centennial hot summers.
Abstract: Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data.
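The GPD extrapolation that ERF localizes with forest weights can be illustrated in its simplest, unconditional form. This is a sketch only, not the ERF estimator itself; the Student-t data and the threshold level are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=5000)  # heavy-tailed sample (true shape xi = 0.25)

# Peaks-over-threshold: fit a GPD to exceedances over a high threshold u.
u = np.quantile(x, 0.95)
exc = x[x > u] - u
xi, _, sigma = genpareto.fit(exc, floc=0)  # location fixed at 0

# Extrapolate an extreme quantile (99.9%) beyond most of the data:
# P(X > q) = P(X > u) * P(X - u > q - u | X > u).
p, pu = 0.999, np.mean(x > u)              # pu is about 0.05 by construction
q = u + genpareto.ppf(1 - (1 - p) / pu, c=xi, scale=sigma)
print(f"extrapolated 99.9% quantile: {q:.2f}")
```

The ERF replaces this single global fit by a local likelihood whose weights come from a quantile random forest, so that xi and sigma vary with the predictor vector.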
Joint work with: Felix Reinbott, Otto-von-Guericke University Magdeburg.
Abstract: Principal Component Analysis (PCA) is widely used for dimension reduction due to its efficient implementation, easy interpretability, and natural correspondence to multivariate normal distributions. However, because extremal observations deviate significantly from normality, traditional PCA methods must be adapted for application to maxima and exceedances. In this talk, we present an approach to dimension reduction that adapts the fundamental concepts of PCA to max-stable distributions, discussing both its advantages and limitations. Our method is motivated by the representation of PCA as the solution to a regression problem, which involves finding the optimal sets of coefficients for (a) projecting a random vector onto a lower-dimensional subspace and (b) reconstructing the original observations from this lower-dimensional representation. To adapt this idea to the framework of max-stable distributions, we operate within the max-times algebra and minimize the spectral distance between the original max-stable observations and their reconstructions. By examining models that allow for perfect low-dimensional reconstruction and applying our algorithm to real and simulated datasets, we gain insights into the mechanisms underlying this approach.
Abstract: Neural Bayes estimators are neural networks that approximate Bayes estimators. They are thus likelihood-free, extremely fast to evaluate, and amenable to rapid uncertainty quantification, while also (approximately) inheriting the appealing large-sample properties of Bayes estimators. Neural Bayes estimators are therefore ideal to use with spatial extremes models observed in high dimensions, where estimation is often a computational bottleneck. In this seminar, I will summarize our research progress in that area and explain how, for any spatial model that can be simulated from, a single neural Bayes estimator can be trained to make fast inference with new data that involve varying sample sizes, varying spatial configurations of observed locations, and varying censoring levels used in peaks-over-threshold modeling. This methodology will be illustrated by application to sea surface temperature extremes over the Red Sea, and air pollution extremes over the whole Arabian Peninsula.
Joint work with: Anja Janßen (Otto von Guericke Universität Magdeburg)
Abstract: When dealing with extreme observations of large dimensions, the scarcity of relevant observations in combination with model uncertainty calls for simple but flexible non-parametric learning algorithms for extremal dependence. Recently, clustering algorithms have been employed in various ways for this task, and the combination of the multivariate regular variation framework with established algorithms to work on the (estimated) spectral measure has led to some effective tools. In this talk, we compare three algorithms in the literature: i) spherical K-means clustering (Janssen and Wan, 2020); ii) spherical K-principal-components clustering (Fomichov and Ivanov, 2022); iii) spectral clustering (Avella Medina et al., 2023). We discuss their respective advantages, theoretical results, and implementations in data applications.
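As a rough illustration of the first of these approaches, a naive spherical k-means applied to the angular parts of the most extreme observations might look as follows. This is a minimal sketch, not any of the cited implementations; the heavy-tailed sample, the radial threshold, and the choice of k are hypothetical.

```python
import numpy as np

def spherical_kmeans(angles, k, iters=50, seed=0):
    """Naive spherical k-means: centers are unit vectors,
    points are assigned by cosine similarity."""
    rng = np.random.default_rng(seed)
    centers = angles[rng.choice(len(angles), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmax(angles @ centers.T, axis=1)  # assign to closest center
        for j in range(k):                              # re-project means onto sphere
            members = angles[labels == j]
            if len(members):
                m = members.sum(axis=0)
                centers[j] = m / np.linalg.norm(m)
    return labels, centers

rng = np.random.default_rng(2)
x = np.abs(rng.standard_cauchy(size=(20000, 3)))  # heavy-tailed positive sample
r = np.linalg.norm(x, axis=1)
mask = r > np.quantile(r, 0.99)
angles = x[mask] / r[mask][:, None]               # angular parts of the extremes
labels, centers = spherical_kmeans(angles, k=3)
print(centers.round(2))
```

In the regular variation framework, the empirical distribution of these angular parts approximates the spectral measure, which is why clustering them reveals extremal dependence structure.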
Joint work with: Yi He (University of Amsterdam)
Abstract: We extend extreme value statistics to the general setting of independent data with possibly very different distributions, whereby the extreme value index of the average distribution can be negative, zero, or positive. We present novel asymptotic theory for the moment estimator, based on a uniform central limit theorem for the underlying weighted tail empirical process. We find that, due to the heterogeneity of the data, the asymptotic variance of the moment estimator can be much smaller than that in the i.i.d. case. We also unravel the improved performance of high quantile and endpoint estimators in this setup. In case of a heavy tail, we ameliorate the Hill estimator by taking an optimal convex combination of the Hill and the moment estimator. Simulations show the good finite-sample behavior of our limit results. Finally we present applications to the maximal lifespan of monozygotic twins and to the tail heaviness of energies of earthquakes around the globe.
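For readers unfamiliar with the estimators involved, the Hill and moment (Dekkers-Einmahl-de Haan) estimators can be computed from the top k order statistics as below. This is an i.i.d. sketch for orientation only; it reproduces neither the heterogeneous setting nor the optimal convex combination of the talk, and the Pareto sample and choice of k are hypothetical.

```python
import numpy as np

def hill_and_moment(x, k):
    """Hill and moment estimators of the extreme value index from the
    k largest order statistics (data assumed positive)."""
    xs = np.sort(x)[::-1]                        # descending order statistics
    logs = np.log(xs[:k]) - np.log(xs[k])        # log-spacings above X_(n-k)
    m1 = logs.mean()                             # M_n^(1): the Hill estimator
    m2 = (logs ** 2).mean()                      # M_n^(2)
    moment = m1 + 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)
    return m1, moment

rng = np.random.default_rng(3)
x = rng.pareto(a=2.0, size=10000) + 1.0          # standard Pareto(2): true EVI = 0.5
hill, moment = hill_and_moment(x, k=500)
print(f"Hill: {hill:.3f}, moment: {moment:.3f}")
```

Unlike the Hill estimator, the moment estimator remains consistent when the extreme value index is zero or negative, which is what makes it the natural choice in the general setting of the abstract.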
Joint work with: Max Thannheimer
Abstract: In order to describe the extremal behaviour of some stochastic process X, approaches from univariate extreme value theory such as the peaks-over-threshold approach are commonly generalized to the spatial or spatio-temporal domain. In this setting, extreme events can be flexibly defined as exceedances of a risk functional r, such as a spatial average, applied to X. Inference for the resulting limit process, the so-called r-Pareto process, requires the evaluation of r(X) and thus the knowledge of the whole process X. In practical applications, we face the challenge that observations of X are only available at single sites.
To overcome this issue, we propose a two-step MCMC algorithm in a Bayesian framework. In the first step, we sample from X conditionally on the observations in order to evaluate which observations lead to r-exceedances. In the second step, we use these exceedances to sample from the posterior distribution of the parameters of the limiting r-Pareto process. Alternating these steps results in a full Bayesian model for the extremes of X. We show that, under appropriate assumptions, the probability of classifying an observation as an r-exceedance in the first step converges to the desired probability. Furthermore, given the first step, the distribution of the Markov chain constructed in the second step converges to the posterior distribution of interest. Our procedure is compared to the Bayesian version of the standard procedure in a simulation study.
Joint work with: Harriet Spearing (Shell), David Irons (ATASS Sports) and Tim Paulden (ATASS Sports)
Abstract: Longitudinal data consist of a large number of short time series (on different subjects), which are typically irregularly and non-simultaneously sampled, yet have some commonality in the structure of each series and exhibit independence between time series across subjects. The talk will present extreme value models for analysing observations in the tails of such data. We require a model to describe (i) the global marginal tail using a GPD, (ii) the variation between the different time series, (iii) the global and subject specific changes in distribution over time, and (iv) the temporal dependence within each subject’s series. The methodology has the flexibility to capture both asymptotic dependence and asymptotic independence. The methodology is illustrated through the analysis of data from elite swimmers in the men’s 100m breaststroke. Unlike previous analyses of personal-best data, we are able to make inference about the careers, and future performances, of individual swimmers.
Joint work with: Tiandong Wang, Fudan University
Abstract: Preferential attachment models of network growth are bivariate heavy tailed models for in- and out-degree with limit measures which either concentrate on a ray of positive slope from the origin or on all of the positive quadrant depending on whether the model includes reciprocity or not. Concentration on the ray is called full dependence. If there were a reliable way to distinguish full dependence from not-full, we would have guidance about which model to choose. This motivates investigating tests that distinguish between (i) full dependence; (ii) strong dependence (limit measure concentrates on a proper subcone of the positive quadrant); (iii) concentration on positive quadrant. We give two test statistics and discuss their asymptotically normal behavior under full and not-full dependence. Time permitting, we discuss data examples.
Joint work with: Jeongjin Lee (Université de Namur), Johan Segers (Université catholique de Louvain)
Abstract: Regular vines are a way to organize the variables in a random vector along a sequence of trees. The first tree corresponds to a Markov random field whereas the other trees capture higher-order effects. Pair copula constructions based on vines have become very popular in dependence modelling because they allow arbitrary bivariate copulas to be combined into flexible high-dimensional distributions. Both for simulation and inference, computations are typically performed by recursive algorithms. In this project, we explore the opportunities of vine decompositions for the density of the exponent measure of a multivariate max-stable distribution. The homogeneity property that such densities satisfy leads to some simplifications in comparison to the copula case. The decomposition also sheds new light on existing parametric models and facilitates the construction of new ones.
Joint work with: Thomas Kneib
Abstract: When predicting extreme events and assessing risk, the evaluation of the forecasts is complicated by the lack of a substantial observation set due to the rarity of the outcome of interest. In addition, we are often interested in evaluating specific aspects of the forecasts, such as how well certain tail properties or functionals of the predictive distribution match those of the true data distribution. These types of questions arise in various applications from finance to environmental sciences. In this talk, we will discuss approaches to forecast evaluation in this setting within the frameworks of proper scoring rules and consistent scoring functions.
Joint work with: Ilya Molchanov and Hrvoje Planinić
Abstract: Thanks to the work of many people, and the EVT community in particular, we now know a great deal about the extremes of stationary sequences and continuous time processes. Stochastic geometry is another area that presents a myriad of challenging and interesting problems related to extremes. We present a few classical problems of this kind (Hall, 1985; Chenavier and Robert, 2018), and discuss how they can be rephrased using stationary configurations of points in a Euclidean space which are marked by real-valued random variables that we call scores. The theory is illustrated with examples showing how one can often rescale such scores and transform the positions of nearby points to obtain a limiting point process that we refer to as the tail configuration. Based on this local limit, we derive global Poissonian asymptotics for the corresponding extremes.
Joint work with: Paula Gonzalez, Soulivanh Thao and Julien Worms
Abstract: Numerical climate models are complex and combine a large number of physical processes. They are key tools in quantifying the relative contribution of potential anthropogenic causes (e.g., the current increase in greenhouse gases) to high-impact atmospheric variables like heavy rainfall or temperatures. These so-called climate extreme event attribution problems are particularly challenging in a non-stationary context. In addition, global climate models, like any in silico numerical experiments, are affected by different types of bias. In this talk, I will discuss how to combine two different statistical concepts from univariate and multivariate extreme value theory to assess changes in records in the context of extreme event attribution. In addition, the question of uncertainty quantification, which remains a challenge in any climate attribution analysis, will be explored from various directions. In particular, a simple model bias correction step for records will be described in detail. To illustrate our approach, we infer emergence times in precipitation from the CMIP5 and CMIP6 archives.
Joint work with: A. Ghosh and M. Kirsebom
Abstract: This talk will be an overview of some recent progress on the applications of extreme value theory for dynamical systems. We shall illustrate using a special case - Gauss dynamical system connected with continued fractions. If time permits, we shall also discuss various connections and extensions.
Joint work with: Klaus Herrmann and Marius Hofert
Abstract: When working with large insurance portfolios, say, the assumption of independence between claims may no longer be appropriate. As a consequence, classical limiting theory of maxima of iid variables may no longer apply. In this presentation, I will explore the weak limits of maxima of identically distributed random variables which are neither independent nor form a locally dependent time series and derive extensions of the Fisher-Tippett-Gnedenko Theorem. It turns out that the possible weak limits of suitably scaled maxima are no longer extreme-value distributions in general, but an asymptotic theory can nonetheless be developed and is driven by the properties of the diagonal of the underlying copula. I will further derive results on uniform convergence and present various illustrative examples.
Joint work with: Stefano Rizzelli
Abstract: The block maxima method is one of the most popular approaches for extreme value analysis with independent and identically distributed observations in the domain of attraction of an extreme value distribution. The lack of a rigorous study on the Bayesian inference in this context has limited its use for statistical analysis of extremes. We propose an empirical Bayes procedure for inference on the block maxima law and its related quantities. We show that the posterior distributions of the tail index of the data distribution and of the return levels (representative of future extreme episodes) are consistent and asymptotically normal. We also study the properties of the posterior predictive distribution, the key tool in Bayesian probabilistic forecasting. Simulations show its excellent inferential performance even with modest sample sizes. The utility of our proposal is showcased by analysing extreme winds generated by hurricanes in Southeastern US.
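The quantities under study, the block maxima law and its return levels, can be sketched in their simplest frequentist form below; this illustrates what the empirical Bayes procedure makes inference on, not the procedure itself, and the simulated "daily" data are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(4)
# 50 "years" of 365 daily observations; take annual (block) maxima.
daily = rng.gumbel(size=(50, 365))
block_maxima = daily.max(axis=1)

# Fit a GEV to the block maxima (maximum likelihood).
c, loc, scale = genextreme.fit(block_maxima)

# T-year return level: the (1 - 1/T) quantile of the fitted GEV,
# exceeded on average once every T blocks.
T = 100
return_level = genextreme.ppf(1 - 1 / T, c=c, loc=loc, scale=scale)
print(f"estimated {T}-year return level: {return_level:.2f}")
```

In the Bayesian setting of the talk, the point estimate above is replaced by a posterior distribution over the return level, and forecasts come from the posterior predictive distribution.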
Abstract: Extreme event attribution is a topic in climate science that tries to characterize how probabilities of extreme events have changed, or may change in the future, as a consequence of greenhouse-gas-induced climate change. Increasingly, the methods used by climate scientists use extreme value theory (primarily the GEV and GPD distributions) to characterize the distribution of extreme events as a function of time. The approach proposed here is conditional in the sense that the extreme event probabilities are conditioned on some regional weather variable, such as summer mean annual temperature averaged over a grid box, that we may reasonably hope to be well represented by climate models. The estimation proceeds in three steps: (a) modeling the conditional distribution of extremes given the regional variable, (b) modeling the conditional distribution of the regional variable given climate model output, (c) combining steps (a) and (b) to model the conditional distribution of extremes given climate model output at various time points in the past and, using forward simulations of climate models, the future. In all the cases that have so far been considered, the analysis projects a substantial increase in extreme events in the future, but how big an increase depends a lot on the assumed scenario of increased greenhouse gas emissions. The analysis relies entirely on public data sources including daily station data from the Global Historical Climatology Network, gridded temperature monthly averages from the Climate Research Unit of the University of East Anglia, and climate model data from the CMIP6 archive. Examples will be discussed including the Pacific Northwest (North American) heatwave of June 2021, and the British heatwave of July 2022.
Joint work with: Evgeny Prokopenko
Abstract: We build a sharp approximation of the whole distribution of the sum of iid heavy-tailed random vectors, combining mean and extreme behaviors. It extends the so-called 'normex' approach from a univariate to a multivariate framework. We propose two possible multi-normex distributions, named d-Normex and MRV-Normex. Both rely on the Gaussian distribution for describing the mean behavior, via the CLT, while the difference between the two versions comes from using the exact distribution or the EV theorem for the maximum. The main theorems provide the rate of convergence for each version of the multi-normex distributions towards the distribution of the sum, assuming second order regular variation property for the norm of the parent random vector when considering the MRV-normex case. Numerical illustrations and comparisons are proposed with various dependence structures on the parent random vector.
Joint work with: Ana Ferreira and Cees de Valk
Abstract: In extreme value theory, knowledge of the extreme value index is of prime interest. In practice it is important to know whether this parameter is constant over time. Recently, attention has been given to a possible gradual change in the extreme value index. So far, this has been done in the case where the extreme value index is known to remain positive, which is relevant for financial or economic applications. Here, we drop this restriction and obtain results that are also relevant, for example, for environmental issues (dealing with wind, rain, temperature, etc.). We discuss asymptotic normality for suitable estimators, and derive a Kolmogorov-Smirnov-type statistical test to check uniformity over time.
Joint work with: Carlos Lima Azevedo, Haneen Farah and Joana Cavadas
Abstract: Observed accidents have been the main resource for road safety analysis over the past decades. Although such reliance seems quite straightforward, the rare nature of these events has made safety difficult to assess, especially for new and innovative traffic treatments. Surrogate measures of safety have made it possible to step away from traditional safety performance functions and analyze safety performance without relying on accident records. In recent years, the use of extreme value (EV) models in combination with surrogate safety measures to estimate accident probabilities has gained popularity within the safety community. In this paper we extend existing efforts on EV models for accident probability estimation to two dependent surrogate measures. Using detailed trajectory data from a driving simulator, we model the joint probability of head-on and rear-end collisions in passing maneuvers. In our estimation we account for driver-specific characteristics and road infrastructure variables. We show that accounting for these factors improves the estimation of head-on and rear-end collision probabilities. This work highlights the importance of considering driver and road heterogeneity in evaluating related safety events, which is relevant to interventions for both in-vehicle and infrastructure-based solutions. Such features are essential to keep up with the expectations placed on surrogate safety measures for the integrated analysis of accident phenomena.
Joint work with: Marco Avella Medina and Gennady Samorodnitsky
Abstract: A spectral clustering algorithm for analyzing the dependence structure of multivariate extremes is proposed. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random k-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the radius exceeds a large threshold. In particular, we derive the asymptotic distribution of extremes arising from a linear factor model and prove that, under certain conditions, spectral clustering can consistently identify the clusters of extremes arising in this model. Leveraging this result, we propose a simple consistent estimation strategy for learning the angular measure. Our theoretical findings are complemented with numerical experiments illustrating the finite sample performance of our methods.
Joint work with: Karthyek Murthy and Xiangyu Liu
Abstract: In this work, we investigate active learning techniques for an imbalanced classification problem where labelling is expensive. Classification tasks are known to be difficult when the data set is imbalanced, i.e., when samples from one class are much scarcer (rarer) than those from the other classes. Furthermore, in many instances in medical diagnosis, document/image classification, etc., even though we have a huge unlabelled data set, we have access to only a limited number of labelled points since labelling is either difficult, expensive or time-consuming. Active learning is a tool used to sequentially choose points to label in such circumstances. Under the assumption that the rare class appears due to some extreme phenomenon, we propose an importance-sampling-based algorithm to sequentially query labels while obtaining the classifier by minimizing a cost-sensitive loss function that accounts for the class imbalance. We show that our algorithm approximates a zero-variance importance sampler, and in experiments we show that our approach achieves target (weighted) accuracy and/or F1-score using a relatively small number of sample queries.
Joint work with: Scott Sisson, Alex Stephenson and Thomas Whitaker
Abstract: In 2019-2020 a large part of Australia faced major bushfires. During this “black summer” at least 34 people perished in the flames, and about 30 million hectares and 6,000 buildings were burned. The 2021-2022 summer in Australia has been marred by intense heatwaves and bushfires in the west, and devastating floods in the east. Such extreme events seem to appear with increasing frequency, creating an urgent need to better understand the behaviour of extreme environmental phenomena.
Max-stable processes are a widely popular tool to model spatial extreme events, with several flexible models available in the literature. For inference on max-stable models, exact likelihood estimation quickly becomes computationally intractable as the number of spatial locations grows, limiting their applicability to large study regions or fine grids. In this talk, we introduce two methodologies based on composite likelihoods to circumvent this issue. First, we assume the occurrence times of maxima are available in order to incorporate the Stephenson-Tawn concept into the composite likelihood framework. Second, we propose to aggregate the information between locations into histograms and to derive a composite likelihood variation for these summaries. The significant improvement in performance of each estimation procedure is established through simulation studies and illustrated on two temperature datasets from Australia.
Joint work with: Anass Aghbalou, François Portier, Chen Zhou
Abstract: We consider the problem of dimensionality reduction for the prediction of a real-valued target Y to be explained by a covariate vector X of dimension p, with a particular focus on extreme values of Y, which are of special concern for risk management.
The general purpose is to reduce the dimensionality of the statistical problem through an orthogonal projection on a lower dimensional subspace of the covariate space. Inspired by the sliced inverse regression (SIR) methods, we develop a novel framework (TIREX, Tail Inverse Regression for EXtreme response) relying on an appropriate notion of tail conditional independence in order to estimate an extreme sufficient dimension reduction (SDR) space of potentially smaller dimension than that of a classical SDR space.
We prove the weak convergence of tail empirical processes involved in the estimation procedure and we illustrate the relevance of the proposed approach on simulated and real world data.
Joint work with: Natalia Nolde
Abstract: The study of multivariate extremes is dominated by multivariate regular variation, although it is well known that this approach does not provide adequate distinction between random vectors whose components are not always simultaneously large. Various alternative dependence measures and representations have been proposed, with the most well-known being hidden regular variation and the conditional extreme value model. These varying depictions of extremal dependence arise through consideration of different parts of the multivariate domain, and particularly exploring what happens when extremes of one variable may grow at different rates to other variables. Thus far, these alternative representations have come from distinct sources and links between them are limited. In this work we elucidate many of the relevant connections through a geometrical approach. In particular, the shape of the limit set of scaled sample clouds in light-tailed margins is shown to provide a description of several different extremal dependence representations.
Joint work with: Zheng Gao
Abstract: The classic relative stability property in extreme value theory can be viewed as a type of concentration of maxima phenomenon for light-tailed random variables. It turns out that this property is key to establishing a range of phase-transition results for the problem of exact support recovery in high dimensions. The talk will begin by reviewing the high-dimensional testing and inference setup and some of the existing phase-transition phenomena. Then, we present a phase-transition result characterizing the statistical difficulty in the exact support recovery problem for the large class of threshold-type support estimators. The result holds under a type of uniform relative stability property, which is satisfied under remarkably mild error-dependence conditions. This is demonstrated by the characterization of (uniform) relative stability for Gaussian error arrays.
These and more results on finite sample Bayes optimality, asymptotic minimax optimality, as well as applications to statistical genetics, can be found in the recent monograph:
Gao, Zheng and Stoev, Stilian, 2021. Concentration of Maxima and Fundamental Limits in High-Dimensional Testing and Inference, SpringerBriefs in Probability and Mathematical Statistics. https://link.springer.com/book/10.1007/978-3-030-80964-5
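The relative stability property can be checked numerically in the Gaussian case (a quick sketch of my own, not taken from the monograph): the maximum of n iid standard Gaussians, divided by sqrt(2 log n), concentrates around 1.

```python
import math, random

random.seed(2)

def scaled_max(n):
    """Maximum of n iid standard Gaussians divided by sqrt(2 log n)."""
    m = max(random.gauss(0, 1) for _ in range(n))
    return m / math.sqrt(2 * math.log(n))

# repeated draws of the scaled maximum cluster near 1 (slow convergence)
ratios = [scaled_max(50000) for _ in range(5)]
```

Uniform relative stability requires this concentration to hold uniformly over the coordinates of an error array, which is what drives the phase-transition results.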
Joint work with: Jasper Velthoen, Juan-Juan Cai, Sebastian Engelke
Abstract: Extreme quantile regression provides estimates of conditional quantiles outside the range of the data. Classical methods such as quantile random forests perform poorly in such cases since data in the tail region are too scarce. Extreme value theory motivates approximating the conditional distribution above a high threshold by a generalized Pareto distribution with covariate-dependent parameters. This model allows for extrapolation beyond the range of observed values and for estimation of conditional extreme quantiles. We propose a gradient boosting procedure to estimate a conditional generalized Pareto distribution by minimizing its deviance. Cross-validation is used for the choice of tuning parameters such as the number of trees and the tree depth. We discuss diagnostic plots such as variable importance and partial dependence plots, which help to interpret the fitted models. In simulation studies we show that our gradient boosting procedure outperforms classical methods from quantile regression and extreme value theory, especially for high-dimensional predictor spaces and complex parameter response surfaces. An application to statistical post-processing of weather forecasts with precipitation data in the Netherlands is presented.
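The quantity minimized at each boosting step is the generalized Pareto deviance; a minimal pure-Python sketch (a crude grid search standing in for the boosting update, on simulated exceedances rather than the precipitation data):

```python
import math, random

random.seed(3)

def gpd_deviance(z, sigma, xi):
    """Negative log-likelihood (deviance) of GPD(sigma, xi) exceedances z."""
    nll = 0.0
    for x in z:
        t = 1 + xi * x / sigma
        if t <= 0:
            return float("inf")        # observation outside the support
        nll += math.log(sigma) + (1 + 1 / xi) * math.log(t)
    return nll

# simulate exceedances from GPD(sigma=1, xi=0.5) by inverting the cdf
sigma0, xi0 = 1.0, 0.5
z = [sigma0 / xi0 * ((1 - random.random()) ** (-xi0) - 1) for _ in range(2000)]

# crude grid search over (sigma, xi), standing in for one gradient step
grid = [(s / 20, x / 20) for s in range(10, 41) for x in range(1, 21)]
sigma_hat, xi_hat = min(grid, key=lambda p: gpd_deviance(z, *p))
```

In the boosting procedure both parameters become tree ensembles in the covariates, but the fitting criterion is this same deviance.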
Joint work with: Jevgenijs Ivanovs, Adrien Hitz and Stanislav Volgushev
Abstract: While theory and statistical tools for univariate extremes are well developed, methods for high-dimensional and complex data sets are still scarce. Appropriate notions of sparsity and connections to other fields such as machine learning, graphical models, and high-dimensional statistics have only recently been established. We categorize the existing approaches into three main lines of research and formalize the corresponding sparsity notions. We review the recent literature on sparse extreme value modeling and discuss in which sense they enforce one of the sparsity notions. A particular focus is put on the field of extremal graphical models that satisfy certain Markov properties, and the connections to high-dimensional statistics for extremes.
Joint work with: Léo Belzile, Anthony Davison, Jutta Gampe, Dimitrii Zholud
Abstract: There is sustained and widespread interest in understanding the limit, if any, to the human lifespan. Apart from its intrinsic interest, changes in survival at extreme ages, say 105 and over, have implications for the biology of ageing and for the sustainability of social security systems. Recent analyses of data on the oldest human lifespans have led to competing claims about survival and to controversy, often due to misunderstandings about the selection of data and to inappropriate use of statistical methods. One central question is whether the endpoint of the underlying lifetime distribution is finite. This talk discusses the particularities associated with data on extreme lifespans, presents models from Extreme Value Statistics and Demography for their analysis, and outlines ways of handling the truncation and censoring often present in the data. It provides a critical assessment of earlier work and illustrates the ideas through novel analysis of new datasets on 105+ year lifetimes.
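In the generalized Pareto model for exceedances over a threshold u, finiteness of the endpoint is governed by the sign of the shape parameter xi: the endpoint is u + sigma/(-xi) when xi < 0, and infinite otherwise. A toy calculation with hypothetical parameter values (not estimates from the data discussed in the talk):

```python
def gpd_endpoint(u, sigma, xi):
    """Upper endpoint of a GPD fitted above threshold u (finite iff xi < 0)."""
    return u + sigma / (-xi) if xi < 0 else float("inf")

# hypothetical fits above a threshold of 105 years
finite = gpd_endpoint(105.0, 1.4, -0.1)   # negative shape: finite endpoint
infinite = gpd_endpoint(105.0, 1.4, 0.0)  # xi >= 0: no finite limit to lifespan
```

The statistical controversy is precisely that xi is hard to estimate at these ages, so the data may be compatible with both signs.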
Joint work with: Yuri Goegebeur and Jing Qin
Abstract: In this talk, we will study the estimation of reinsurance premiums when the claim size is observed together with additional information in the form of random covariates. Using extreme value arguments, we will propose an estimator for the risk premium conditional on a value for the covariate, and we will derive its asymptotic properties, after suitable normalization. The finite sample behavior will be evaluated with a simulation experiment, and we will apply the methodology to a dataset of automobile insurance claims from Australia.
Joint work with: Zun Yin and Philippe Naveau
Abstract: The field of statistics has become one of the mathematical foundations of detection and attribution (D&A) studies, especially with regard to assessing uncertainties. The classical paradigm in D&A is to infer regression coefficients in order to quantify expected response patterns to different external forcings. Although convenient, this approach has a few shortcomings. For example, how should regression coefficients be interpreted if observations and forced climate runs are both tainted by large errors? To bypass this hurdle, Ribes et al (2016) proposed an Error-In-Variable (EIV) framework in which regression coefficients are removed from the analysis. Still, this setup, based on the Gaussian assumption, is not appropriate for handling extremes such as annual temperature maxima. As the key objective of D&A is to discriminate between different causes, we propose, study and discuss how to estimate relevant posterior probabilities in order to compare different EIV models based on Generalized Extreme Value distributions.
Joint work with: Carlos Amendola, Steffen Lauritzen, and Ngoc Tran.
Abstract: Motivated by extreme value theory, max-linear graphical models have been introduced and studied as an alternative to the classical Gaussian or discrete distributions used in graphical modeling. We present max-linear models naturally in the framework of tropical geometry. This perspective allows us to shed light on some known results and to prove others with algebraic techniques. In particular, we give a complete description of conditional independence relations for max-linear recursive models.
Abstract: In this talk we will consider multivariate regularly varying time series. Using probabilistic tools developed recently, we will study estimators of so-called cluster functionals, which quantify extremal clustering. We will present asymptotic theory for disjoint blocks and sliding blocks estimators in the peaks-over-threshold framework. In particular, we will show that both classes of estimators have the same asymptotic variance. This is in contrast to the situation when the block maxima method is used.
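The two block schemes can be sketched in a few lines (a toy comparison on iid data, standing in for the regularly varying series in the talk): both estimate the same block-maximum exceedance probability, one from non-overlapping blocks and one from all overlapping blocks.

```python
import random

random.seed(4)
n, b = 10000, 50
x = [random.expovariate(1.0) for _ in range(n)]
u = 5.0  # high threshold

# disjoint-blocks estimate of P(block maximum > u)
disjoint = [max(x[i:i + b]) for i in range(0, n, b)]
p_disjoint = sum(m > u for m in disjoint) / len(disjoint)

# sliding-blocks estimate of the same quantity, using every window of length b
sliding = [max(x[i:i + b]) for i in range(n - b + 1)]
p_sliding = sum(m > u for m in sliding) / len(sliding)
```

The sliding version reuses each observation in up to b windows; the result highlighted in the talk is that, in the peaks-over-threshold setting, this reuse does not change the asymptotic variance.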
Joint work with: Maxime Taillardat, Raphaël de Fondeville and Philippe Naveau.
Abstract: In this talk, we will consider the problem of assessing the behaviour of forecast evaluation procedures with respect to extreme events. After discussing why this requires specific procedures, we will propose an index to measure the ability of ensemble forecasts to predict extremes.
Joint work with: Jochem Oorschot
Abstract: The block maxima (BM) approach in extreme value analysis fits a sample of block maxima to the Generalized Extreme Value (GEV) distribution. We consider all potential blocks from a sample, which leads to the All Block Maxima (ABM) estimator. Unlike existing estimators based on the BM approach, the ABM estimator is permutation invariant. We derive the asymptotic behavior of the ABM estimator, which has the lowest asymptotic variance among all estimators using the BM approach. Simulation studies corroborate our asymptotic theory. A key step in establishing the asymptotic theory for the ABM estimator is to obtain asymptotic expansions for the tail empirical process based on higher order statistics with weights.
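The permutation invariance can be seen from a counting identity (my toy illustration, reading "all potential blocks" as all size-b subsets; this is not the paper's derivation): the maximum of a block depends only on the order statistics, and the k-th smallest observation is the maximum of exactly C(k-1, b-1) of the C(n, b) potential blocks.

```python
from itertools import combinations
from math import comb

x = [0.7, 2.1, 1.3, 5.2, 0.2, 3.3, 4.8, 1.9]   # toy sample of distinct values
n, b = len(x), 3

# brute force: maxima over all C(n, b) blocks of size b
brute = {}
for block in combinations(x, b):
    m = max(block)
    brute[m] = brute.get(m, 0) + 1

# counting identity: the k-th smallest value is the maximum of a block
# exactly when the other b-1 members come from the k-1 smaller values
xs = sorted(x)
weights = {xs[k - 1]: comb(k - 1, b - 1) for k in range(1, n + 1)}
```

So the ABM empirical distribution is a fixed weighting of order statistics, independent of the original ordering of the sample, which is exactly why weighted tail empirical processes based on higher order statistics drive the asymptotics.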
Joint work with: Ranjbar, S., Cantoni, E., Marra, G., Radice, R. and Jaton-Ogay, K.
Abstract: Viruses causing flu or milder coronavirus colds are often referred to as "seasonal viruses" as they tend to subside in warmer months. In other words, meteorological conditions tend to impact the activity of viruses, and this information can be exploited for the operational management of hospitals. In this study, we use three years of daily data from one of the biggest hospitals in Switzerland and focus on modelling the extremes of hospital visits from patients showing flu-like symptoms and the number of positive cases of flu. We propose employing a discrete Generalized Pareto distribution for the number of positive and negative cases, and a Generalized Pareto distribution for the odds of positive cases. Our modelling framework allows for the parameters of these distributions to be linked to covariate effects, and for outlying observations to be dealt with via a robust estimation approach. Because meteorological conditions may vary over time, we use meteorological and not calendar variations to explain hospital charge extremes, and our empirical findings highlight their significance. We propose a measure of hospital congestion and a related tool to estimate the resulting CaRe (Charge-at-Risk-estimation) under different meteorological conditions. The empirical effectiveness of the proposed method is assessed through a simulation study.
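The discrete Generalized Pareto distribution used for the counts can be sketched by discretizing the continuous survival function (a generic construction under the standard integer-censoring definition; the parameter values below are made up, not fitted to the hospital data):

```python
def gpd_survival(x, sigma, xi):
    """P(X > x) for a continuous GPD with scale sigma and shape xi > 0."""
    return (1 + xi * x / sigma) ** (-1 / xi)

def discrete_gpd_pmf(k, sigma, xi):
    """P(K = k) for the discrete GPD: probability mass of [k, k+1)."""
    return gpd_survival(k, sigma, xi) - gpd_survival(k + 1, sigma, xi)

# the masses over the non-negative integers sum to (essentially) one,
# and decay like a power law, allowing heavy-tailed daily counts
total = sum(discrete_gpd_pmf(k, sigma=2.0, xi=0.3) for k in range(100000))
```

In the full model, sigma and xi are linked to meteorological covariates and fitted robustly.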
Joint work with: Zaoli Chen
Abstract: We study clustering of the extremes in a stationary sequence with subexponential tails in the maximum domain of attraction of the Gumbel law. We obtain functional limit theorems in the space of random sup-measures and in the space D(0,infinity). The limits have the Gumbel distribution if the memory is only moderately long. However, as our results demonstrate rather strikingly, the "heuristic of a single big jump" could fail even in a moderately long range dependence setting. As the tails become lighter, the extremal behavior of a stationary process may depend on multiple large values of the driving noise.
Joint work with: Emeric Thibaud, Yujing Jiang, Michael Wehner, Miranda Fix, Nehali Mhatre, Jeongjin Lee, Christian Rohrbeck
Abstract: Linear methods are familiar tools in statistical modeling and analysis, from time series to spatial statistics to multivariate analysis. Generally, these methods are linked to the pairwise dependence information contained in the covariance matrix.
Multivariate regular variation is a framework common in extreme value analysis, and when restricted to the positive orthant, it focuses analysis on the upper tail. With a specific link function, transformed linear operations preserve regular variation on the positive orthant. The tail pairwise dependence matrix (TPDM) summarizes pairwise extremal dependence, is non-negative definite like a covariance matrix, and additionally is completely positive.
In this overview talk, we will present our work in developing methods based on transformed-linear operations and the TPDM for modeling and studying extremes. We will briefly present extremal principal component analysis, a spatial autoregressive model for extremes, transformed-linear extremal time series models, and linear prediction for extremes.
Joint work with: Stephan Clémençon, Hamid Jalalzai and Anne Sabourin, Télécom Paris, Institut polytechnique de Paris
Abstract: The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is therefore an important step in learning tasks involving observations far away from the center. In the common situation when the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing data in order to build an empirical version of the angular measure based on the most extreme observations. However, the resulting structure of the empirical angular measure is complex and the study of its concentration properties is challenging. It is the purpose of the paper to establish non-asymptotic bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. As expected, the bound scales essentially as the inverse square root of the effective sample size, up to a logarithmic factor. In addition, we propose a variant of the empirical angular measure obtained by omitting the most extreme observations. For the new estimator, the logarithmic factor in the concentration bound is replaced by a factor depending on the truncation level. The bounds are used to provide performance guarantees on a binary classifier in extreme regions based on the empirical risk minimization principle and built upon the angular measure.
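The rank-transform construction can be sketched as follows (a toy version with simulated margins, not the paper's estimator; the self-normalized angle below uses the L1 radius): each margin is standardized to approximately unit Pareto using ranks only, and angles are formed from the most extreme points.

```python
import random

random.seed(6)
n = 2000

# two dependent coordinates with different marginal distributions
z = [random.expovariate(1.0) for _ in range(n)]
data = [(z[i] + 0.5 * random.expovariate(1.0),          # exponential-type margin
         (z[i] + 0.5 * random.expovariate(1.0)) ** 2)   # heavier margin
        for i in range(n)]

def pareto_ranks(values):
    """Rank-based standardization to approximately unit Pareto margins."""
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for pos, i in enumerate(order):
        ranks[i] = pos + 1                    # rank 1 = smallest
    return [n / (n + 1 - r) for r in ranks]   # approx 1 / (1 - F_hat)

u = pareto_ranks([d[0] for d in data])
v = pareto_ranks([d[1] for d in data])

# empirical angular measure: angles of the k points with largest radius
k = 100
pts = sorted(zip(u, v), key=lambda p: max(p), reverse=True)[:k]
angles = [a / (a + b) for a, b in pts]   # self-normalized first coordinate
```

The difficulty analyzed in the paper is that the ranks couple all observations, so the empirical angular measure is not a simple average of iid terms, which is what makes its concentration analysis delicate.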