Research

Our group focuses on fundamental research in statistics and machine learning.

Recent publications

Selected publications in theory/methodology (the last 10 years)

2023


2022


2021


2020

2019

2018

2017

2016

2015

Selected publications in applications (the last 10 years)

Books

D. Nualart and E. Nualart, Introduction to Malliavin Calculus, IMS Textbooks, Cambridge University Press, 2018.

This textbook offers a compact introductory course on Malliavin calculus, an active and powerful area of research. It covers recent applications, including density formulas, regularity of probability laws, central and non-central limit theorems for Gaussian functionals, convergence of densities and non-central limit theorems for the local time of Brownian motion. The book also includes a self-contained presentation of Brownian motion and stochastic calculus, as well as Lévy processes and stochastic calculus for jump processes. Accessible to non-experts, the book can be used by graduate students and researchers to develop their mastery of the core techniques necessary for further study.

M. Greenacre. Compositional Data Analysis in Practice. Chapman & Hall, 2018.

Compositional Data Analysis in Practice is a user-oriented practical guide to the analysis of data with the property of a constant sum, for example percentages adding up to 100%. Compositional data can give misleading results if regular statistical methods are applied, and are best analysed by first transforming them to logarithms of ratios. This book explains how this transformation affects the analysis, results and interpretation of this very special type of data. All aspects of compositional data analysis are considered: visualization, modelling, dimension-reduction, clustering and variable selection, with many examples in the fields of food science, archaeology, sociology and biochemistry, and a final chapter containing a complete case study using fatty acid compositions in ecology. The applicability of these methods extends to other fields such as linguistics, geochemistry, marketing, economics and finance.
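
As a small illustration of the log-ratio idea described above (a sketch with invented data, not an example from the book), the centered log-ratio transform replaces each part by the log of its ratio to the row's geometric mean, removing the constant-sum constraint:

```python
import numpy as np

# Hypothetical 3-part compositions: each row is a set of percentages summing to 100
X = np.array([
    [60.0, 30.0, 10.0],
    [40.0, 40.0, 20.0],
    [10.0, 20.0, 70.0],
])

def clr(comp):
    """Centered log-ratio transform: log of each part over the row's geometric mean."""
    logs = np.log(comp)
    return logs - logs.mean(axis=1, keepdims=True)

Z = clr(X)
# Each transformed row sums to zero, so regular multivariate methods can be applied
print(np.allclose(Z.sum(axis=1), 0.0))  # True
```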

P. Zwiernik. Semialgebraic Statistics and Latent Tree Models. Chapman & Hall, 2017.

Semialgebraic Statistics and Latent Tree Models explains how to analyze statistical models with hidden (latent) variables. It takes a systematic, geometric approach to studying the semialgebraic structure of latent tree models. The first part of the book gives a general introduction to key concepts in algebraic statistics, focusing on methods that are helpful in the study of models with hidden variables. The author uses tensor geometry as a natural language to deal with multivariate probability distributions, develops new combinatorial tools to study models with hidden data, and describes the semialgebraic structure of statistical models. The second part illustrates important examples of tree models with hidden variables. The book discusses the underlying models and related combinatorial concepts of phylogenetic trees as well as the local and global geometry of latent tree models. It also extends previous results to Gaussian latent tree models. This book shows you how both combinatorics and algebraic geometry enable a better understanding of latent tree models. It contains many results on the geometry of the models, including a detailed analysis of identifiability and the defining polynomial constraints.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

Concentration inequalities for functions of independent random variables is an area of probability theory that has witnessed a great revolution in the last few decades, and has applications in a wide variety of areas such as machine learning, statistics, discrete mathematics, and high-dimensional geometry. Roughly speaking, if a function of many independent random variables does not depend too much on any of the variables then it is concentrated in the sense that with high probability, it is close to its expected value. This book offers a host of inequalities to illustrate this rich theory in an accessible way by covering the key developments and applications in the field. The authors describe the interplay between the probabilistic structure (independence) and a variety of tools ranging from functional inequalities to transportation arguments to information theory. Applications to the study of empirical processes, random projections, random matrix theory, and threshold phenomena are also presented.
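
The concentration phenomenon described above can be checked numerically. The following sketch (synthetic data, not from the book) compares the empirical deviation probability of a sample mean of bounded variables with the bound given by Hoeffding's inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000        # number of independent variables in each function (a sample mean)
trials = 20000  # Monte Carlo repetitions
t = 0.05        # deviation threshold

# Sample means of n independent Uniform[0,1] variables; expected value is 0.5
means = rng.random((trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) > t)

# Hoeffding's inequality: P(|mean - E| > t) <= 2 exp(-2 n t^2) for [0,1]-valued variables
bound = 2 * np.exp(-2 * n * t**2)
print(empirical <= bound)
```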

N. Cesa-Bianchi, and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.

This book offers the first comprehensive treatment of the problem of predicting "individual sequences". Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of "prediction using expert advice", a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections. Old and new forecasting methods are described in a mathematically precise way with the purpose of characterizing their theoretical limitations and possibilities.
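
A minimal sketch of the exponentially weighted average forecaster, a central algorithm in the expert-advice framework discussed above (the loss sequence and learning rate here are synthetic illustrations, not taken from the book):

```python
import numpy as np

def exp_weighted_loss(expert_losses, eta):
    """Run the exponentially weighted average forecaster over T rounds.
    expert_losses: (T, N) array of losses in [0, 1] for N experts.
    Returns the forecaster's cumulative expected loss."""
    T, N = expert_losses.shape
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                       # current weights on experts
        total += p @ expert_losses[t]         # forecaster's expected loss this round
        w *= np.exp(-eta * expert_losses[t])  # exponential weight update
    return total

rng = np.random.default_rng(1)
T, N = 1000, 10
losses = rng.random((T, N))
eta = np.sqrt(8 * np.log(N) / T)
alg = exp_weighted_loss(losses, eta)
best = losses.sum(axis=0).min()  # cumulative loss of the best expert in hindsight
# The regret is at most sqrt((T/2) ln N) for ANY loss sequence, with no stochastic assumptions
print(alg - best <= np.sqrt(T / 2 * np.log(N)))
```

The point of the nonstochastic analysis is that the final inequality holds for every possible loss sequence, not just the random one generated here.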

L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation. Springer, 2000.

Density estimation has evolved enormously since the days of bar plots and histograms, but researchers and users are still struggling with the problem of the selection of the bin widths. This book is the first to explore a new paradigm for the data-based or automatic selection of the free parameters of density estimates in general so that the expected error is within a given constant multiple of the best possible error. The paradigm can be used in nearly all density estimates and for most model selection problems, both parametric and nonparametric.

L. Devroye, L. Györfi, G. Lugosi, A Probabilistic Theory of Pattern Recognition. Springer, 1996.

A self-contained and coherent account of probabilistic techniques, covering: distance measures, kernel rules, nearest neighbour rules, Vapnik-Chervonenkis theory, parametric classification, and feature extraction. Each chapter concludes with problems and exercises to further the reader's understanding. Both research workers and graduate students will benefit from this wide-ranging and up-to-date account of a fast-moving field.

M. Greenacre. Correspondence Analysis in Practice. Chapman & Hall, 1993.

Correspondence analysis is a multivariate method for exploring cross-tabular data by converting such tables into graphical displays, called 'maps', and related numerical statistics. Since cross-tabulations are so often produced in the course of social science research, correspondence analysis is valuable in understanding the information contained in these tables. This book fills the gap in the literature between the theory and practice of this method. Various theoretical aspects are presented in a language accessible to both social scientists and statisticians, and a wide variety of applications are given which demonstrate the versatility of the method to interpret tabular data in a unique graphical way. The first part of the book deals with basic concepts of correspondence analysis and related methods for analyzing cross-tabulations. It then looks at the multivariate case when there are several variables of interest, including the relationship to cluster analysis, factor analysis and reliability of measurement. Applications to longitudinal data (event history, panel and trend data) are demonstrated. Finally, it examines further applications in the social sciences, including the analysis of textual data, lifestyle data and data on product descriptions in marketing research. This book gives lecturers, researchers and students a detailed introduction to help them teach the method and apply it to their own research problems. Researchers in psychology, sociology, business, marketing and statistics will all find this book particularly useful.
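
The core computation behind a correspondence analysis map can be sketched in a few lines: an SVD of the standardized residuals of the contingency table. This is a minimal illustration with an invented table, not an example from the book:

```python
import numpy as np

# Hypothetical 3x3 cross-tabulation of counts
N = np.array([[30., 10., 5.],
              [10., 40., 10.],
              [5., 15., 35.]])

P = N / N.sum()            # correspondence matrix (relative frequencies)
r = P.sum(axis=1)          # row masses
c = P.sum(axis=0)          # column masses

# Standardized residuals from the independence model
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S)

# Principal row coordinates on the first two dimensions: these are the 'map'
F = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]

# Total inertia (sum of squared singular values) equals the chi-square statistic / n
print(np.isclose((sv**2).sum(), (S**2).sum()))  # True
```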

Current editorial services

Christian Brownlees:

Annals of Financial Economics, Econometrics, Journal of Network Theory in Finance, Journal of Risk and Financial Management 

Gábor Lugosi:

Annals of Applied Probability, Journal of Machine Learning Research, Probability Theory and Related Fields 

Eulàlia Nualart:

Stochastic Processes and their Applications

David Rossell:

Bayesian Analysis

Piotr Zwiernik:

Journal of the Royal Statistical Society: Series B, Biometrika, Algebraic Statistics, Scandinavian Journal of Statistics

Research projects

"Prediccion, Inferencia y Computacion en Modelos Estructurados de Alta Dimension"

"Algorithms and Learning for AI"

"High-dimensional problems in structured probabilistic models"

"Estimación de redes latentes" ("Estimation of Latent Networks")