Eustasio Del Barrio
Projection based Wasserstein distances
The Wasserstein distance is a fundamental tool for measuring deviations between probability distributions. Yet, despite its attractive geometric features and intuitive interpretation, its use in high-dimensional settings is hampered by computational and statistical limitations. In this talk I present an alternative based on the search for an optimal basis and lower-dimensional projections. This alternative not only preserves most of the interesting features of the classical Wasserstein metric, but also enjoys an insightful geometric interpretation in terms of monotone transformations. Remarkably, this new metric coincides with the standard Wasserstein metric when the involved probabilities have a common copula in some basis. This link allows us to introduce a test that is relevant to some econometric problems.
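To make the projection idea concrete, here is a minimal Python sketch, assuming two equal-size empirical samples and a fixed orthonormal basis. It is not the metric from the talk (which also searches for an optimal basis), and the function names are hypothetical. It sums squared one-dimensional 2-Wasserstein distances between coordinate projections, each computed by matching sorted values.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not the speaker's definitions.


def w2_squared_1d(a: np.ndarray, b: np.ndarray) -> float:
    """Squared 2-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling pairs sorted values (quantile coupling).
    """
    return float(np.mean((np.sort(a) - np.sort(b)) ** 2))


def projected_w2_squared(x: np.ndarray, y: np.ndarray, basis: np.ndarray) -> float:
    """Sum over basis directions of squared 1-D W2 distances between projections.

    x, y: samples of shape (n, d) with the same n (a simplifying assumption).
    basis: (d, d) orthogonal matrix whose columns are the projection directions.
    Any coupling of x and y induces a coupling of each projection, so this value
    never exceeds the squared W2 distance between the full empirical measures.
    """
    px, py = x @ basis, y @ basis  # coordinates of every point in the chosen basis
    return sum(w2_squared_1d(px[:, k], py[:, k]) for k in range(basis.shape[1]))


x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = x + np.array([2.0, 3.0])  # pure translation by s = (2, 3)
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
rotation = np.array([[c, -s], [s, c]])
print(projected_w2_squared(x, y, np.eye(2)), projected_w2_squared(x, y, rotation))
```

For a pure translation y = x + s, both calls return ||s||^2 = 13 up to rounding, which is also the full squared W2 distance; in general the projected sum is only a lower bound, with equality in situations such as a shared coordinate-wise dependence structure.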
Gromov-Wasserstein Bound between Reeb and Mapper Graphs
Since its introduction as a computable approximation of the Reeb graph, the Mapper graph has become one of the most popular tools from topological data analysis for data visualization and inference. However, finding an appropriate metric (that is, a tractable metric with theoretical guarantees) for comparing Reeb and Mapper graphs, in order to, e.g., quantify the rate of convergence of the Mapper graph to the Reeb graph, is a difficult problem. While several metrics have been proposed in the literature, none is able to incorporate measure information when data points are sampled according to an underlying probability measure. The resulting Reeb and Mapper graphs are therefore purely deterministic and combinatorial, and substantial effort is required to ensure their statistical validity.
In this presentation, we handle this issue by treating Reeb and Mapper graphs as metric measure spaces. This allows us to use Gromov-Wasserstein metrics to compare these graphs directly and thus to better incorporate the probability measures from which the data points are sampled. We then describe the geometry that arises from this perspective and derive rates of convergence of the Mapper graph to the Reeb graph in this context. Finally, we showcase the usefulness of such metrics for Reeb and Mapper graphs in a few numerical experiments.
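For readers unfamiliar with the construction, the following Python sketch builds a minimal Mapper graph: cover the range of a filter function with overlapping intervals, cluster each preimage, and join clusters that share points. The uniform interval cover, the fixed-scale single-linkage clustering, and all names and parameters (n_intervals, overlap, eps) are illustrative assumptions. The sketch treats the graph purely combinatorially and does not compute the Gromov-Wasserstein distances discussed above.

```python
import math
from itertools import combinations

# Illustrative sketch; names and parameters are hypothetical choices.


def eps_clusters(points, idx, eps):
    """Split the points indexed by `idx` into connected components of the graph
    linking points at Euclidean distance <= eps (single linkage at a fixed scale)."""
    unvisited, clusters = set(idx), []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                component.add(j)
                stack.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filt, n_intervals, overlap, eps):
    """Minimal Mapper graph: cover the filter range with overlapping intervals,
    cluster each preimage, and connect clusters that share a data point.
    `overlap` is the fraction of an interval's length added on each side."""
    lo, hi = min(filt), max(filt)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        idx = [i for i, f in enumerate(filt) if a <= f <= b]
        nodes.extend(eps_clusters(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# 40 points on the unit circle, filtered by height (the y-coordinate).
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40)) for i in range(40)]
nodes, edges = mapper_graph(circle, [p[1] for p in circle], n_intervals=4, overlap=0.3, eps=0.2)
print(len(nodes), len(edges))
```

With these settings the call returns 6 nodes and 6 edges forming a single cycle, which matches the Reeb graph of the circle under the height function.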
On the interplay between Mutual Information and Diffusion Processes
Mutual Information (MI) is essential for quantifying dependencies in complex systems, yet accurately estimating it in high-dimensional settings is challenging. In this talk, we introduce novel methods for MI estimation using diffusion processes. First, we present MINDE, an approach that leverages score-based diffusion models and an interpretation of the Girsanov theorem to estimate the Kullback-Leibler divergence between probability densities. This yields MI and entropy estimators that outperform existing techniques. Additionally, we introduce SΩI, a method for computing O-information (a generalization of mutual information to more than two variables) without restrictive assumptions, which captures higher-order dependencies and reveals the synergy-redundancy balance in multivariate systems.
We then investigate how MI affects the dynamics of generative models based on SDEs. By extending Nonlinear Filtering (NLF), we develop a theoretical framework that shows how MI quantifies the influence of unobservable latent abstractions on generative pathways. Empirical studies validate our theory, demonstrating how MI guides the evolution of latent variables that steer the generative process.
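The score-based route to divergences can be illustrated without training anything. The Python sketch below is not MINDE: it replaces learned score networks with exact Gaussian scores and uses the plain heat flow dX_t = dW_t as the forward diffusion (both assumptions). It writes the mutual information of a correlated Gaussian pair as KL(p_XY || p_X p_Y) and integrates the expected squared score difference over diffusion time; the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch with exact Gaussian scores; not the MINDE estimator itself.


def gaussian_mi_via_scores(rho: float, n_grid: int = 200_000) -> float:
    """MI of a bivariate Gaussian with unit variances and correlation rho, computed as
    KL(p_XY || p_X p_Y) from score differences along the heat flow dX_t = dW_t.

    Identity used (a de Bruijn-type formula):
        KL(p_0 || q_0) = 1/2 * integral_0^inf E_{p_t} ||grad log p_t - grad log q_t||^2 dt,
    which holds here because KL(p_t || q_t) -> 0 as t -> infinity.
    """
    lam = np.array([1.0 + rho, 1.0 - rho])  # eigenvalues of the joint covariance
    # q = p_X p_Y = N(0, I) shares the eigenbasis, so each direction is a 1-D problem:
    # E_p[(s_p - s_q)^2] = (1/(1+t) - 1/(lam+t))^2 * (lam+t) per direction.
    u = (np.arange(n_grid) + 0.5) / n_grid    # midpoint rule on (0, 1)
    t = (u / (1.0 - u))[:, None]              # change of variables (0, 1) -> (0, inf)
    jac = 1.0 / (1.0 - u) ** 2                # dt/du
    diff = 1.0 / (1.0 + t) - 1.0 / (lam + t)
    integrand = 0.5 * np.sum(diff ** 2 * (lam + t), axis=1)
    return float(np.mean(integrand * jac))


rho = 0.8
print(gaussian_mi_via_scores(rho), -0.5 * np.log(1.0 - rho ** 2))  # numerical vs closed form
```

The two printed numbers agree to several decimal places: in this Gaussian case the identity is exact, and only the quadrature introduces error. A learned-score estimator would replace the closed-form score difference with network outputs evaluated on noised samples.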
Cyril Letrouit
Quantitative Stability of Optimal Transport
Optimal transport consists of sending a given source probability measure 𝜌 to a given target probability measure 𝜇 in an optimal way with respect to a certain cost. On bounded subsets of ℝ^d, if the cost is the squared Euclidean distance and 𝜌 is absolutely continuous, there exists a unique optimal transport map from 𝜌 to 𝜇. Optimal transport has been widely applied in statistics: for instance, to compare distributions, to define barycenters, to construct embeddings, and in generative modeling.
In this talk, we provide a quantitative answer to the following stability question: if 𝜇 is perturbed, can the optimal transport map from 𝜌 to 𝜇 change significantly? The answer depends on the properties of the density 𝜌. This question has its roots in numerical optimal transport and has found applications to other problems, such as the statistical estimation of optimal transport maps, the random matching problem, and the computation of Wasserstein barycenters.
The talk is based on joint work with Quentin Mérigot and Jun Kitagawa.
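A small discrete analogue of the stability question can be explored numerically. The Python sketch below is a brute-force stand-in for the absolutely continuous setting of the talk: it replaces 𝜌 and 𝜇 by uniform point clouds of equal size, computes optimal matchings for the squared Euclidean cost, and measures how far the optimal map moves when the target points are perturbed. The helper names are hypothetical, and the exhaustive search is only practical for a handful of points.

```python
import itertools

import numpy as np

# Illustrative brute-force sketch; helper names are hypothetical.


def optimal_matching(x: np.ndarray, y: np.ndarray):
    """Optimal matching for the squared Euclidean cost between two uniform empirical
    measures with the same number of points (exhaustive search, so keep n small).

    Returns the optimal permutation (x[i] is sent to y[perm[i]]) and the squared
    2-Wasserstein distance, i.e. the optimal mean squared displacement.
    """
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(len(x))):
        cost = float(np.sum((x - y[list(perm)]) ** 2))
        if cost < best_cost:
            best_perm, best_cost = list(perm), cost
    return best_perm, best_cost / len(x)


def map_change(x: np.ndarray, y: np.ndarray, y_new: np.ndarray) -> float:
    """Root-mean-square distance, over the source points, between the optimal map
    x -> y and the optimal map x -> y_new (a discrete L^2(rho) distance)."""
    perm, _ = optimal_matching(x, y)
    perm_new, _ = optimal_matching(x, y_new)
    return float(np.sqrt(np.mean(np.sum((y[perm] - y_new[perm_new]) ** 2, axis=1))))


x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[5.0], [1.0], [3.0], [2.0]])
y_new = y + np.array([[0.1], [-0.2], [0.05], [0.3]])  # perturbed target
print(map_change(x, y, y_new), np.sqrt(optimal_matching(y, y_new)[1]))
```

In one dimension the two printed numbers coincide, because the optimal map is the monotone rearrangement and the map change equals the W2 distance between the targets. In higher dimensions no such identity holds in general, which is where quantitative stability estimates become necessary.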
Bounds on the Hausdorff Distance with Applications to Topological Reconstruction of Compact Sets
In topological data analysis, controlling the Hausdorff distance plays a crucial role in persistent homology and shape reconstruction.
In this talk, we consider a sequence of stationary, dependent, compactly supported random variables.
Our goals are twofold:
First, we propose several methods to bound the Hausdorff distance between this stationary sequence of dependent random variables and its common support. The resulting bounds achieve the optimal rate of convergence known for the i.i.d. case (proven by Chazal et al. in 2015).
Next, we present a novel result on topological reconstruction and provide some illustrative examples.
Part of this talk is joint work with Sadok Kallel.
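In one dimension the quantity being bounded is easy to compute exactly. The Python sketch below, a toy illustration with hypothetical names, computes the Hausdorff distance between a finite sample and its support [0, 1]. It also generates a stationary but dependent sequence on [0, 1), a random rotation X_{t+1} = (X_t + V_t) mod 1 chosen here for illustration and not taken from the talk. It makes no claim about the rates established in the talk.

```python
import random

# Illustrative sketch; function names and the example chain are hypothetical.


def hausdorff_to_interval(sample, a=0.0, b=1.0):
    """Hausdorff distance between a finite sample and the interval [a, b].

    Assumes every point lies in [a, b], so only the largest uncovered stretch matters:
    the stretch at either end, or half of the largest gap between neighbours.
    """
    xs = sorted(sample)
    gaps = [(right - left) / 2 for left, right in zip(xs, xs[1:])]
    return max([xs[0] - a, b - xs[-1]] + gaps)


def stationary_rotation_chain(n, step=0.1, seed=0):
    """Stationary, dependent sequence on [0, 1): X_{t+1} = (X_t + V_t) mod 1 with
    X_0 ~ U[0, 1) and V_t ~ U[0, step). The uniform law is invariant under this
    random rotation, so every X_t is U[0, 1), yet consecutive values are close."""
    rng = random.Random(seed)
    x, out = rng.random(), []
    for _ in range(n):
        out.append(x)
        x = (x + step * rng.random()) % 1.0
    return out


for n in (100, 1000, 10000):
    print(n, hausdorff_to_interval(stationary_rotation_chain(n)))
```

Because the seed is fixed, each longer run extends the shorter one, so the printed distances cannot increase with n; how fast they decrease under dependence is the kind of question the bounds above address.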
High-dimensional outlier detection using random projections
Many methods for detecting outliers in multivariate data have been proposed in the literature, but most of them require estimating the covariance matrix. As the dimension increases, estimating this matrix becomes increasingly difficult, and eventually infeasible in high dimensions. To avoid estimating it, we propose a random projection-based procedure for detecting outliers in Gaussian multivariate data. The data are projected onto several one-dimensional subspaces, and an appropriate univariate outlier detection method is applied to each projection; this method is similar to Tukey's, but uses a threshold that depends on the original dimension and the sample size. A common issue in settings where random projections are used (e.g., goodness-of-fit testing, analysis of variance, or the construction of depths) is the lack of clear guidance on how many projections are required. To address this, we propose using sequential analysis. Simulated and real datasets illustrate the performance of the proposed method.
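A minimal Python version of the projection step might look as follows. It is a sketch under stated assumptions, not the proposed procedure: the fence multiplier k is a hypothetical fixed constant standing in for the dimension- and sample-size-dependent threshold, and the number of projections is fixed instead of being chosen by sequential analysis. A point is flagged if it falls outside Tukey-style fences [Q1 - k IQR, Q3 + k IQR] on at least one random direction; no covariance matrix is estimated.

```python
import numpy as np

# Illustrative sketch; the function name and the fixed multiplier k are hypothetical.


def random_projection_outliers(X, n_proj=100, k=4.0, seed=0):
    """Indices of rows of X lying outside Tukey-style fences on at least one of
    n_proj random one-dimensional projections."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((X.shape[1], n_proj))
    U /= np.linalg.norm(U, axis=0)                 # unit directions as columns
    P = X @ U                                      # (n, n_proj) projected data
    q1, q3 = np.quantile(P, [0.25, 0.75], axis=0)  # per-direction quartiles
    iqr = q3 - q1
    outside = (P < q1 - k * iqr) | (P > q3 + k * iqr)
    return np.flatnonzero(outside.any(axis=1))


rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((200, 50)), np.full((1, 50), 20.0)])  # row 200 is planted
flagged = random_projection_outliers(X)
print(flagged)
```

On 200 standard Gaussian points in dimension 50 plus one point at (20, ..., 20), the planted row 200 is expected to be flagged: its projections are of order 20, whereas inlier projections are standard normal and rarely cross fences near 6.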
Peter Potaptchik
Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive
Diffusion models demonstrate remarkable generalisation capabilities, yet the mechanisms underpinning this success remain only partially understood. A leading conjecture, based on the manifold hypothesis, attributes this to their ability to adapt to low-dimensional geometric structures within the data. In this talk, we provide evidence for this conjecture, focusing on how such phenomena result from the score-matching objective. We investigate the role of implicit regularisation by analysing the effect of smoothing the minimisers of this objective. We show that smoothing the score function—which is equivalent to smoothing in the log-density domain—produces a smoothing effect tangential to the data manifold.
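The geometric effect can be seen in a one-dimensional toy model of the direction normal to a flat manifold. The Python sketch below assumes a Gaussian profile of width sigma across the manifold (an illustrative assumption, not the talk's setting). It compares two operations that use the same Gaussian kernel of bandwidth h: smoothing the log-density, which amounts to smoothing the score because convolution commutes with differentiation, and smoothing the density itself.

```python
import numpy as np

# Illustrative toy: only the direction normal to a flat manifold at y = 0 is modelled.
sigma, h = 0.1, 0.5                   # normal width of the data, smoothing bandwidth (assumed)
y = np.linspace(-5.0, 5.0, 1001)      # coordinate normal to the manifold
dy = y[1] - y[0]
z = np.arange(-400, 401) * dy         # symmetric kernel grid, truncated at |z| = 4
K = np.exp(-z ** 2 / (2 * h ** 2))
K /= K.sum()                          # discrete Gaussian smoothing kernel

log_p = -y ** 2 / (2 * sigma ** 2)    # log-density up to an additive constant


def smooth(f):
    """Convolve f with K; values far from the grid edges are unaffected by padding."""
    return np.convolve(f, K, mode="same")


score_log = np.gradient(smooth(log_p), dy)                      # log-domain (score) smoothing
score_density = np.gradient(np.log(smooth(np.exp(log_p))), dy)  # density smoothing

i = int(np.argmin(np.abs(y - 0.5)))
print(score_log[i], -0.5 / sigma ** 2)                 # normal score left intact
print(score_density[i], -0.5 / (sigma ** 2 + h ** 2))  # normal score flattened
```

Smoothing a quadratic log-density with a symmetric kernel only shifts it by a constant, so the normal score -y/sigma^2 is preserved, whereas smoothing the density widens the profile to variance sigma^2 + h^2. A non-quadratic log-density along the manifold would, by contrast, be smoothed, which is consistent with the tangential effect described above.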
The geometry and topology of patterns in 3D shapes, with applications to leukaemia
The complex morphologies of biological shapes inform us about the functioning of living systems, in healthy or diseased states. Quantifying such shapes is challenging because of their diversity, which motivates us to propose a unifying view. On the one hand, these shapes may be represented as optimizers of a phase-field model that approximates a general curvature functional, going beyond the classical biomembrane models. On the other hand, we show how the persistent homology of the sublevel set filtration of the signed distance function summarizes textural features. We bridge a gap by generalizing Morse theory to distance functions generated by smooth compact boundaries in the Euclidean setting, which enables a rigorous interpretation of shape textures in terms of the critical points of the signed distance. Finally, we use these methods to show how acute myeloid leukaemia remodels the bone marrow vessels in unexpected ways, with further applications in materials science.
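As a rough illustration of a signed-distance filtration, the Python sketch below takes a hypothetical shape made of two disjoint disks, evaluates its signed distance (negative inside) on a grid, and counts connected components of the sublevel sets {sd <= t} for a few thresholds. This yields a Betti-0 curve only; it is not a full persistence computation and does not implement the Morse-theoretic analysis of the talk.

```python
from collections import deque

import numpy as np

# Illustrative sketch; the two-disk shape and all names are hypothetical.


def signed_distance_two_disks(X, Y, centers=((-1.5, 0.0), (1.5, 0.0)), r=1.0):
    """Signed distance to a union of disjoint disks of radius r (negative inside)."""
    return np.minimum.reduce([np.hypot(X - cx, Y - cy) - r for cx, cy in centers])


def count_components(mask):
    """Number of 4-connected components of True cells in a 2-D boolean grid."""
    rows, cols = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j] or seen[i, j]:
                continue
            count += 1                      # new component found: flood-fill it
            seen[i, j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for p, q in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= p < rows and 0 <= q < cols and mask[p, q] and not seen[p, q]:
                        seen[p, q] = True
                        queue.append((p, q))
    return count


grid = np.linspace(-3.0, 3.0, 121)
X, Y = np.meshgrid(grid, grid)
sd = signed_distance_two_disks(X, Y)
betti0 = {t: count_components(sd <= t) for t in (-1.5, -0.5, 0.0, 0.6)}
print(betti0)  # Betti-0 curve sampled at a few thresholds
```

The counts are 0 below the minimum -1 of the signed distance, 2 while the disks remain separate, and 1 once t exceeds half of the gap between them (here 0.5), when the two sublevel sets merge.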
Regularity of the score and convergence rates of diffusion models
We show that the score function naturally adapts to the regularity of the data distribution. We apply this result to the stability of diffusion models and use it to give a short proof of their minimax optimality.
Renata Turkes
Shoving tubes through shapes gives a sufficient and efficient shape statistic
The classical persistent homology transform was introduced in topological data analysis about 10 years ago, and has since proven to be a very powerful descriptor of Euclidean shapes. The transform sends a shape X to the map that associates to each direction v on the sphere S^{n-1} the persistence diagram of X with respect to the height function h_v(x) = ⟨x, v⟩. The transform has been shown to be injective (it is a sufficient shape statistic: probing a shape from every direction completely describes it), and for each shape it gives a continuous map from the sphere to the space of persistence diagrams.
We introduce a generalised persistent homology transform (PHT) in which we consider arbitrary parameter spaces and arbitrary filtration functions. In particular, we define the "distance-from-flat" PHT, where the parameter space is the affine Grassmannian AG(m,n) of m-dimensional affine subspaces of R^n, and the filtration functions d_P encode the distance from a given flat P.
We prove that this version retains continuity and injectivity while offering computational advantages over the classical PHT. In particular, homology in degree 0 suffices for the injectivity of the distance-from-line (so-called tubular) PHT, yielding an efficient tool that can outperform top neural networks in shape classification.
Authors: Adam Onus, Nina Otter, Renata Turkes
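To make the filtrations concrete, here is a Python sketch of degree-0 persistence for a lower-star filtration on a graph embedded in the plane. It includes two filtration functions: the height h_v used by the classical transform, and the distance d_P from a line P, as in the tubular PHT. Only one direction or line is evaluated at a time, whereas the transforms collect such diagrams over all parameters. Function names and the toy graph are hypothetical.

```python
import math

# Illustrative sketch; function names and the toy graph are hypothetical.


def persistence0(values, edges):
    """Degree-0 persistence pairs (birth, death) of the lower-star filtration of a graph.

    Vertex i enters at values[i]; edge (u, v) enters at max(values[u], values[v]).
    Union-find with the elder rule: when two components merge, the younger one dies.
    Pairs with zero persistence are dropped; surviving components get death = inf.
    """
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if values[ru] > values[rv]:
            ru, rv = rv, ru                # ru is now the elder (earlier-born) root
        death = max(values[u], values[v])
        if death > values[rv]:
            pairs.append((values[rv], death))
        parent[rv] = ru
    pairs += [(values[r], math.inf) for r in {find(i) for i in range(len(values))}]
    return sorted(pairs)


def height(point, direction):
    """Height function h_v(x) = <x, v> for a unit vector v."""
    return sum(p * d for p, d in zip(point, direction))


def distance_from_line(point, anchor, direction):
    """Distance from x to the line {anchor + s * direction}, with `direction` a unit vector."""
    w = [p - a for p, a in zip(point, anchor)]
    s = sum(wi * di for wi, di in zip(w, direction))  # coordinate along the line
    return math.sqrt(max(sum(wi * wi for wi in w) - s * s, 0.0))


verts = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]  # a V-shaped path graph
edges = [(0, 1), (1, 2)]
print(persistence0([height(p, (0.0, 1.0)) for p in verts], edges))
print(persistence0([distance_from_line(p, (0.0, 0.0), (1.0, 0.0)) for p in verts], edges))
```

For this path, the height in direction (0, 1) and the distance from the x-axis both give the vertex values (0, 2, 1). Both therefore yield one infinite bar born at 0 and one bar (1, 2) for the component that appears at the right endpoint and merges at the peak.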
Statistical analysis of empirical graph Laplacians
Laplacian Eigenmaps and Diffusion Maps are nonlinear dimensionality reduction methods that use the eigenvalues and eigenvectors of (un)normalized graph Laplacians. Both methods are applied when the data are sampled from a low-dimensional manifold embedded in a high-dimensional Euclidean space. In addition, higher-order generalizations of graph Laplacians (so-called Hodge Laplacians) allow one to extract more sophisticated topological information. From a mathematical perspective, the main problem is to understand these empirical Laplacians as spectral approximations of the Laplace-Beltrami operators on the underlying manifold.
In this talk, we first study graph Laplacians based on i.i.d. observations uniformly distributed on a compact submanifold of Euclidean space. In our analysis, we connect these empirical Laplacians to kernel principal component analysis. This leads to new points of view and allows us to leverage results for empirical covariance operators in infinite dimensions. We then discuss higher-order generalizations of graph Laplacians and show how they are connected to Hodge theory on Riemannian manifolds.
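A minimal Python sketch of the objects discussed above: an unnormalized graph Laplacian L = D - W with Gaussian weights, built from equally spaced points on the unit circle, and its spectrum. The bandwidth eps, the unnormalized choice, and the function name are illustrative assumptions, not the estimators analysed in the talk.

```python
import numpy as np

# Illustrative sketch; the function name and bandwidth are hypothetical choices.


def gaussian_graph_laplacian(X, eps):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights
    W_ij = exp(-||x_i - x_j||^2 / eps) and no self-loops."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.exp(-sq / eps)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W


n = 200
theta = 2 * np.pi * np.arange(n) / n
X = np.column_stack([np.cos(theta), np.sin(theta)])  # equally spaced points on the unit circle
L = gaussian_graph_laplacian(X, eps=0.01)
evals, evecs = np.linalg.eigh(L)                     # eigenvalues in ascending order
print(evals[:5])
embedding = evecs[:, 1:3]  # Laplacian-eigenmap coordinates; span the (cos, sin) plane
```

The row sums of L vanish, so the smallest eigenvalue is 0 with a constant eigenvector. Because the equally spaced sample is rotation invariant, the next eigenvalues come in equal pairs, mirroring the double eigenvalues of the Laplace-Beltrami operator on the circle (eigenfunctions cos kθ and sin kθ), and the first nontrivial pair of eigenvectors embeds the sample as a circle.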