Deciphering Nanoporosity of Amorphous Materials using Topological Data Analysis

I have recently joined, as a postdoc, a collaboration between the Department of Mathematical Sciences and the Department of Chemistry and Bioscience of Aalborg University and the Lawrence Berkeley National Laboratory.


The aim of this project is to study glass structures, characterizing their medium-range order from a geometric and topological point of view. The study of the properties of disordered materials such as glass is a puzzling problem that touches many scientific areas and requires refined mathematical tools to work with the periodic point sets formed by their atoms.

Data Analysis with Merge Trees

Typical topological data analysis pipelines turn each datum of a data set into a (possibly multidimensional) filtration of topological spaces. A typical example is the union of balls of radius "r" centered at the points of a finite subset of a metric space, with "r" growing from zero.

A finite point cloud with its Rips filtration and the associated persistence diagram in dimension 1.
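As a minimal sketch of the union-of-balls filtration described above, the following Python snippet counts the path-connected components of the union of closed balls for a few radii. The function name `count_components` and the three sample points are illustrative assumptions, and the code works with points in Euclidean space only.

```python
import math
from itertools import combinations

def count_components(points, r):
    """Number of path-connected components of the union of closed balls of radius r."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Two closed balls of radius r intersect iff their centers are at distance <= 2r.
        if math.dist(points[i], points[j]) <= 2 * r:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})

# Hypothetical point cloud: two close points and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
counts = [count_components(points, r) for r in (0.25, 0.5, 2.0)]
print(counts)
```

As the radius grows the components can only merge, which is the monotonicity that makes the family of unions a filtration.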

Homology groups are then extracted from the filtration, and persistence modules are obtained via the functoriality of homology. Such modules then need to be represented by a topological summary that lives in a space with enough structure to enable some kind of analysis (e.g. a metric space). When the indexing set of the filtration is (a subset of) the real line, a persistence module (with some additional requirements) is completely characterized, up to isomorphism, by a persistence diagram: a set of points in the plane, called persistence pairs, where each point (b,d) represents a homology class that is born at time "b" and merges at time "d" with another class born before "b".
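To make the birth/death reading of a persistence pair concrete, here is a small sketch for the simplest case I can think of: zero-dimensional homology of the sublevel set filtration of a function sampled on a line (consecutive samples are treated as adjacent). The function name `sublevel_persistence_0d` and the sample values are assumptions made for illustration; the class that never dies is reported with death time infinity, which is a common convention rather than something fixed above.

```python
def sublevel_persistence_0d(values):
    """Zero-dimensional persistence pairs (b, d) of the sublevel set filtration
    of a function sampled on a path graph."""
    n = len(values)
    parent = [None] * n   # None means: vertex not yet in the sublevel set
    birth = {}            # birth value of the component rooted at a vertex
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep the vertices by increasing function value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component born later dies when it meets one born earlier.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # drop pairs with zero lifetime
                    pairs.append((birth[young], values[i]))
                parent[young] = old

    # The component of the global minimum never merges into an older one.
    pairs.append((min(values), float("inf")))
    return sorted(pairs)

pairs = sublevel_persistence_0d([0.0, 2.0, 1.0, 3.0, 0.5])
print(pairs)
```

Each local minimum gives birth to a component, and each pair records the value at which that component is absorbed by one born earlier.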


When working with zero-dimensional homology, the topological information of interest is given by path-connected components. In this case the homology groups have a canonical basis (namely, the path-connected components themselves), and we can track the evolution of its elements along the filtration.

When the filtration ends with a path-connected space, it is then quite natural to represent this information with a tree, called a merge tree.

A filtration.

Its path-connected components.

The associated merge tree.
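The construction pictured above can be sketched in a few lines for a finite point cloud, tracking the connected components of its Vietoris-Rips filtration. This is only an illustrative sketch: the function name `merge_tree`, the dictionary representation, and the convention that two points are joined at the scale equal to their distance are all assumptions, and simultaneous merges are recorded as consecutive binary nodes at the same height.

```python
import math
from itertools import combinations

def merge_tree(points):
    """Merge tree of the connected components of a Vietoris-Rips filtration.

    Leaves are the points (ids 0..n-1, height 0). Every internal node stores
    the scale at which two components merge. Returns (height, children).
    """
    n = len(points)
    parent = list(range(n))     # union-find forest over the points
    node_of = list(range(n))    # union-find root -> current tree node id
    height = {i: 0.0 for i in range(n)}
    children = {i: [] for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process the edges by increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    next_id = n
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue            # edge inside one component: nothing merges
        height[next_id] = d
        children[next_id] = [node_of[ri], node_of[rj]]
        parent[ri] = rj
        node_of[rj] = next_id
        next_id += 1
    return height, children

# Hypothetical point cloud: two close points and one far away.
height, children = merge_tree([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
print(height)
print(children)
```

The last node created is the root, which exists because the filtration ends with a single component.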

Merge trees contain much more information than persistence diagrams, to the point that, depending on the configuration of the persistence pairs, there may be up to "n!" merge trees sharing the same persistence diagram (with "n" being the number of points in the diagram). Moreover, with merge trees it is possible to collect information defined on path-connected components and represent it with functions defined on merge trees.

A function defined on a merge tree.
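One simple instance of such a function, chosen here purely for illustration, assigns to each vertex of the tree the number of leaves below it, i.e. the number of initial components that have merged by that point. The tree below is a hypothetical example encoded as a dictionary from internal nodes to their children, and `leaf_counts` is a name I made up for this sketch.

```python
def leaf_counts(children, root):
    """Number of leaves below each node: a function on the vertices of the tree."""
    counts = {}

    def visit(node):
        kids = children.get(node, [])
        # A leaf counts as one component; an internal node sums its subtrees.
        counts[node] = 1 if not kids else sum(visit(k) for k in kids)
        return counts[node]

    visit(root)
    return counts

# Hypothetical merge tree: leaves 0, 1, 2; node 3 merges 0 and 1; root 4 merges 3 and 2.
children = {3: [0, 1], 4: [3, 2]}
counts = leaf_counts(children, root=4)
print(counts)
```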

In my PhD I have developed 1) a framework to measure distances between functions defined on different merge trees 2) a metric between merge trees, satisfying stability properties wrt the Vietoris Rips filtration of point clouds (considered with the Gromov-Hausdorff metric) and the sublevel set filtration of real valued functions (considered with the sup norm). With two additional works (on functional data analysis and radiomics) I show that, despite the computational cost of the metric, it is indeed possible to employ merge trees as data analysis tools.

Statistics with Optimal Transport

Optimal transport is a framework which allows the comparison of probability distributions (satisfying mild conditions) via Wasserstein metrics.

Given two probability distributions A and B on some space X, a transport map from A to B is a map from X to X that turns an A-distributed random variable into a B-distributed random variable. If the distance by which a transport map moves the points of X is weighted by the amount of mass concentrated near the points, then one can try to find an optimal transport map, i.e. a map T solving Monge's problem:
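In one standard formulation, where the cost function c(x, y) of moving a unit of mass from x to y and the push-forward notation are assumptions not fixed in the text above, the problem reads:

```latex
\[
  \inf_{T \,:\, T_{\#}A = B} \int_X c\bigl(x, T(x)\bigr)\, \mathrm{d}A(x),
  \qquad \text{where } T_{\#}A(E) := A\bigl(T^{-1}(E)\bigr)
  \text{ for measurable } E \subseteq X .
\]
```

A common choice is c(x, y) = d(x, y)^p for a distance d on X, in which case the p-th root of the optimal value is the p-Wasserstein distance between A and B whenever an optimal map exists.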

That is, a map which optimally rearranges the base space in order to minimize the work done to turn the first probability distribution into the second one.

https://www.microsoft.com/en-us/research/blog/measuring-dataset-similarity-using-optimal-transport/

Optimal transport maps do not always exist, but the works of Villani, Ambrosio (with Gigli and Savaré), McCann and Gigli show that in many situations there are "enough" optimal transport maps to define a (pseudo) tangent space at a measure, where the vectors of the tangent space are indeed the optimal transport maps. Such tangent spaces do not exactly match the requirements to make the (Wasserstein) space of probability measures an (infinite-dimensional) Riemannian manifold, but they still enjoy many good properties.

Pseudo tangent space at a measure μ, with the tangent vector going from μ to τ.

By mapping all probability measures into a tangent space, one then moves the problem of doing statistics from a space of probability distributions to a much more tractable space of functions/sections of the tangent bundle of X.


In particular, when the base space is the real line, the Wasserstein space is flat and thus these pseudo tangent structures contain a perfect copy of the Wasserstein space. This fact has been exploited by many authors to define intrinsic statistical techniques (e.g. see here and references therein).
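A small numerical sketch may help to see what "a perfect copy" means on the real line. It assumes empirical measures with the same number of distinct, equally weighted atoms, so that the optimal map simply matches sorted atoms; the function names `wasserstein2_1d` and `log_map_1d` and the sample values are hypothetical choices of mine.

```python
import math

def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two empirical measures on the real line
    with the same number of equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    # On the real line the optimal map is monotone: it pairs sorted atoms.
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def log_map_1d(xs, ys):
    """Tangent vector at the measure of xs pointing to the measure of ys:
    the displacement T(x) - x evaluated at each sorted atom of xs."""
    return [y - x for x, y in zip(sorted(xs), sorted(ys))]

def l2_distance(u, v):
    """Distance between two tangent vectors in L2 of the base measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

mu = [0.0, 1.0, 2.0]      # base measure
nu = [3.0, 5.0, 4.0]
tau = [10.0, 12.0, 14.0]

u, v = log_map_1d(mu, nu), log_map_1d(mu, tau)
print(wasserstein2_1d(nu, tau), l2_distance(u, v))
```

The two printed numbers agree: the distance between the tangent vectors at mu equals the Wasserstein distance between nu and tau, so in this setting nothing is lost by working in the tangent space.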


Outside this simple case, the situation is much more complicated and many non-trivial results need to be established on the space of optimal transport maps. Together with Mario Beraha, I am working to define statistical techniques when the ground space X is a Riemannian manifold, starting from the simple case of the circle S1.