ILJU Symposium of Mathematics, 2021



ILJU POSTECH MINDS Workshop on


Topological Data Analysis

and Machine Learning


July 6 (Tuesday) ~ July 9 (Friday), 2021

GMT+9, Online streaming


Organizers

Workshop Rationale

Topological Data Analysis (TDA), a relatively new field of data analysis, has proved very useful in a variety of applications. Recently, much TDA research has been devoted to making TDA methods compatible with machine learning workflows. This workshop will bring together researchers and students working on TDA and machine learning and give them an opportunity to present their recent research and share ideas. The workshop will also offer tutorial sessions that introduce various TDA computational tools through practical, hands-on exercises.

Registration

Program Overview (GMT+9)

Day 1 (July 6, Tuesday)

Morning session

  • 09:00 ~ 10:00 : Rosen

  • 10:00 ~ 11:00 : Shiu

  • 11:00 ~ 12:00 : Uda

Afternoon session

  • 14:00 ~ 15:00 : Jeon - Tutorial

  • 15:00 ~ 16:00 : Jeon - Tutorial

Day 2 (July 7, Wednesday)

Morning session

  • 09:00 ~ 10:00 : Needham

  • 10:00 ~ 11:00 : B. Wang

  • 11:00 ~ 12:00 : Kaji - Tutorial 1

Afternoon session

  • 14:00 ~ 15:00 : Kaji - Tutorial 2

  • 15:00 ~ 16:00 : Carrier

Day 3 (July 8, Thursday)

Morning session

  • 09:00 ~ 10:00 : Chung

  • 10:00 ~ 11:00 : Chung - Tutorial

  • 11:00 ~ 12:00 : Tanabe

Afternoon session

  • 14:00 ~ 15:00 : Xia

  • 15:00 ~ 16:00 : Carrier - Tutorial

Day 4 (July 9, Friday)

Morning session

  • 09:00 ~ 10:00 : Y. Wang - Tutorial

  • 10:00 ~ 11:00 : Escolar - Tutorial

  • 11:00 ~ 12:00 : Tran

Afternoon session

  • 14:00 ~ 15:00 : Escolar

  • 15:00 ~ 16:00 : Lupo & Reise - Tutorial

Program

Day 1 (July 6, Tuesday)

09:00 - 10:00 - Paul Rosen (University of South Florida), Topological Data Analysis in Graph Analysis and Visualization

10:00 - 11:00 - Gary Shiu (University of Wisconsin), The Topology of Data: from String Theory to Cosmology to Phases of Matter

11:00 - 12:00 - Tomoki Uda (Tohoku University), Stability of Reeb Trees and Application to Noisy Images

14:00 - 15:00 - Bogwang Jeon (POSTECH), An introduction to homology: from calculus to persistent homology

15:00 - 16:00 - Bogwang Jeon (POSTECH), An introduction to homology: from calculus to persistent homology


Day 2 (July 7, Wednesday)

09:00 - 10:00 - Thomas Needham (Florida State University), Decorated Merge Trees

10:00 - 11:00 - Bei Wang (University of Utah), The Topology of Activation Vectors in Deep Learning

11:00 - 12:00 - Shizuo Kaji (Kyushu University), Tutorial on CubicalRipser and other TDA software using Python

14:00 - 15:00 - Shizuo Kaji (Kyushu University), Tutorial continued

15:00 - 16:00 - Mathieu Carrière (INRIA), Topological analysis of immunofluorescence images




Day 3 (July 8, Thursday)

09:00 - 10:00 - Moo K. Chung (University of Wisconsin - Madison), Lattice paths for persistent diagrams applied to COVID-19 virus spike protein structures

10:00 - 11:00 - Moo K. Chung (University of Wisconsin - Madison), Tutorial on Wasserstein distance on graphs

11:00 - 12:00 - Naoya Tanabe (Kyoto University Hospital), A homological approach to a mathematical definition of local abnormalities in lung diseases on computed tomography

14:00 - 15:00 - Kelin Xia (Nanyang Technological University), Topological data analysis based machine learning for drug design

15:00 - 16:00 - Mathieu Carrière (INRIA), Tutorial


Day 4 (July 9, Friday)

09:00 - 10:00 - Yuan Wang (University of South Carolina), Tutorial on Topological Signal Processing and Inference with EEG Applications

10:00 - 11:00 - Emerson G. Escolar (Kobe University), Mapper Tutorial

11:00 - 12:00 - Mai Lan Tran (POSTECH), Topological Data Analysis of Korean Music in Jeongganbo

14:00 - 15:00 - Emerson G. Escolar (Kobe University), Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics

15:00 - 16:00 - Umberto Lupo (EPFL) & Wojciech Reise (Université Paris-Saclay & INRIA), giotto-tda tutorial: machine learning pipelines with persistent homology and Mapper

Tutorial Resources

Day 1 (July 6, Tuesday)

  • Bogwang Jeon (POSTECH) - Persistent homology


Day 2 (July 7, Wednesday)

  • Shizuo Kaji (Kyushu University) - Tutorial on CubicalRipser and other TDA software using Python

Materials: link to Google Colab Notebook (you can run the code in the cloud without installing anything on your computer).
Short introduction slides are also available.


  • Thomas Needham (Florida State University)

Materials: Related software on Decorated-Merge-Trees


Day 3 (July 8, Thursday)

  • Moo K. Chung (University of Wisconsin - Madison) Tutorial on Wasserstein distance on graphs

Requirement: MATLAB and the zipped file downloaded from the tutorial website (download song.2020.simulation2.zip from http://www.stat.wisc.edu/~mchung/dynamicTDA). The tutorial is based on pages 5-10 (Wasserstein distance on graphs) and pages 19-26 (Simulation study 2) of arXiv:2012.00675. A detailed manual for the code is available on the website.


  • Mathieu Carrière (INRIA), Tutorial (multiparameter persistent homology and Mapper with multivariate filters)

Materials: Jupyter notebooks and the Gudhi library.


Day 4 (July 9, Friday)

  • Yuan Wang (University of South Carolina) Tutorial on topological signal processing

Requirement: The tutorial is based on the paper Wang et al. 2018 and MATLAB code downloaded from PLab.


  • Emerson G. Escolar (Kobe University) - Mapper Tutorial

Materials: Follow the instructions on the webpage for the installation.


  • Umberto Lupo (EPFL) & Wojciech Reise (Université Paris-Saclay & INRIA) - giotto-tda tutorial

Materials: Giotto TDA


Invited Speakers

Paul Rosen

(University of South Florida)

Topological Data Analysis in Graph Analysis and Visualization


Abstract: Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to visualize and even more challenging to understand. Node-link diagrams are popular for drawing graphs, but clutter and overlap of unrelated structures can lead to confusing graph visualizations. In this talk, I will discuss the role of Topological Data Analysis in easing the cognitive burden of graph visualization. I will discuss several approaches, including using persistent homology to detect significant events in time-varying graphs; using feature generators from persistent homology to enable user-selected modifications of a force-directed graph layout; and finally, using Mapper for multiscale graph aggregations.

Gary Shiu

(University of Wisconsin)

The Topology of Data: from String Theory to Cosmology to Phases of Matter


Abstract: We are faced with an explosion of data in many areas of physics, but very often it is not the size but the complexity of the data that makes extracting physics from big datasets challenging. As I will discuss in this talk, topological data analysis can be used to decode the underlying physics from the shapes of complex datasets. I will discuss three applications of topological data analysis: 1) constraining cosmological parameters from CMB measurements and large-scale structure data, 2) detecting and classifying phases of matter, and 3) identifying the structure of the string theory landscape. Persistent homology condenses these datasets into their most relevant (and interpretable) features, so that simple statistical pipelines are sufficient in these contexts. This suggests that TDA can be used in conjunction with machine learning algorithms to improve their architecture.

Naoya Tanabe

(Kyoto University Hospital)

A homological approach to a mathematical definition of local abnormalities in lung diseases on computed tomography

Abstract

Bogwang Jeon

(POSTECH)

An introduction to homology: from calculus to persistent homology

Abstract: Starting from line integration in calculus, I will first explain Poincaré's original idea of homology. Then I will go over various ways to define and compute it, as well as its elementary properties. Lastly, I will cover some basics of persistent homology. This is an introductory course, and undergraduate-level mathematics should be enough to follow all of it.

Thomas Needham

(Florida State University)

Decorated Merge Trees

Abstract: I will introduce the concept of a decorated merge tree (DMT), an invariant which tracks interactions between homological features in multiple degrees for a filtered space. Intuitively, a DMT is a merge tree overlaid with higher dimensional barcodes. Formally, a DMT can be understood abstractly in terms of category theory or concretely as a barcode-attributed combinatorial graph. There is a natural extension of interleaving distance to the setting of DMTs; I will discuss stability properties of this metric as well as methods for computing it via Gromov-Wasserstein distance, a tool from optimal transport. This is joint work with Justin Curry, Haibin Hang, Washington Mio and Osman Okutan.

Bei Wang

(University of Utah)

The Topology of Activation Vectors in Deep Learning


Abstract: Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis.

This is joint work with Archit Rathore, Nithin Chalapathi, and Sourabh Palande.

Emerson G. Escolar

(Kobe University)

Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics

Abstract: Mapper, one of the tools of TDA, is able to compactly summarize complicated high-dimensional data as graphs. In this work, we apply Mapper to 333 major firms' patenting activities in 1976-2005 to visualize firms' technological space as a Mapper graph. We observe branch-like structures called "flares" related to firms with unique trajectories in the Mapper graph, and propose an algorithm to extract them. We find statistically and economically significant correlations between the flares and financial performance. This talk is based on joint work (https://arxiv.org/abs/1909.00257) with Yasuaki Hiraoka, Mitsuru Igami, and Yasin Ozcan.

Mapper Tutorial

Abstract: Mapper, one of the tools of TDA, is able to compactly summarize complicated high-dimensional data as graphs. In this tutorial, an elementary introduction to Mapper computations will be provided, starting from simple examples to build intuition before proceeding to more complicated examples.

The code for the tutorial is available from:

https://github.com/emerson-escolar/mapper_tutorial

(NOTE: up to the day of the tutorial, the contents may be updated/changed)
Follow the instructions on the webpage for the installation.
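
To give a rough idea of what a Mapper computation involves, the following Python sketch builds a Mapper graph for a small point cloud. It is not the tutorial's code (which lives in the repository linked above): the function names `threshold_clusters` and `mapper_graph`, the one-dimensional interval cover, and the choice of clustering (connected components of a distance-threshold graph) are all assumptions made for illustration.

```python
import math
from itertools import combinations


def threshold_clusters(points, members, eps):
    """Group `members` into connected components of the eps-neighbor graph."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[current], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are index sets, edges link nodes sharing a point."""
    lo, hi = min(filter_values), max(filter_values)
    length = (hi - lo) / n_intervals
    pad = overlap * length  # how far each interval reaches into its neighbors
    nodes = []
    for j in range(n_intervals):
        start = lo + j * length - pad
        end = lo + (j + 1) * length + pad
        # Preimage of the j-th cover interval under the filter function.
        members = [i for i, f in enumerate(filter_values) if start <= f <= end]
        nodes.extend(threshold_clusters(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges


# Twelve points on the unit circle, filtered by their x-coordinate.
circle = [(math.cos(math.radians(30 * k)), math.sin(math.radians(30 * k)))
          for k in range(12)]
nodes, edges = mapper_graph(circle, [x for x, _ in circle],
                            n_intervals=3, overlap=0.3, eps=0.6)
```

For this input the sketch yields four nodes joined in a single cycle, which reflects the loop in the sampled circle. Real Mapper implementations let the user swap in other filters, covers, and clustering algorithms.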


Shizuo Kaji

(Kyushu University)

Tutorial on CubicalRipser and other TDA software using Python

Abstract: Techniques in Topological Data Analysis are vast, and there are many software packages that provide various functionalities. Oftentimes, it is not easy to find the right tool for the data and the task at hand. We will give hands-on tutorials on several software packages for use with Python on Google Colab (a Jupyter notebook is available here). Various tasks, including classification, regression, clustering, and visualisation, on multiple data types, including point cloud, image, volumetric data, time series, and graph, are covered. In particular, we will introduce our Cubical Ripser, a fast program for computing persistent homology of images and volumes (cubical complexes).

Mathieu Carrière

(INRIA)

Topological analysis of immunofluorescence images

Abstract: Persistent homology is a common tool of topological data analysis, which aims at computing and encoding the geometry and topology of given datasets. In this talk, I will present a novel application of persistent homology to characterize the spatial arrangement of immune and tumor cells in the context of breast cancer. More specifically, quantitative and robust characterizations are built by computing (multiparameter) persistent homology from the output of a staining technique (called quantitative multiplex immunofluorescence), which provides spatial coordinates and stain intensities for individual cells. The resulting persistence modules are then converted into descriptors (persistence diagrams for scalar filtrations, multiparameter persistence images for multiparameter filtrations) and evaluated as characteristic biomarkers of cancer subtype and overall survival. This provides new insights and possibilities for the general problem of building (topology-based) biomarkers for immune responses.

Kelin Xia

(Nanyang Technological University)

Topological data analysis based machine learning for drug design

Abstract: Effective molecular representation is key to the success of machine learning models for molecular data analysis. TDA-based featurization and feature engineering have demonstrated great power in structure representations. In this talk, we will discuss a series of persistent models, including persistent homology, persistent spectral models, and persistent Ricci curvature, and their combination with machine learning models. Unlike traditional graph and network models, these filtration-induced persistent models can characterize multiscale topological and geometric information while significantly reducing molecular data complexity and dimensionality. Features are obtained from various persistent attributes derived from mathematical invariants, such as homology, cohomology, eigenvalues, and Ricci curvature. They are combined with learning models, in particular random forests, gradient boosting trees, and convolutional neural networks. Our persistent-representation-based molecular fingerprints can significantly boost the performance of learning models in drug design.

Moo K. Chung

(University of Wisconsin - Madison)

Research talk: Lattice Paths for Persistent Diagrams Applied to COVID-19 Virus Spike Protein Structures

Abstract: Topological data analysis, including persistent homology, has undergone significant development in recent years. However, one outstanding challenge is to build a coherent statistical inference procedure on persistent diagrams. The paired, dependent structure of births and deaths in persistent diagrams adds complexity to developing such a procedure. In this paper, we present a novel data representation that transforms persistent diagrams into lattice paths. A new exact statistical inference procedure is developed over the collection of lattice paths via combinatorial enumerations. The lattice path method is applied to the topological features of the protein structures of coronaviruses. The proposed method demonstrates that there are topological changes during the conformational change of spike proteins that are needed to infect host cells. The talk is based on arXiv:2105.00351.

Tutorial on Wasserstein Distance on Graphs

Abstract: The first part of the tutorial (MATLAB based) introduces the graph filtration, the baseline filtration on graph data structures. The second part of the tutorial explains how to compute the Wasserstein distance on graphs without the numerical optimization that is otherwise needed to find the optimal bijection. The tutorial is based on arXiv:2012.00675.
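
Why a Wasserstein distance can sometimes be computed without numerical optimization is illustrated by a standard one-dimensional fact: for two equal-size sets of real numbers and a cost |a - b|^p with p >= 1, an optimal bijection pairs the values in sorted order. The Python sketch below illustrates only that fact. It is not the tutorial's MATLAB code, and the function name `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size lists of real numbers.

    Assumption: both inputs have the same length and p >= 1, so an optimal
    bijection is obtained simply by sorting both lists.
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same number of values")
    # Pair the i-th smallest value of xs with the i-th smallest value of ys.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return cost ** (1.0 / p)


# Hypothetical one-dimensional summaries (e.g. birth values) of two graphs.
births_a = [0.9, 0.2, 0.5]
births_b = [0.4, 1.0, 0.1]
distance = wasserstein_1d(births_a, births_b)
```

Sorting costs O(n log n), whereas searching over bijections in general requires solving an assignment problem.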

Tomoki Uda

(Tohoku University)

Stability of Reeb Trees and Application to Noisy Images

Abstract: Reeb graphs, like merge trees, are among the mathematical tools used in TDA to summarize real-valued functions. In our recent work, we proposed a mathematical formulation that allows us to compute Reeb trees (Reeb graphs without cycles) from merge trees of given scalar data, which also leads to a computational algorithm. Furthermore, the algorithm is stable in an interleaving extended distance. In this talk we will present some of the stability results and demonstrate an application to noisy gray-scale images.

Yuan Wang

(University of South Carolina)

Tutorial on Topological Signal Processing and Inference with EEG Applications

Abstract: This tutorial (MATLAB based) is aimed at introducing the topic of topological signal processing with statistical inference on persistent landscapes. The first part of the tutorial covers a computational framework for decoding topological features of signals in the temporal domain. The second part of the tutorial covers statistical inference procedures on the topological features, as well as applications of the framework to electroencephalography (EEG). The tutorial is based on the linked paper.

Mai Lan Tran

(POSTECH)

Topological Data Analysis of Korean Music in Jeongganbo

Abstract: Jeongganbo is a unique Korean music representation invented by Sejong the Great. We use topological data analysis to analyze Korean music written in Jeongganbo, namely Suyeonjang, Songuyeo, and Taryong, well-known pieces played at the palace and among the nobility. We observe that the cycles of Suyeonjang and Songuyeo, categorized as a special type of cyclic music known as Dodeuri, frequently overlap each other when appearing in the music, while the cycles found in Taryong, which does not belong to the Dodeuri class, appear individually. The overlap pattern is then used to create new music, which turns out to sound like Korean traditional music in Jeongganbo. This is joint work (https://arxiv.org/abs/2103.06620) with Jae-Hun Jung and Changbom Park.

Umberto Lupo

(EPFL)

Wojciech Reise

(Université Paris-Saclay & INRIA)

giotto-tda tutorial: machine learning pipelines with persistent homology and Mapper

Abstract: giotto-tda is a Python library that integrates high-performance topological data analysis with machine learning via a scikit-learn–compatible API and state-of-the-art C++ implementations.

Its large selection of preprocessing techniques, of persistent homology algorithms, and of featurization methods for persistence diagrams, allows for the flexible creation and tuning of end-to-end topological machine learning pipelines for various types of data (e.g. point clouds, graphs, images, and time series).

The Mapper algorithm is implemented in giotto-tda as a scikit-learn pipeline with a parallelized clustering step. Furthermore, an interactive plotting API allows one to tune Mapper’s hyperparameters and observe how the resulting graph changes in real time.

In this tutorial, we will illustrate some of these functionalities by a) showing how to create pipelines for time series classification using the time-delay embedding technique, and b) showcasing the library’s generic and extensible Mapper implementation. Source code: https://github.com/giotto-ai/giotto-tda.
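
As a small illustration of the time-delay embedding technique mentioned above, the following Python sketch turns a scalar time series into a point cloud of delay vectors, which is the kind of input a persistent homology step would then consume. It deliberately avoids the giotto-tda API, which ships its own implementation; the function name `time_delay_embedding` and its parameters are hypothetical.

```python
def time_delay_embedding(series, dimension=3, delay=1):
    """Map a scalar time series to a point cloud of delay vectors.

    Point i is (x[i], x[i + delay], ..., x[i + (dimension - 1) * delay]).
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay  # index offset of the last coordinate
    return [
        tuple(series[i + k * delay] for k in range(dimension))
        for i in range(len(series) - span)
    ]


signal = [0, 1, 2, 3, 4, 5]
cloud = time_delay_embedding(signal, dimension=3, delay=2)
# cloud == [(0, 2, 4), (1, 3, 5)]
```

A periodic signal typically embeds as a closed loop, which is why persistent homology of the embedded point cloud can be informative for time series classification.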