The 4th POSTECH MINDS Workshop on


Topological Data Analysis 

and Machine Learning


Jan. 30 (Tuesday) ~ Feb. 2 (Friday), 2024

GMT+9, Korea, Hybrid


Workshop Rationale

Topological Data Analysis (TDA), a relatively new field of data analysis, has proved very useful in a variety of applications. Recently, much TDA research has been devoted to developing methods that are compatible with machine learning workflows. This workshop will bring together researchers and students working on TDA and machine learning and provide an opportunity for them to present their recent research and share ideas. The workshop will also offer tutorial sessions that introduce various TDA computational tools through practical hands-on exercises. This is a sequel to the workshops of the same name held in 2021, 2022 and 2023.

Venue (hybrid)

Registration

Financial support

Financially Supported Accommodation Information

Pohang University of Science and Technology International Hall

77, Cheongam-ro, Nam-gu, Pohang-si, Gyeongsangbuk-do 

Yeongildae Hotel

11, Hengbokgil 75beon-gil, Nam-gu, Pohang-si, Gyeongsangbuk-do 

Organizers

Administrations 

Program Overview (GMT+9)

Day 1 (Jan 30, Tue)

Afternoon session
(chair: Shizuo Kaji)










Day 2 (Jan 31, Wed)

Morning session (Student Session)
(chair: Graiff Zurita Sebastían Elías)

Afternoon session (Mathematics Building Rm 402)
(chair: Killian Meehan)

Day 3 (Feb 1, Thu)

Morning session
(chair: D. Vijai Anand)

Afternoon session
(chair: Woojin Kim)


Special Concert 

(TDA & AI driven Korean Music)

Conference Banquet (Grand Ballroom, POSCO International Center, 2nd floor)

Day 4 (Feb 2, Fri)

Morning session
(chair: Jae-Hun Jung)

Afternoon session


Conference Lunch: Blue Hill (2nd floor of POSCO International Center)

Tuesday and Wednesday dinner: individual dinner. A dinner coupon is provided, which can be used at the campus cafeterias, including the Student Dining Hall and Geuyeodeun (그여든).

(Tentative) Workshop excursion: Pohang Accelerator Laboratory (PAL)


Workshop excursion: Gyeongju Cheonmachong Tomb

Confirmed invited speakers 

Speakers

Jose Perea

(Northeastern University)

Topological Time Series Analysis - Plenary Talk 


Abstract: 

Time series are ubiquitous in scientific applications and machine learning tasks such as classification, recurrence quantification, data imputation and anomaly detection. In this talk, I will describe how tools from algebraic topology, analysis and dynamical systems can be leveraged to answer some of these questions, and in particular, their relevance in machine learning applications.

D. Vijai Anand

(University College London)

Hodge-Decomposition of Brain Networks


Abstract:

Growing evidence suggests that complex brain functions and disorders are closely related to higher-order interactions. Despite this, conventional graph-based brain network models often find it challenging to incorporate such polyadic interactions in a biologically meaningful way. We propose to incorporate higher-order interactions through simplicial complexes. The Hodge decomposition serves as an invaluable tool for this purpose, enabling the breakdown of edge-encoded brain network data into gradient, curl, and harmonic components. The core idea is to utilize the Hodge decomposition to divide the edge-encoded brain network data into three orthogonal subnetworks. We introduce a measure to quantify the magnitude and relative strength of these components.

To validate our methodology, we conduct simulation studies on model networks and employ statistical inference procedures based on the Wasserstein distance. Tests on human brain networks from resting-state functional magnetic resonance imaging datasets reveal that the Hodge decomposition components contain topological features that exhibit statistically significant differences between male and female brain networks. 

Woojin Kim

(KAIST)

Persistence Diagrams at the Crossroads of Algebra and Combinatorics


Abstract:

Persistent Homology (PH) is a method used in Topological Data Analysis (TDA) to extract multiscale topological features from data. Via PH, the multiscale topological features of a given dataset are encoded into a persistence module (indexed by a totally ordered set) and in turn, summarized by a persistence diagram. In order to extend PH so as to be able to study wider types of data (e.g. time-varying point clouds), variations of the indexing set of persistence modules must inevitably occur, leading for example to multiparameter persistence modules, i.e. persistence modules indexed by the n-dimensional grid. It is however not always evident how to define a notion of persistence diagram for such variants. This talk will introduce a generalized notion of persistence diagram for such variants which arises through exploiting both the principle of inclusion and exclusion from combinatorics and the canonical map from the limit to the colimit of a diagram of vector spaces (these being notions from category theory). We also discuss (1) how the generalized persistence diagram subsumes some other well-known invariants of multiparameter persistence modules and (2) algorithmic considerations for computing the generalized persistence diagram.

Matias de Jong van Lier

(Kyushu University)

Topological Smoothing of a Signal over a Planar Graph


Abstract:

Persistent homology furnishes us with crucial topological insights into a dataset, where each topological feature is represented by an interval, and the duration of this interval is referred to as the feature's lifetime. Features with short lifetimes are typically classified as noise, while those with long lifetimes are recognized as meaningful attributes of the dataset.

Exploiting this perspective, we introduce a novel method for the topological simplification of functions defined over the vertices of planar graphs. This method involves fine-tuning the function's values to eliminate features with lifetime smaller than a specified threshold, resulting in a refined function that retains the topologically significant aspects of the original function while discarding the noise.

Formally, this process can be viewed as an approximation problem for a given function by the class of functions without cycles whose persistence is shorter than a specified threshold.

We demonstrate the efficacy of our method through practical applications using images, treating them as signals defined over 2D grid graphs.

This is joint work with J. Chu, S. E. Graiff Zurita, and S. Kaji.
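As an illustration of the lifetime idea described in this abstract (not the speakers' smoothing method, which the abstract does not detail), the following minimal Python sketch computes 0-dimensional sublevel-set persistence pairs of a signal on a path graph and keeps only the features whose lifetime reaches a threshold. The function names `sublevel_persistence_0d` and `significant`, the sample signal, and the threshold value are all hypothetical choices made for this example.

```python
import math

def sublevel_persistence_0d(values):
    """0-dimensional sublevel-set persistence of a signal on a path graph.

    Vertices are added in order of increasing value. A new connected
    component is born at each local minimum; when two components merge,
    the one born later dies (the elder rule).
    """
    n = len(values)
    parent = list(range(n))
    birth = list(values)   # birth value of the component rooted at each index
    added = [False] * n
    pairs = []

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for v in sorted(range(n), key=lambda i: values[i]):
        added[v] = True
        for u in (v - 1, v + 1):          # neighbours on the path graph
            if 0 <= u < n and added[u]:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if values[v] > birth[young]:   # skip zero-lifetime pairs
                    pairs.append((birth[young], values[v]))
                parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((min(values), math.inf))
    return pairs

def significant(pairs, threshold):
    """Keep only intervals whose lifetime (death - birth) is at least threshold."""
    return [(b, d) for b, d in pairs if d - b >= threshold]

signal = [0.0, 2.0, 1.9, 3.0, 0.5, 4.0]
pairs = sublevel_persistence_0d(signal)
kept = significant(pairs, 0.5)
```

In this toy signal, the shallow dip at value 1.9 yields an interval of lifetime about 0.1, which the threshold classifies as noise, while the deeper minimum at 0.5 survives.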

Sicheng Luo

(Kyushu University)

Understanding Changes in STRIPS Using Reeb Graph Simplification


Abstract:

The Stanford Research Institute Problem Solver (STRIPS), widely employed in various domains, notably video games, is an artificial intelligence method empowering agents to plan actions to achieve predefined objectives dynamically. However, understanding changes in action costs, done for tuning the chain of actions to be employed, poses a considerable challenge. To address this issue, we propose the adoption of the Reeb graph to summarize the effects resulting from changes in action costs topologically.

Keunsu Kim

(POSTECH)

Variation principle in Exact Multi-parameter Persistent Homology and its application


Abstract: 

Exact Multi-parameter Persistent Homology (EMPH) represents a one-dimensional reduction of the multi-parameter persistent homology theory applied to the Liouville torus of time-series data. The barcode formula associated with this persistent homology on a ray in the multi-parameter space provides insights into how we can extract information from time-series data. In this talk, we explore the variation principle in EMPH and its application. We introduce an optimization method for the objective function in EMPH and explore the process of finding a ray that minimizes a given loss function. Gradient descent is employed to update the ray during each epoch, requiring the introduction of a smoothing structure into the update process.

Eunwoo Heo

(POSTECH)

Persistent homology of featured time series data and its applications


Abstract: 

Constructing graphs based on the frequency of occurrence in time series observations is a straightforward method that effectively reflects information of time series data. However, this simplicity often leads to significant information loss during the data transformation process. We introduce a new concept and research methodology that preserves the advantages of graph transformations using frequency, while controlling for information loss. Featured time series refers to time series that are enhanced with features to mitigate information loss. We prove a theorem demonstrating that an influence vector, which represents the impact of features, maintains stability properties throughout the analysis process. Based on this theorem, we introduce a research methodology focused on finding influence vectors that minimize information loss for improved time series analysis. We sought influence vectors that effectively detect anomalies in stock data, such as the Lehman bankruptcy and the dotcom crash. Conversely, we also present a methodology that sets the desired analysis target as a feature. We analyzed the impact of musical notation on overall time series data by observing changes in the influence vector.

BoGwang Jeon

(POSTECH)

A crash course in algebraic topology


Abstract

This talk is largely expository and a crash course in algebraic topology. In the lecture, I will explain the basic concepts and theorems of homology, as well as their developments. If time permits, then I will also try to introduce persistent homology.

Enhao Liu

(Kyoto University)

Interval Replacements of Persistence Modules


Abstract: 

Persistent homology is a powerful technique in topological data analysis, employed to explore topological features of complex datasets. Multi-parameter persistent homology is less understood than one-parameter persistent homology because of its more intricate algebraic structure. Specifically, the presence of (usually infinitely many) non-interval indecomposable persistence modules significantly complicates the analysis in multi-parameter persistent homology. To facilitate practical data analysis, we have developed a general theory of replacing any persistence module over a finite poset with a pair of interval-decomposable modules (which is regarded as an element of the split Grothendieck group).

In this talk, I will begin by introducing some fundamental concepts and basic knowledge of Möbius inversion. Following this, I will give the definitions of the interval replacement and a new invariant of the persistence module called the interval rank invariant. I proceed to show our main theorem stating that the interval replacement preserves the interval rank invariants of all persistence modules. Next, I will reveal computational results specific to certain (2,1), (n,1), and (2,2)-type intervals. Finally, I will provide some examples explaining the interval replacements and the main theorem, together with an example showing the incompleteness of the interval rank invariant.

This is joint work with Hideto Asashiba and Etienne Gauthier.

Shizuo Kaji

(Kyushu University)

Computing Persistent Homology of Volumetric Images


Abstract: Persistent homology (PH) serves as a powerful tool for image feature extraction. We present Cubical Ripser, an open-source software designed for high-efficiency computation of persistent homology of cubical complexes. We will illustrate how Cubical Ripser can be integrated into a standard image analysis pipeline.
Hands-on Jupyter Notebook (Google Colab)

Yulia Gel

(University of Texas Dallas)

Coupling Time-Aware Multipersistence Knowledge Representation with Graph Convolutional Networks for Time Series Forecasting 


Abstract: Graph Neural Networks (GNNs) have proven to be powerful machinery for learning complex dependencies in multivariate spatio-temporal processes. However, most existing GNNs have inherently static architectures, and as a result, do not explicitly account for time dependencies of the encoded knowledge and are limited in their ability to simultaneously infer latent time-conditioned relations among entities.

We postulate that such hidden time-conditioned properties may be captured by the tools of multipersistence, i.e., an emerging machinery in topological data analysis which allows us to quantify dynamics of the data shape along multiple geometric dimensions. We propose to summarize inherent time-conditioned topological properties of the data as a time-aware multipersistence Euler-Poincaré surface and prove its stability. We then construct a supragraph convolution module which simultaneously accounts for the extracted intra- and inter-dependencies in the data. We illustrate the utility of the proposed approach in application to forecasting highway traffic flow, blockchain Ethereum token prices, and COVID-19 hospitalizations.

Killian Meehan

(Kyoto University)

Topologically Learned Embeddings and Applications to Chromosome Structural Analysis


Abstract:
Inspired by the yet-untapped potential of multi-contact chromatin capture data (MC-3C), we have developed a machine learning network which corrects for the most obvious failing in the current embedding method: the destruction of topological information. This project was motivated by the biological question of how best to handle MC-3C data, and in the quest to answer it, our work has yielded novel research results for the fields of mathematics and machine learning.


Beomjun Choi

(POSTECH)

Arnold-Thom conjecture for gradient flow


Abstract:
We discuss the asymptotic behavior of solutions to evolution equations with gradient-like structure. Originating from the study of gradient flows on Euclidean space, the uniqueness questions on limits and limiting directions have a long history, and many questions are still open. Our main result characterizes the rate and the direction of convergence for slowly converging solutions. This partially confirms the Arnold-Thom gradient conjecture in the context of infinite-dimensional problems. We will also discuss possible applications.

Graiff Zurita Sebastían Elías

(Kyushu University)

Analyzing Representational Capacity: Determinantal Point Processes with Symmetric vs Non-Symmetric Kernels


Abstract

A Determinantal Point Process (DPP) is a probabilistic model over subsets of a ground set of N items that is parameterized by an N-by-N matrix (the kernel). Specifically, the probability associated with each subset is directly proportional to the determinant of the submatrix indexed by that particular subset. In machine learning, DPPs have gained traction for being simple models that can be used in recommendation algorithms and for data summarization, among other applications.

Initially, only positive semi-definite kernels were considered, resulting in DPPs that exhibit a repulsive nature—implying that the presence of one item suppresses the likelihood of others. However, it is also possible to use non-symmetric kernels which relax the repulsiveness in specific scenarios.

In practice, we want to model a dataset by DPPs, and we may choose to do so by using symmetric or non-symmetric kernels. Each model has a different number of free parameters, and we propose to analyze them by taking this into account. In this presentation, we show quantitative findings comparing the representational capacity, in terms of degrees of freedom, of symmetric and non-symmetric kernels.
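The definition in this abstract, where the probability of a subset is proportional to the determinant of the corresponding kernel submatrix, can be sketched in a few lines of Python. This is only an illustrative example for tiny ground sets, not the speaker's analysis; the helper names `det`, `submatrix`, and `dpp_probabilities` and the two 2-by-2 kernels are hypothetical choices made for this example.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion (fine for tiny matrices).

    The determinant of the empty 0x0 matrix is 1 by convention.
    """
    if not m:
        return 1.0
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def submatrix(kernel, subset):
    """Principal submatrix of the kernel indexed by the given subset."""
    return [[kernel[i][j] for j in subset] for i in subset]

def dpp_probabilities(kernel):
    """Probability of every subset, proportional to its principal minor.

    The normalising constant (the sum of all principal minors) equals
    det(kernel + identity).
    """
    n = len(kernel)
    weights = {
        s: det(submatrix(kernel, s))
        for r in range(n + 1)
        for s in combinations(range(n), r)
    }
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Symmetric kernel: the two items are similar, so they repel each other.
L_sym = [[1.0, 0.9], [0.9, 1.0]]
# Non-symmetric kernel (assumed example) with a skew-symmetric off-diagonal part.
L_nonsym = [[1.0, 0.9], [-0.9, 1.0]]

probs = dpp_probabilities(L_sym)
probs_nonsym = dpp_probabilities(L_nonsym)
```

With the symmetric kernel, the weight of the pair {0, 1} (0.19) is smaller than the product of the singleton weights (1.0), which reflects repulsion. With the non-symmetric kernel the pair weight is 1.81, so the two items co-occur more readily.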

Jisu Kim

(Seoul National University)

Featurization and Evaluation using Topological Data Analysis


Abstract: Topological Data Analysis (TDA) generally refers to utilizing topological features of data. One main focus in TDA is persistent homology, which observes data at various resolutions and summarizes topological features that persistently appear. TDA has proven valuable in enhancing machine learning applications. This presentation focuses on the application of TDA in machine learning, specifically in two aspects: featurization and evaluation.

The intricate structure of persistent homology poses challenges when directly applied to statistical or machine learning frameworks. To overcome this, the persistent homology is often featurized in Euclidean space or functional space. Two papers will be discussed as examples. First, I will present "PLLay: Efficient Topological Layer based on Persistence Landscapes", where I will explain how persistence landscapes are used to create a topological layer in a deep learning framework. Then, I will present "Generalized Penalty for Circular Coordinate Representation", discussing how circular coordinates are utilized for visualization and dimension reduction.

Recently, efforts have emerged in using TDA to evaluate data or models and integrate them into machine learning models. I will present "TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models", where the confidence of TDA is employed for robust and reliable evaluation metrics for generative models.

Jongbaek Song


(Pusan National University)

Moment-angle complexes and persistent cohomology


Abstract: In the standard pipeline of TDA, given a data set X, one constructs certain filtrations of simplicial complexes such as Vietoris-Rips complexes or Čech complexes. Then the homology functor is applied, which gives us the persistent homology of X. In fact, in algebraic topology, taking the cohomology functor, hence the persistent cohomology, provides a richer algebraic structure compared to the usual persistent homology. Taking things further, one can consider certain resolutions of the face rings of simplicial complexes in the filtration to apply the Tor-functor, consequently defining the persistent Tor-algebra. It has much richer algebraic structures than the usual persistent cohomology. In this talk, we introduce the notion of a moment-angle complex, which serves as a topological counterpart to the Tor-algebra of the simplicial complex, and then we discuss the relevant persistence modules and their stabilities.

Junyan Chu

(Kyushu University)

Polynomial Interpolation of a Vector Field on a Convex Polygon


Abstract: 

Interpolation of sparsely observed data is a fundamental aspect of data science.

We consider fluidic materials or particles confined within a convex polygonal domain $P$.

The dynamics of these entities is modelled by a vector field which is tangent to the polygonal boundary $\partial P$.

In this work, we introduce a novel scheme for interpolating a vector field on a convex polygonal domain by a polynomial vector field, leveraging ideas from the theory of hyperplane arrangements. Given a degree upper bound $k$ and a set of observations $\{ \left( (x_i,y_i),u(x_i,y_i) \right)\mid (x_i,y_i)\in P, u(x_i,y_i)\in R^2, i=1,\ldots, n\}$, our algorithm computes a degree $k$ polynomial vector field $u: P \to R^2$ that interpolates the observations in a least squares sense while satisfying the boundary condition exactly. The algorithm can also calculate a minimal degree polynomial vector field that meets a given error bound. We showcase the effectiveness of our algorithm through applications in vector field design.

Yuzhou Chen

(Temple University)

Zigzag Persistence and Graph Convolutional Networks for Time Series Forecasting


Abstract: There has recently been a surge of interest in developing a new class of deep learning (DL) architectures that integrate an explicit time dimension as a fundamental building block of learning and representation mechanisms. In turn, many recent results show that topological descriptors of the observed data, encoding information on the shape of the dataset in a topological space at different scales, that is, persistent homology of the data, may contain important complementary information, improving both performance and robustness of DL. As a convergence of these two emerging ideas, in this work we propose to enhance DL architectures with the most salient time-conditioned topological information of the data and introduce the concept of zigzag persistence into time-aware graph convolutional networks (GCNs). Zigzag persistence provides a systematic and mathematically rigorous framework to track the most important topological features of the observed data that tend to manifest themselves over time. To integrate the extracted time-conditioned topological descriptors into DL, we develop a new topological summary, the zigzag persistence image, and derive its theoretical stability guarantees. We validate the new GCN-based model, i.e., Z-GCNETs with a time-aware zigzag topological layer, in application to traffic forecasting and Ethereum blockchain price prediction. The experimental results indicate a generally higher performance of Z-GCNETs compared with related approaches.

Katharine Turner

(Australian National University)

The Extended Persistent Homology Transform for Manifolds with Boundary 


Abstract

The Persistent Homology Transform (PHT) is a topological transform which can be used to quantify the difference between subsets of Euclidean space. To each unit vector the transform assigns the persistence module of the height function over that shape with respect to that direction. The PHT is injective on piecewise-linear subsets of Euclidean space, and it has proven useful in diverse applications. One shortcoming is that shapes with different essential homology (i.e., Betti numbers) have an infinite distance between them.


The theory of extended persistence for Morse functions on a manifold was developed by Cohen-Steiner, Edelsbrunner and Harer in 2009 to quantify the support of the essential homology classes. By using extended persistence modules of height functions over a shape, we obtain the extended persistent homology transform (XPHT) which provides a finite distance between shapes even when they have different Betti numbers.


I will discuss how the XPHT of a manifold with boundary can be deduced from the XPHT of the boundary, which allows for efficient calculation. This is joint work with Vanessa Robins and James Morgan.

Sunhyuk Lim 

(Sungkyunkwan University)

Vietoris-Rips Persistent Homology, Injective Metric Spaces, and The Filling Radius


Abstract: 

In the applied algebraic topology community, the persistent homology induced by the Vietoris-Rips simplicial filtration is a standard method for capturing topological information from metric spaces. In this paper, we consider a different, more geometric way of generating persistent homology of metric spaces which arises by first embedding a given metric space into a larger space and then considering thickenings of the original space inside this ambient metric space. In the course of doing this, we construct an appropriate category for studying this notion of persistent homology and show that, in a category theoretic sense, the standard persistent homology of the Vietoris-Rips filtration is isomorphic to our geometric persistent homology provided that the ambient metric space satisfies a property called injectivity.

As an application of this isomorphism result we are able to precisely characterize the type of intervals that appear in the persistence barcodes of the Vietoris-Rips filtration of any compact metric space and also to give succinct proofs of the characterization of the persistent homology of products and metric gluings of metric spaces. Our results also permit proving several bounds on the length of intervals in the Vietoris-Rips barcode by other metric invariants. Finally, as another application, we connect this geometric persistent homology to the notion of filling radius of manifolds introduced by Gromov and show some consequences related to (1) the homotopy type of the Vietoris-Rips complexes of spheres which follow from work of M. Katz and (2) characterization (rigidity) results for spheres in terms of their Vietoris-Rips persistence barcodes which follow from work of F. Wilhelm.


Song, Youngsook 

Yoon, Hose

Kim, Myung Ock

Special music concert: TDA-AI driven Korean music 


A special music concert will take place on Thursday at 6:00 pm before the conference banquet. This concert showcases Korean music pieces performed on traditional Korean instruments, all generated using TDA and AI. Attendees of the TDA conference are likely to find this concert particularly intriguing. Renowned Gayageum player Song, Youngsook, accompanied by Yoon, Hose, will be featured in the performance. The final piece is a composition by Dr. Kim, Myong Ock, who used music generated through TDA and AI as seed music. This promises to be an exceptional blend of human and AI music.

PROGRAM


Zoom links 

Zoom Link: 

https://us06web.zoom.us/j/6888961076?pwd=ejYxN05jNmhUa25PU2JzSUJvQ1haQT09


Meeting ID: 688 896 1076
Password: 54321

Local information 

How to get to POSTECH - Directions 

Campus map - campus map 

Campus hotel - campus hotel (POSCO International Center) 

Yeongildae Hotel - https://www.yeongildae.com/

Photo of the Workshop

Contact 

For comments or questions, please contact

Sponsors