# Topological Data Analysis

Spring' 2022, Tue/Tue: 2:00pm -- 3:20pm

Location: HSS 2154

Course Description

Topological data analysis: algorithm and applications

Topological methods provide powerful tools for characterizing structures/features behind data and analyzing diverse complex data (images, graphs, point sets, etc). This course introduces basic concepts and topological structures, as well as recent topological tools, algorithms, as well as examples to applications. Some topics include: basics in topology, simplicial complexes to model data, persistent homology, discrete Morse theory, topology inference, the Mapper methodology, hierarchical clustering, and integration of topological methods with machine learning. A tentative list of topics is on the right.

This course is for graduate students and advanced undergraduate students.

Prerequisite: Linear algebra, algorithms (DSC40B or equivalent)

Instructor:

Yusu Wang, Email: yusuwang@ucsd.edu

Class time: Sp'2022, Tue/Thu 2:00pm -- 3:20pm PDT.

Office Hour: Thu 9am -- 9:50am PDT (SDSC 212E, or on zoom (see Canvas for zoom link)), or by appointment (yusuwang@ucsd.edu ).

Reference books:

• Much of the material is from a up-coming book Computational Topology for Data Analysis co-authored by Dr. T. K. Dey and myself, and you can download an e-version of the book here (for academic use), or on my website.

• I will distribute (in Canvas) class notes and slides throughout the course.

Other references that might be useful:

• Computational Topology: An Introduction, by H. Edelsbrunner and J. Harer, AMS Press, 2009.

• Algebraic Topology, by A. Hatcher, Cambridge University Press, 2002. (Online version is available at the author's webpage.)

• An Introduction to Morse Theory, by Y. Matsumoto, Amer. Math. Soc., Providence, Rhode Island, 2002.

• Elements of Algebraic Topology, by J. R. Munkres, Perseus, Cambridge, Massachusetts, 1984.

• There is one project. The final grade is based on: Project / survey (proposal, project, report, short presentation): 100%.

• Project requirement: Each student is required to meet briefly with me around 3-4th week, so that we can discuss to find a good potential topic for your final project. After a topic is chosen, you are required to submit a short proposal about what you intend to work on. Later, after you finish your project, you will submit a final report (4+ pages, single spacing), which should clearly describe the motivation of your project, what you have done, and your findings. A 5-min presentation is also needed. A survey needs to be 10+ pages (single spacing).

• Timeline:

• Week 2-5: Students meet with me to discuss project topic .

• Week 6: Selection of final project topic done. Project proposal due by end of May 8th, 2022 (Sunday) (extended from previous May 6th deadline).

• Week 10: Finish up your final project

• Final week:

• Thursday June 9th by 11:59pm : Final project report as well as your recorded presentation are due (there is no late submission accepted as I need to assign final grades). Please submit the video to the google drive I shared with you (information will be available in canvas then), and send the report to my email.

Lecture materials: (Lecture notes are distributed in canvas. Note that all animations are lost in pdf slides. )

1. Topic 0: Introduction to computational topology (slides: pptx)

2. Topic 1: Basics. (slides: pptx , pdf; Optional reading: Chapter 1 of CTDA textbook.)

3. Topic 2: Simplicial complex (aka: how do we model space of interest) (slides: pptx, pdf; Optional reading: First half of Chapter 2 of CTDA textbook)

4. Topic 3: Homology (aka: how do we quantify topological information) (slides: pptx, pdf; Optional reading: Second half of Chapter 2 of CTDA textbook)

5. Topic 4: Persistent homology (aka: a modern much more powerful extension of homology)

• 4-A: Introduction to persistent homology (slides: pdf; Optional reading: Chapter 3.1--3.3 of CTDA textbook)

• 4-B: Persistent homology of PCDs and functions (slides: pdf; Optional reading: Chapter 3.4-3.5 of CTDA textbook)

• 4-C: More on persistent modules: distance and extensions (slides: pdf; Optional reading: Chapter 4 of CTDA textbook)

6. Topic 5: Using persistent homology in practice (slides: pdf; Optional reading: Chapter 6 and Chapter 8 of CTDA textbook)

7. Topic 6: Analysis of functions on Data I: Reeb graphs and contour trees (slides: pdf; Optional reading: Chapter 7 of CTDA textbook)

8. Topic 7: Analysis of functions on Data II: Mapper and multiscale-Mapper (slides: pdf; Optional reading: Chapter 9 of CTDA textbook)

9. Topic 8: Analysis of functions on Data III: Discrete Morse theory and applications (slides: pdf; Optional reading: Chapter 10 of CTDA textbook)

10. Topic 9: Homology inference, data sparsification and denoising (slides: pdf; Optional reading: Chapter 6 of CTDA textbook)

11. Topics 10: TDA + Machine Learning (slides: pdf; Optional reading: Chapter 13 of CTDA textbook)

Other course materials: (Optional assignments, potential project topics, announcement etc. )

1. See Canvas for a list of potential course project topics.

Software / Tools resources

• Computing Persistent homology:

1. The Ripser library is available . It contains a memory+time efficient algorithm for computing persistent homology for Rips filtration.

2. The Gudhi library is available . This includes currently the most efficient algorithm for computing persistent homology for arbitrary simplicial complex. The references for the algorithm is available on the Gudhi webpage.

3. The PHAT library is available . This algorithm is also efficient, and the algorithm is closer to the Gaussian elimination we described in class, thus potentially easier to play with if you need to modify the code. The references for the code is available on the website. There is also a parallel version DIPHA for effecient computation of PH on distributed systems, which is available here.

4. The Dionysus library is one of the most comprehensive packages, which contains many persistence related computation, including levelset zigzag persistence. It is available here. It is written in C++ and reasonably efficient.

5. The JavaPles library is a Matlab and Java-based system. It is available . It can automatically compute Vietoris-Rips, witness and lazy-witness complexes from input finite metric spaces (e.g point clouds data in a feature space) to further build persistent homology. The webpage also contains tutorial etc.

6. The SimPers package is specifically for persistence of a filtration connected by simplicial maps. It is available .

7. The Sophia package is also specifically for persistence of a filtration connected by simplicial maps. Its practical performance can be better than SimPers. It is available .

8. The scikit-tda package (available here) is a python library that can compute persistent homology induced from points data, as well as other tools such as visualization of persistence diagrams.

• Topological data analysis (TDA) and machine learning

1. The sklearn-tda python package (available here) by Mathieu Carriere is a package combing TDA with machine learning (including most kernels for persistence diagrams)

2. The TopoLayer python package (available here) by Rickard Bruel Gabrielsson implements a topological layer that can be inserted in a neural network

• Computing Bottleneck / Wasserstein distance between persistence diagrams:

1. The Hera package is available here. It computes the bottleneck or Wasserstein distances between two persistence diagrams efficiently.

• Sparsification / approximation:

1. The SimBa package is available . It produces approximate persistence diagrams for Rips filtration of a set of points in the Euclidean space or in general metric space. It is originally developed by Dayu Shi, working with Dr. Tamal Dey and myself, and currently maintained by Sayan Mandal.

2. The Sophia package is available here. It produces approximate persistence diagrams for Rips filtration of a set of points in the Euclidean space.

• Graphs, Reeb graph / contour tree related:

1. The code to compute persistence diagrams for a function defined on a graph can be downloaded here. The code to compute the persistence-distortion distance for comparing two metric graphs can be downloaded here. Both are developed by Dayu Shi, working with Dr. Tamal Dey and myself.

2. The Denali software package for visualizing terrain metaphor is available here. The package includes code to compute contour trees for a scalar field.

3. The randomized algorithm for computing Reeb graph for arbitrary simplicial complex can be downloaded here.

• Mapper:

1. The Python Mapper package is available . It is developed by Daniel Muller.

• Discrete Morse complex:

1. The graph reconstruction based on discrete Morse complex is available here. It takes arbitrary simplicial complex + a vertex-wise function, or a 2D / 3D cubic grid with vertex-wise function, as input. Examples for road network reconstructions are included.

2. The DisPerse package is available . The software only takes 2D or 3D volumetric data, or Delaunay triangulations.