Topological Data Analysis and its Applications for Medical Data

In conjunction with the 24th International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI 2021), September 27 - October 1, 2021 / Strasbourg, FRANCE


Spotlight

Xiang Liu and Kelin Xia. Neighborhood complex based machine learning (NCML) models for drug design


The importance of drug design cannot be overemphasized. Recently, artificial intelligence (AI) based drug design has begun to gain momentum due to great advances in experimental data, computational power and learning models. However, a major issue that remains for all AI-based learning models is efficient molecular representation. Here we propose, for the first time, Neighborhood complex (NC) based molecular featurization (or feature engineering). In particular, we reveal, also for the first time, deep connections between NC and Dowker complex (DC) for molecular interaction based bipartite graphs. Further, NC-based persistent spectral models are developed and the associated persistent attributes are used as molecular descriptors or fingerprints. To test our models, we consider protein-ligand binding affinity prediction. Our NC based machine learning (NCML) models, in particular the NC-based gradient boosting tree (NC-GBT), are tested on the three most commonly used datasets, namely PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, and extensively compared with existing state-of-the-art models. We find that our NCML models achieve state-of-the-art results.
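As a rough illustration of the NC construction mentioned in the abstract, the sketch below builds the neighborhood complex of a small graph using the standard definition: a set of vertices forms a simplex when its members share a common neighbor. The toy graph, the vertex names and the function `neighborhood_complex` are hypothetical and are not taken from the paper. For a bipartite graph, every simplex lies entirely within one of the two parts, which is where the link to Dowker complexes comes from.

```python
from itertools import combinations


def neighborhood_complex(adjacency, max_dim=2):
    """Return simplices (as frozensets) of the neighborhood complex up to max_dim.

    adjacency maps each vertex to the set of its neighbors. A vertex set is a
    simplex when all of its members share at least one common neighbor, so
    every nonempty subset of a neighborhood N(v) is a simplex.
    """
    simplices = set()
    for neighbors in adjacency.values():
        ordered = sorted(neighbors)
        # A simplex of dimension d has d + 1 vertices.
        for size in range(1, max_dim + 2):
            for face in combinations(ordered, size):
                simplices.add(frozenset(face))
    return simplices


# Hypothetical toy bipartite graph: "p*" are protein atoms, "l*" are ligand atoms.
toy_graph = {
    "p1": {"l1", "l2"},
    "p2": {"l2", "l3"},
    "l1": {"p1"},
    "l2": {"p1", "p2"},
    "l3": {"p2"},
}

nc = neighborhood_complex(toy_graph, max_dim=1)
for simplex in sorted(nc, key=lambda s: (len(s), sorted(s))):
    print(sorted(simplex))
```

In this toy case the ligand atoms `l1` and `l2` form an edge because both are adjacent to `p1`, while no simplex mixes protein and ligand atoms.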




Accepted Papers:




Yuki Saeki, Atsushi Saito, Jean Cousty, Yukiko Kenmochi and Akinobu Shimizu. Statistical modeling of pulmonary vasculatures with topological priors in CT volumes


A statistical appearance model of blood vessels based on a variational autoencoder (VAE) is well adapted to image intensity variations. However, images reconstructed with such a statistical model may have topological defects, such as the loss of bifurcations and the creation of undesired holes. In order to build a 3D anatomical model of blood vessels with topological correctness, we incorporate a topological prior into the statistical modeling. Qualitative and quantitative results on 2567 real CT volume patches and on 10000 artificial ones show the efficiency of the proposed framework.





Moo K. Chung and Hernando Ombao. Lattice Paths for Persistent Diagrams with Application to COVID-19 Virus Spike Proteins

Persistent homology has undergone significant development in recent years. However, one outstanding challenge is to build a coherent statistical inference procedure on persistent diagrams. In this talk, we first present a new lattice path representation for persistent diagrams. We then develop a new exact statistical inference procedure for lattice paths via combinatorial enumerations. The lattice path method is applied to the topological characterization of the protein structures of the COVID-19 virus. We demonstrate that there are topological changes during the conformational change of spike proteins.




Paula Martin-Gonzalez, Mireia Crispin-Ortuzar and Florian Markowetz. Predictive modelling of highly multiplexed tumour tissue images by graph neural networks


The progression and treatment response of cancer largely depend on the complex tissue structure that surrounds cancer cells in a tumour, known as the tumour microenvironment (TME). Recent technical advances have led to the development of highly multiplexed imaging techniques such as Imaging Mass Cytometry (IMC), which capture the complexity of the TME by producing spatial tissue maps of dozens of proteins. Combining these multidimensional cell phenotypes with their spatial organization to predict clinically relevant information is a challenging computational task, and so far no method has addressed it directly. Here, we propose and evaluate MULTIPLAI, a novel framework to predict clinical biomarkers from IMC data. The method relies on attention-based graph neural networks (GNNs) that integrate both the phenotypic and spatial dimensions of IMC images. In this proof-of-concept study we used MULTIPLAI to predict oestrogen receptor (ER) status, a key clinical variable for breast cancer patients. We trained different architectures of our framework on 240 samples and benchmarked against graph learning via graph kernels. Propagation Attribute graph kernels achieved a class-balanced accuracy of 66.18% in the development set (N=104), while GNNs achieved a class-balanced accuracy of 90.00% on the same set when using the best combination of graph convolution and pooling layers. We further validated this architecture in internal (N=112) and external test sets from different institutions (N=281 and N=350), demonstrating the generalizability of the method. Our results suggest that MULTIPLAI captures important TME features with clinical importance. This is the first application of GNNs to this type of data and opens up new opportunities for predictive modelling of highly multiplexed images.
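The abstract reports class-balanced accuracy rather than plain accuracy. A minimal sketch of that metric follows, assuming it means the mean of per-class recall, which is the usual definition; the abstract itself does not spell it out. The labels and the function name `balanced_accuracy` are made up for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 8 positive and 2 negative samples.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
# A trivial classifier that always predicts the majority class.
y_pred = [1] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)  # 0.8 0.5
```

The majority-class predictor looks good under plain accuracy but scores only chance level under the balanced metric, which is why the balanced version is preferred when classes such as ER-positive and ER-negative are unevenly represented.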






Ameer Saadat-Yazdi, Rayna Andreeva and Rik Sarkar. Topological Detection of Alzheimer's Disease using Betti Curves


Alzheimer’s disease is a debilitating disease in the elderly, and is an increasing burden on society due to an aging population. In this paper, we apply topological data analysis to structural MRI scans of the brain, and show that topological invariants make accurate predictors for Alzheimer’s. Using the construct of Betti curves, we first show that topology is a good predictor of age. Then we develop an approach to factor out the topological signature of age from Betti curves, and thus obtain accurate detection of Alzheimer’s disease. Experimental results show that topological features used with standard classifiers perform comparably to recently developed convolutional neural networks. These results imply that topology is a major aspect of structural changes due to aging and Alzheimer’s. We expect this relation will generate further insights for both early detection and better understanding of the disease.
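For readers unfamiliar with Betti curves, the sketch below shows one common way to compute them: given the persistence intervals of a fixed homology dimension, the curve at threshold t counts the intervals that are alive at t. The intervals, thresholds and the function name `betti_curve` are hypothetical; the authors' actual pipeline from MRI scans to persistence intervals is not described in the abstract and is not reproduced here.

```python
def betti_curve(intervals, thresholds):
    """Count, for each threshold t, the intervals (birth, death) with birth <= t < death."""
    return [
        sum(1 for birth, death in intervals if birth <= t < death)
        for t in thresholds
    ]


# Hypothetical persistence intervals for a single homology dimension.
intervals = [(0.0, 0.5), (0.0, 1.0), (0.2, 0.8)]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

curve = betti_curve(intervals, thresholds)
print(curve)  # [2, 3, 2, 2, 0]
```

Sampling the curve on a fixed grid of thresholds yields a fixed-length vector, which is what makes it convenient as input to standard classifiers.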