Securing sufficient funding is an important step in conducting research, and the European Union is one of the most ambitious funding sources. The Horizon Europe EU funding programme provides opportunities across research domains through its structure of Clusters: healthcare research, Artificial Intelligence (AI), bioeconomy, civil and cyber security, agriculture, climate, energy, and mobility, to name a few. Topological data analysis (TDA) is a versatile mathematical approach that can be applied to virtually all domains of the proposed schemes, as it can handle the ingestion of heterogeneous data of very different nature and scale. Spindox has previous experience in coordinating EU-funded projects (one in collaboration with USE, the REXASI-PRO project) and in coordinating proposal writing (from partner engagement to work-package structuring). We will present funding opportunities in which TDA can be applied, potentially in combination with machine learning (ML) algorithms, in different contexts, with the aim of stimulating scientific discussion and preparing for the next round of funding from 2024 to 2027.
The application of persistent homology to viral evolution was suggested some years ago, using genomic data of influenza viruses as an example. It was observed that long-lived topological features detect reassortment and recombination events, which provide much of the variability of flu viruses. Here, I instead present a careful investigation of short-lived features, typically interpreted as noise. Our results show that persistent homology efficiently identifies recurrent mutations in the evolution of SARS-CoV-2. This provides insights into how distinct viral strains exhibit convergent genomic structures, which is a hallmark of adaptation processes. Moreover, I will explain how our analysis can easily incorporate temporal information despite the general pitfalls of multiparameter persistent homology. In this way it is possible to monitor changes in convergent behaviour over the course of the pandemic.
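As a toy illustration of the kind of computation involved (not the actual analysis pipeline of this work), the sketch below computes the 0-dimensional persistence barcode of a Vietoris–Rips filtration on Hamming distances between aligned sequences; the sequences and values are invented for the example:

```python
from itertools import combinations

def hamming(a, b):
    # Hamming distance between two aligned sequences of equal length
    return sum(x != y for x, y in zip(a, b))

def h0_barcode(seqs):
    """Death times of the 0-dimensional Vietoris-Rips persistence bars:
    every sequence is a component born at 0; a component dies at the
    distance where it merges with an older one (single-linkage)."""
    n = len(seqs)
    edges = sorted((hamming(seqs[i], seqs[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)  # one component dies at distance d
    return deaths  # n-1 finite bars; one essential bar never dies

# Toy aligned "genomes": two identical variants and two single mutations
genomes = ["AAAA", "AACA", "AACA", "TACA"]
print(h0_barcode(genomes))  # → [0, 1, 1]
```

Short bars here record near-duplicate sequences; the talk's point is that such short-lived features, far from being noise, can flag recurrent mutations.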
In this talk we discuss various landmark-based approaches to building topological features from digital images. Local binary patterns (LBP) is one successful approach that has shown promising results when used to select landmarks from which to build persistence barcodes (for face images and other image modalities). Extended LBP variants, such as radial LBP, center-symmetric LBP, and superpixels, can also be used to select landmarks from digital images on which to build topology. We discuss the stability of these landmark-based approaches when the input is perturbed by a small amount of noise. Finally, we compare landmark-based approaches with cubical complexes in terms of both performance and stability.
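A minimal sketch of the idea, with an invented toy image and an arbitrary choice of which LBP codes to keep: compute the standard 8-neighbour LBP code at each interior pixel, then use the pixels whose code falls in a chosen set as the landmark point cloud.

```python
def lbp_code(img, r, c):
    """Standard 8-neighbour local binary pattern at pixel (r, c):
    threshold each neighbour against the centre and read the bits
    clockwise starting from the top-left neighbour."""
    centre = img[r][c]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offs):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_landmarks(img, keep):
    """Return coordinates of interior pixels whose LBP code is in `keep`;
    this point cloud can then feed a Rips filtration."""
    h, w = len(img), len(img[0])
    return [(r, c) for r in range(1, h - 1) for c in range(1, w - 1)
            if lbp_code(img, r, c) in keep]

img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 0, 0],
       [0, 0, 0, 0]]
# keep code 255: pixels whose whole neighbourhood is >= the centre
print(lbp_landmarks(img, {255}))  # → [(2, 2)]
```

Which codes to keep (uniform patterns, radial or center-symmetric variants, etc.) is exactly the design choice the talk compares.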
Cardiovascular diseases are the most common cause of morbidity and mortality worldwide. Consequently, there is extensive research focused on their diagnosis. Among all possible tests, the leading modality for non-invasive assessment of cardiac function is cardiovascular magnetic resonance (CMR) imaging. This technique produces a great amount of data, which makes it difficult for clinicians to analyze but ideal for machine learning algorithms. In this talk, we will explain how topological features can be introduced into the cardiac disease diagnosis pipeline. Specifically, we will discuss the use of persistent homology descriptors derived from cubical complexes of images. We will present some preliminary results on small datasets and outline ideas for larger and more complex datasets.
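To make the cubical-complex descriptor concrete, here is a small self-contained sketch (a toy image, not CMR data) of 0-dimensional persistence for the sublevel-set filtration of a grayscale image, computed with union-find and the elder rule:

```python
def cubical_h0(img):
    """0-dimensional persistence pairs of the sublevel-set filtration of a
    2-D grayscale image: sweep pixels by increasing value, track connected
    components (4-connectivity) with union-find, and record a (birth, death)
    pair whenever a younger component merges into an older one (elder rule).
    Zero-length pairs are discarded."""
    h, w = len(img), len(img[0])
    order = sorted((img[r][c], r, c) for r in range(h) for c in range(w))
    parent, birth = {}, {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    pairs = []
    for v, r, c in order:
        p = (r, c)
        parent[p], birth[p] = p, v
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            if birth[rp] > birth[rq]:
                rp, rq = rq, rp      # now rp is the elder root; rq dies
            if birth[rq] < v:        # skip zero-length bars
                pairs.append((birth[rq], v))
            parent[rq] = rp
    return pairs  # the component of the global minimum never dies

img = [[1, 5, 1],
       [5, 5, 5],
       [1, 5, 2]]
print(cubical_h0(img))  # → [(1, 5), (1, 5), (2, 5)]
```

The resulting (birth, death) pairs are the raw material that vectorization schemes (persistence images, landscapes, etc.) turn into ML features.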
Clustering in high dimensions poses many statistical challenges. While traditional distance-based clustering methods are computationally feasible, they lack a probabilistic interpretation and rely on heuristics for estimating the number of clusters. On the other hand, probabilistic model-based clustering techniques often fail to scale, and devising algorithms able to effectively explore the posterior space is an open problem. In this talk I will describe a hybrid solution that entails defining a likelihood on pairwise dissimilarities between observations, using the problem of "die analysis" from digital numismatics as a case study.
Large consortia are delivering thousands of genomic sequences on a daily basis.
How to extract biological knowledge from these sequences remains a challenging problem. The function prediction problem, contrary to common belief, is far from resolved, as current methods show little progress regardless of methodology and data availability.
Here I will present how embracing concepts from computer science can help address the broader problem of function annotation. I will explain how different methods, from ML to protein language models, perform in three different settings: 1) model organisms, 2) non-model organisms, and 3) massive annotation of millions of genes from poorly characterized organisms.
Data in nature comes in many different flavors, such as graphs (social networks), cell and simplicial complexes (3D meshes), and manifolds (physical laws). Topological Deep Learning (TDL) is an emerging subfield of Geometric Deep Learning (GDL) that focuses on deep learning for data that can be naturally represented by topological domains, such as manifolds, hypergraphs, simplicial complexes, cell complexes, or, more recently, combinatorial complexes. Combinatorial complexes are a special type of graded hypergraph that models most of the aforementioned topological domains. As introduced recently by Hajij et al., combinatorial complexes are ideal for defining neural networks that operate on data defined on top of them, using a general approach to data combination and transformation that is able to outperform several state-of-the-art methods in different tasks. In this talk, we will introduce Topological Deep Learning, combinatorial complexes, an attention-based TDL architecture for combinatorial complexes that has been used to outperform other SOTA methods, and some potential applications of these neural networks to human clothing and biomedical problems that we will tackle in the new Spanish Topological Machine Learning consortium.
Topological data analysis (TDA), in combination with machine learning (ML) techniques, is being applied to areas ranging from computational biology to personalised medicine [1]. As depicted in the most recent work programmes, the European Commission has allocated research funds for reducing the burden of non-communicable diseases (e.g., type-2 diabetes, metabolic syndrome, obesity, cardiovascular diseases, chronic respiratory diseases, and mental disorders, to name a few). Various destinations under Cluster 1 (Health) aim at developing and providing stakeholders, citizens, and healthcare professionals with digital tools for predictive and preventive medicine, which are often based on so-called “patient stratification” methods. The combination of TDA and ML methods to derive new rules for patient stratification from clinical data is novel and has several virtues. On the one hand, TDA provides explainable visualizations of data and useful statistics that can later be used in ML classification methodologies. Besides, it is sufficiently versatile to be applied to data of very different natures, such as time series of gene expression or graph representations of complex biological systems, and can provide insights whenever the data have interesting topology. Nowadays, EU financial schemes are becoming even more competitive and will require a multi-disciplinary approach (e.g., involvement of the social sciences and humanities). We will provide some take-home messages from previous Cluster 1 health-related project proposals, both granted and rejected, in order to improve research proposal preparation and stimulate scientific discussion on TDA/ML involvement.
[1] Hensel et al., 2021. https://doi.org/10.3389/frai.2021.681108
We present different multi-agent systems (like swarms, multi-robot navigation, and traffic modeling) where group behaviors emerge from local rules. We discuss the relationship between complexity, coordination, and communication, focusing on humans interacting with multi-agent systems.
Social robots acting among and/or with humans require social awareness when performing their tasks. Motion is a key interaction modality, and when a robot moves among people, aspects such as social constraints, legibility, and human affordances have to be taken into account. In this talk, different techniques developed by our group for robot social navigation will be presented. These include techniques for human-aware motion planning; learning methods to obtain people's navigation-intention models and social cost functions from observations and demonstrations; and simulation tools and metrics for evaluating social navigation. The talk will discuss the results of using these techniques in service robots across different applications.
Deep generative models have revolutionized the field of machine learning, making it practical to sample complex data distributions in exceptionally high-dimensional spaces, enabling the generation of plausible synthetic data points in scenarios where it was previously deemed impossible. Among these models, Diffusion Probabilistic Models have garnered significant attention. They combine the capabilities of Generative Adversarial Networks (GANs) for mapping noise vectors to realistic samples with the proficiency of Variational Autoencoders (VAEs) for explicitly modeling data distributions. However, in doing so, they sacrifice the data compression properties found in traditional autoencoders, as their latent space (comprising noise vectors) shares the same dimensionality as the observations. Additionally, these models learn to reconstruct samples by degrading them with independent Gaussian noise in each dimension. This approach overlooks the inherent multiscale nature of many phenomena these models aim to represent, such as images or time series. Hence, novel derivative models have recently emerged that employ alternative methods for signal degradation instead of relying on Gaussian noise, with the expectation of addressing these issues. Examples of such methods include Heat Dissipation and Blurring Models. Starting from the Manifold Hypothesis, which postulates that real-world high-dimensional datasets lie near a lower-dimensional manifold within the representation space, we investigate how samples from this data manifold (which cannot truly capture the properties of the manifold, due to the Curse of Dimensionality) are transformed under these models' encoders, from a topological point of view. Our primary objective is to determine optimal corruption thresholds that still allow reconstruction by the decoders, while degrading the signal in a way that can be used to build lower dimensional features from the samples, thus compressing the information.
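The deterministic degradation idea can be sketched in a few lines. The toy code below (invented signal and schedule, not the models studied in the talk) replaces Gaussian noising with discrete heat dissipation on a 1-D signal, the kind of forward process used by blurring diffusion models:

```python
def blur_step(x):
    """One step of discrete heat dissipation on a 1-D signal:
    average each sample with its neighbours (reflecting boundary)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def degrade(x, t):
    """Blurring forward process: t steps of heat dissipation, a
    deterministic analogue of the Gaussian noising in a DDPM. Larger t
    removes finer scales first, matching the multiscale structure of the
    signal rather than corrupting every dimension independently."""
    for _ in range(t):
        x = blur_step(x)
    return x

signal = [0.0, 0.0, 9.0, 0.0, 0.0]
print(degrade(signal, 1))  # → [0.0, 3.0, 3.0, 3.0, 0.0]
```

Choosing how large t may grow while the decoder can still invert the degradation is precisely the "optimal corruption threshold" question raised above.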
Training deep learning models requires collecting massive amounts of real-world labelled data. Using such large amounts of data can bring problems, such as prolonged training periods, significant energy consumption, and substantial carbon emissions.
To reduce this high consumption of resources, many researchers are looking for ways to reduce the size of these datasets, either by simplifying the feature space or by removing redundant or noisy items.
In this talk we will introduce the Dominant Datasets algorithm, which compresses training datasets while preserving intrinsic topological properties such as persistent homology, so that model training requires fewer computations yet achieves similar results.
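A much simpler construction in the same spirit (not the Dominant Datasets algorithm itself) is a greedy epsilon-net: discard every point within distance eps of a point already kept. Since the kept subset is then within Hausdorff distance eps of the original cloud, the stability theorem for persistence diagrams bounds the bottleneck distance between the two Vietoris–Rips diagrams by eps:

```python
import math

def eps_net(points, eps):
    """Greedy epsilon-net subsampling: keep a point only if it is farther
    than eps from every point kept so far. The subset stays within
    Hausdorff distance eps of the full cloud, so its persistence diagram
    is provably close (within eps in bottleneck distance) to the original."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > eps for q in kept):
            kept.append(p)
    return kept

# Toy cloud with two near-duplicate points that get pruned
cloud = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (1.0, 0.05), (0.0, 1.0)]
print(eps_net(cloud, 0.1))  # → [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

The trade-off is the one named in the abstract: a larger eps means a smaller training set and cheaper training, at the cost of a coarser approximation of the dataset's topology.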
As part of the European project REXASI-PRO, our aim is to detect phase transitions of an autonomous wheelchair fleet using Topological Data Analysis (TDA). In particular, we want to use this information to detect potential collisions as well as to gain insight into optimal route planning. Persistent homology (PH) is a central TDA tool which encodes geometric information of data by means of a barcode summary. We would like to apply this summary to our problem by comparing PH results at different timesteps. For such comparisons, we plan to use the induced partial matchings and block functions that I will describe in this talk. These constructions rely on the geometry, rather than the combinatorics, of PH, which makes them very powerful for gaining insight into the underlying “persistence morphisms”. I will also show some computational results and discuss ways in which we plan to handle our problem.
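As a naive stand-in for the induced partial matchings described in the talk (the real construction goes through persistence morphisms; this is only a greedy heuristic on invented bars), one can pair the bars of two consecutive barcodes by largest interval overlap and flag unmatched bars as features that appeared or vanished between timesteps:

```python
def overlap(a, b):
    # length of the intersection of two (birth, death) intervals
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def greedy_matching(bars_t0, bars_t1):
    """Greedy partial matching of two barcodes by decreasing overlap.
    Returns index pairs (i, j); bars left unmatched signal topological
    features that were created or destroyed between the two timesteps."""
    cands = sorted(((overlap(a, b), i, j)
                    for i, a in enumerate(bars_t0)
                    for j, b in enumerate(bars_t1)
                    if overlap(a, b) > 0), reverse=True)
    used0, used1, match = set(), set(), []
    for w, i, j in cands:
        if i not in used0 and j not in used1:
            match.append((i, j))
            used0.add(i)
            used1.add(j)
    return match

# Toy barcodes at two timesteps: one stable feature, one short-lived one
t0 = [(0.0, 2.0), (1.0, 4.0)]
t1 = [(0.1, 1.9), (3.5, 5.0)]
print(greedy_matching(t0, t1))  # → [(0, 0), (1, 1)]
```

The induced matchings and block functions of the talk refine this idea with guarantees coming from the underlying persistence morphisms, which a purely combinatorial heuristic like this one lacks.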