Computational Topology-Neural Networks Meeting

Sevilla, November 2-5, 2021

Organizer: Grupo FQM-369 CIMAgroup

The slides of the talks can be downloaded here:

Conference room

Salón de Actos (assembly hall)

Instituto de Matemáticas de la Universidad de Sevilla (IMUS)

Edificio Celestino Mutis, Campus de Reina Mercedes.

Avda. Reina Mercedes, s/n. 41012 - Sevilla

Coffee break & Lunch

Cafetería

Escuela Técnica Superior de Ingeniería Informática,

Universidad de Sevilla, 41012 Sevilla


Accommodation

Residencia Universitaria Rector Estanislao del Campo

Avda. Ctra. de su Eminencia, 2A

41013 Sevilla (Spain)

https://www.rusevilla.com

Please note that this is a university residence, not a hotel. The rooms are provided with towels and bed sheets, and they are cleaned only once a week. There are several nice restaurants and bars near the meeting venue on Avenida de Reina Mercedes and near the residence (around the Real Betis stadium).


Tapas

Monday

Cervecería Huracán

Av. de Alemania, 6a, 41012

Wednesday

La barca de Calderón

Paseo de Ntra. Sra. de la O, 41010 Sevilla

Paperwork

1. A copy of your passport (non-EU residents)

2. A copy of your identity card (all participants)

3. Proof of your bank account details (a screenshot is enough)

Talks

Tuesday

Rocio Gonzalez-Diaz 11:00

Universidad de Sevilla, Sevilla, Spain

Topological Data Analysis and Applications

In this talk, we will review the basic ingredients of topological data analysis and the most important characteristics that make it a robust and powerful alternative to classical data analysis techniques. In addition, we will present applications of a very diverse nature in which the versatility of this tool becomes evident.

Specifically, we will talk about persistent entropy, which is simply the entropy of the barcode of a filtration, and we will discuss its most interesting properties. Some of the applications we will see are:

- Classification of emotions in videos of people talking to the camera.

- Identification of an author's writing patterns to certify the authorship of a text.
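
As a minimal illustration of the quantity mentioned above, the Python sketch below computes the persistent entropy of a barcode: the bar lengths are normalized into a probability distribution and its Shannon entropy is returned. The example barcode and the restriction to finite intervals are assumptions made only for this sketch.

    import numpy as np

    def persistent_entropy(barcode):
        """Shannon entropy of the normalized bar lengths of a barcode."""
        lengths = np.array([death - birth for birth, death in barcode], dtype=float)
        p = lengths / lengths.sum()            # normalized bar lengths
        return float(-np.sum(p * np.log(p)))

    # Hypothetical barcode of a filtration (finite intervals only).
    barcode = [(0.0, 1.0), (0.0, 0.5), (0.2, 0.7), (0.1, 0.9)]
    print(persistent_entropy(barcode))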



José Enrique Chacón 12:00

Universidad de Extremadura, Badajoz, Spain

Understanding modal clustering through Differential Topology: population goal and asymptotics

Despite its popularity, the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, for some of the clustering methodologies, it is difficult to specify the population goal to which the data-based clustering algorithms try to get close. In this talk we investigate the theoretical foundations of clustering by focusing on two main objectives: first, to provide an explicit formulation for the ideal population goal of modal clustering, which understands clusters as regions of high density (here, Morse theory plays a crucial role); and second, to analyze the large-sample properties of data-based methods aimed to approximate such a population goal.
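
Mean shift is a standard data-based procedure in the modal-clustering family: points are shifted uphill on a kernel density estimate and grouped by the mode they reach. The scikit-learn sketch below, with synthetic blob data and an estimated bandwidth, is only a generic illustration of this idea and not the analysis developed in the talk.

    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth
    from sklearn.datasets import make_blobs

    # Synthetic data with three high-density regions (hypothetical example).
    X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.7, random_state=0)

    # The bandwidth fixes the kernel density estimate whose modes define the clusters.
    bw = estimate_bandwidth(X, quantile=0.2, random_state=0)
    labels = MeanShift(bandwidth=bw).fit_predict(X)

    print("estimated number of modal clusters:", len(np.unique(labels)))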


Matteo Rucco 15:00

Spindox, Trento, Italy

REXASI-PRO project

The REXASI-PRO project aims to release a novel engineering framework to develop greener and Trustworthy Artificial Intelligence solutions.

The project will develop in parallel the design of novel trustworthy-by-construction AI solutions for social navigation and a methodology to certify the robustness of AI-based autonomous vehicles for people with reduced mobility.


Eduardo Paluzo 16:00

Universidad de Sevilla, Sevilla, Spain

Representative datasets and simplicial-map neural networks

We will provide insight into different applications of computational topology tools in several aspects of machine learning. Specifically, we will define a new concept for dataset size reduction that keeps the main information of the dataset. This concept, called representative dataset, is defined using a similarity notion between datasets induced by the Gromov-Hausdorff distance. In addition, we will study simplicial-map neural networks, which are two-hidden-layer neural networks suitable for regression and classification problems and are defined using simplicial maps between simplicial triangulations. These neural networks are universal approximators, and their robustness against adversarial examples can be controlled. Finally, we will see different approaches to improve the efficiency of simplicial-map neural networks.
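
The sketch below is a loose, simplified illustration of the dataset-reduction idea: a subset is chosen by farthest-point sampling and its quality as a proxy for the full dataset is measured with a plain Hausdorff distance in the ambient space. The actual notion in the talk relies on a Gromov-Hausdorff-type similarity between (labeled) datasets, so the subsampling strategy and the distance used here are only stand-in assumptions.

    import numpy as np
    from scipy.spatial.distance import cdist, directed_hausdorff

    def farthest_point_sampling(X, k, seed=0):
        """Greedily pick k points that spread out over the dataset X."""
        rng = np.random.default_rng(seed)
        idx = [int(rng.integers(len(X)))]
        d = cdist(X, X[idx])                  # distance of every point to the chosen set
        for _ in range(k - 1):
            idx.append(int(np.argmax(d.min(axis=1))))
            d = np.minimum(d, cdist(X, X[[idx[-1]]]))
        return np.array(idx)

    X = np.random.default_rng(0).normal(size=(500, 2))    # hypothetical point cloud
    R = X[farthest_point_sampling(X, k=50)]                # candidate reduced dataset

    # Symmetric Hausdorff distance between subset and full dataset (smaller = better coverage).
    h = max(directed_hausdorff(X, R)[0], directed_hausdorff(R, X)[0])
    print("Hausdorff distance:", h)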

Wednesday


Matteo Rucco 9:00

Spindox, Trento, Italy

Risk prediction of clinical adverse outcomes with machine learning in a cohort of critically ill patients with atrial fibrillation

Critically ill patients affected by atrial fibrillation are at high risk of adverse events; however, the current risk stratification models for haemorrhagic and thrombotic events have not been validated in a critical care setting. In this work we aimed to identify, using topological data analysis, the risk factors for therapeutic failure (in-hospital death or intensive care unit transfer), in-hospital stroke/TIA and major bleeding in a cohort of critically ill patients with pre-existing atrial fibrillation admitted to a step-down unit, and to engineer new prediction models based on machine learning in the same cohort. We selected all medical patients admitted for critical illness with a history of pre-existing atrial fibrillation in the timeframe 01/01/2002–03/08/2007. All data regarding patients' medical history, comorbidities, drugs adopted, vital parameters and outcomes (therapeutic failure, stroke/TIA and major bleeding) were acquired from electronic medical records. Risk factors for each outcome were analyzed using topological data analysis, and machine learning was used to generate three predictive models. We were able to identify specific risk factors and to engineer dedicated clinical prediction models for therapeutic failure (AUC: 0.974, 95%CI: 0.934–0.975), stroke/TIA (AUC: 0.931, 95%CI: 0.896–0.940; Brier score: 0.13) and major bleeding (AUC: 0.930, 95%CI: 0.911–0.939; Brier score: 0.09) in critically ill patients, which accurately predicted their respective clinical outcomes. Topological data analysis and machine learning techniques offer the physician a concrete way to predict risk at the patient level, aiding the selection of the best therapeutic strategy in critically ill patients affected by pre-existing atrial fibrillation.
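
For readers less familiar with the metrics reported above, the snippet below shows how an AUC and a Brier score are typically computed from predicted probabilities with scikit-learn. The outcome labels and predicted risks are made up for illustration and have no relation to the study's data.

    import numpy as np
    from sklearn.metrics import roc_auc_score, brier_score_loss

    # Hypothetical binary outcomes (1 = event, e.g. major bleeding) and predicted risks.
    y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
    y_prob = np.array([0.10, 0.20, 0.80, 0.30, 0.65, 0.90, 0.15, 0.70])

    print("AUC:", roc_auc_score(y_true, y_prob))             # discrimination
    print("Brier score:", brier_score_loss(y_true, y_prob))  # accuracy of the probabilities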


Sara Narteni 11:00

CNR, Rome, Italy

Opportunities of eXplainable AI for the evaluation of Generative Adversarial Networks

Data augmentation is a widespread technique in Artificial Intelligence: it aims at creating new synthetic data from an existing baseline, helping to overcome issues related to the lack of data for training learning algorithms. The talk will focus on how a data augmentation methodology, Generative Adversarial Networks (GANs), commonly adopted for images and time-series data, can also be applied to generate artificial tabular data. In particular, the presented application deals with the augmentation of data from IMU devices for physical fatigue monitoring, and an innovative methodology, based on explainable and reliable AI, to evaluate the quality of the synthetic data generated by GANs will be presented. A rule similarity measure drives the selection of the best synthetic dataset among a set of trials corresponding to different combinations of GAN training parameters.
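
The talk does not spell out the rule similarity measure here, so the sketch below only illustrates the general idea with a hypothetical choice: rules extracted from models trained on real and on GAN-generated data are encoded as sets of discretized conditions and compared with a Jaccard index, and the GAN trial whose rules best match the real ones would be preferred. Feature names and thresholds are invented.

    # Each rule is a frozenset of (feature, operator, rounded threshold) conditions.
    def rule_set(rules, precision=1):
        return {frozenset((f, op, round(t, precision)) for f, op, t in rule) for rule in rules}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0

    # Hypothetical rules from a model trained on real IMU data ...
    real = rule_set([[("acc_std", ">", 0.52), ("hr", ">", 110)],
                     [("acc_std", "<=", 0.52)]])
    # ... and from models trained on two GAN-generated datasets (two training configurations).
    fake_a = rule_set([[("acc_std", ">", 0.49), ("hr", ">", 110)],
                       [("acc_std", "<=", 0.49)]])
    fake_b = rule_set([[("gyro_mean", ">", 1.3)]])

    print("trial A similarity:", jaccard(real, fake_a))   # higher: closer to the real rules
    print("trial B similarity:", jaccard(real, fake_b))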


Miguel Cárdenas 12:00

CIEMAT, Madrid, Spain

Explainable Artificial Intelligence for Sustainable Development Goals

The complexity and volume of data make it necessary to devise new strategies for exploiting the embedded information. In this scenario, Artificial Intelligence (AI) has emerged as a set of techniques that are mandatory to maintain scientific and industrial production at rates comparable to the current ones. Today, many decisions are supported by AI-based algorithms, many of them affecting the Sustainable Development Goals (SDGs). For this reason, Explainable Artificial Intelligence (XAI), which explains the decision-making process, has become critical for a deep understanding of the facts supporting each decision.


Aras Assad 15:00

Oxford Drug Design, UK

Problems of deploying AI models in practice: COVID-19 detection and prognostication from X-rays and CT scans as a case study

Since the start of the COVID-19 pandemic at the beginning of 2020, thousands of machine learning papers have been published worldwide to help clinicians with fast and accurate COVID-19 detection and prognostication using different medical image modalities such as X-rays and CT scans. In this talk, we demonstrate some key issues in deploying such AI models in practice. Specifically, we tested 12 well-known deep convolutional neural network (CNN) architectures on 3 publicly available chest X-ray databases and demonstrated that these models may 'cheat' by using features that have nothing to do with the COVID-19 signs present in chest X-ray images. Furthermore, CNN models can exploit hidden biases in the data to reach their final predictions, such as text, medical devices and noise in X-rays, or information outside the region of interest (ROI) in CT scans and X-rays. Quantitatively, CNN models can achieve very high classification accuracy, but we qualitatively demonstrate that, despite this accuracy, CNN decisions should not be taken into consideration until clinicians can visually inspect and approve the region(s) of the input image that lead to the prediction.

This work is a collaboration with Taban Majeed, Rasber Rashid and Dashti Ali.
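
One simple way to check whether a CNN bases its prediction on regions outside the clinically relevant area, in the spirit of the visual inspection advocated above, is an occlusion-sensitivity map: slide a grey patch over the image and record how much the predicted score drops. The PyTorch sketch below is a generic illustration with an untrained ResNet-18 and a random input, not the models or data from the talk.

    import torch
    from torchvision.models import resnet18

    model = resnet18(weights=None).eval()    # stand-in CNN (untrained)
    image = torch.rand(1, 3, 224, 224)       # stand-in chest X-ray, 3 channels

    patch, stride = 32, 32
    heat = torch.zeros(224 // stride, 224 // stride)

    with torch.no_grad():
        base = model(image)[0, 0].item()     # score of an arbitrary class
        for i in range(heat.shape[0]):
            for j in range(heat.shape[1]):
                occluded = image.clone()
                occluded[:, :, i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.5
                heat[i, j] = base - model(occluded)[0, 0].item()   # score drop = importance

    # Large drops far from the lungs / region of interest would suggest spurious cues.
    print(heat)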


Jerome Guzzi (remote) 16:00

SUPSI-IDSIA, Switzerland

Navigation algorithms for swarms of robots: implicit and explicit coordination

Simple navigation rules for individual agents may give rise to coordinated motion in the group. First, I introduce the context, highlighting the role of navigation in a single robot's control hierarchy and the multi-robot aspects. Then, I present some navigation algorithms and focus on rules that describe how pedestrians move in a crowd. Although this algorithm is fully distributed and does not explicitly coordinate different agents, it gives rise to very efficient group patterns, such as the spontaneous formation of lanes of opposing flow that human crowds also exhibit. I show that when agents, like robots, have partial knowledge of the environment, coordination also depends on how much information they can (explicitly or implicitly) exchange. I discuss how group behavior can be partially engineered by tuning the individuals' navigation algorithm. Finally, I present ongoing investigations into the relationship between explicit communication, behavior modulation, and imitation learning.


Thursday


Marco Rocchetto 9:00

V-Research, Verona, Italy

The Etiology of Cybersecurity

In this talk, I will present a mathematical interpretation of systems (software or cyber-physical systems) which allows the investigation of an abstract law behind the security of systems in a mereo-topological structure (aligned with the Region Connection Calculus). Such a law is intended to explain security by predicting insecurities in abstractions (or models) of systems, requiring a still missing (but postulated) realization of a cybersecure system. The cybersecurity of a system is often defined as adherence to cybersecurity requirements such as confidentiality or authenticity. However, the very protocols or algorithms introduced in a system to guarantee cybersecurity properties can themselves be flawed, making the whole system insecure. I conjecture that this is an indication that cybersecurity should not be sought in such properties, but rather in an “idea” or hypothesis explaining why security flaws occur independently of the properties supposedly guaranteed by security requirements, algorithms, protocols, or (sub-)systems. This makes cybersecurity not an object to be defined but a concept to be understood.

Observations show that systems are insecure because systems' behaviors are not fully foreseen (e.g., at design time). These flaws are errors that are not generated by a malicious attacker but are intrinsic to the intensional structure (or architecture) of a system. An attacker (or a pentester) exposes such errors, but the errors are necessarily introduced by ignorance (or wrongdoing) during the engineering process of a system. The formal interpretation of the design of a system proposed in this talk shows how to calculate all possible divergences from the intended behavior at a very high level of abstraction (and universality). I am refining and testing this hypothesis at lower levels of abstraction, closer to the implementation code, to understand whether these ideas can be used to generate attacks on the logic of protocols and algorithms, and I will present my preliminary results.


Nicolas Boutry 11:00

EPITA Research and Development Laboratory (LRDE), Le Kremlin-Bicetre, France

Could we learn from deep learning?

Deep learning models are undeniably a powerful way to solve problems in a diversity of areas, achieving impressive results. Nevertheless, these results generally come at the price of increasing opacity in the decision-making process, which can be a problem for high-risk applications such as medicine and transportation. To help understand these opaque deep learning models, the area of Explainable Artificial Intelligence (XAI) has been growing, promising to look inside the 'black boxes' and tell us what is going on. In this work we want to understand whether Convolutional Neural Networks (CNNs) are able to establish correlations with geometric properties. Starting from a simple classification task, deciding whether an image contains a disk or some other shape, we want to verify whether, after training, the network encodes some geometric property to distinguish disks from the rest. To verify this, we look for a constant behavior across the network's internal representations (activations). To discover this behavior, we perform a polynomial regression describing the samples' representations as a function of a geometric parameter (the radius of the disk). Subsequently, we search for a transformation T that relates the polynomials (and consequently the classes), generating a dependency system. The intuition is that, by analysing this matrix T through its eigenvalues and eigenvectors (which describe invariant directions of the system), we will be able to find some geometric constant directly related to the information learned by the network.
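
A rough numerical sketch of the pipeline described above, with made-up activation data: fit polynomials describing each unit's activation as a function of the disk radius in two layers, estimate a linear map T sending one family of polynomial coefficients to the other by least squares, and inspect its eigenvalues and eigenvectors. The data, the polynomial degree and the way T is estimated are all simplifying assumptions, not the talk's exact construction.

    import numpy as np

    rng = np.random.default_rng(0)
    radii = np.linspace(1.0, 10.0, 50)        # geometric parameter: radius of the disk

    # Hypothetical activations of 8 units in two internal layers as functions of the radius.
    acts1 = np.stack([rng.uniform(0.2, 0.5) * radii**2 + rng.uniform(-1, 1) * radii
                      + rng.normal(scale=0.5, size=radii.size) for _ in range(8)])
    acts2 = np.stack([rng.uniform(0.4, 0.8) * radii**2 + rng.uniform(1, 3) * radii
                      + rng.normal(scale=0.5, size=radii.size) for _ in range(8)])

    # Degree-2 polynomial regression of each unit's activation against the radius.
    A = np.stack([np.polyfit(radii, a, deg=2) for a in acts1])   # 8 x 3 coefficients
    B = np.stack([np.polyfit(radii, a, deg=2) for a in acts2])   # 8 x 3 coefficients

    # Least-squares transformation T relating the two families of polynomials: A @ T ~ B.
    T, *_ = np.linalg.lstsq(A, B, rcond=None)

    # Eigenvalues/eigenvectors of T describe invariant directions of the dependency system.
    eigvals, eigvecs = np.linalg.eig(T)
    print(eigvals)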


Jehan Ghafuri 12:00

University of Buckingham, Buckingham, UK

Impact of algebraic properties of CNN convolutional filters on PH landmark-based classification of convolved images

This work is concerned with the impact of the algebraic properties of pretrained CNN convolutional-layer filters, initialised from random Gaussian/uniform distributions, on the class discriminability of the persistent homology feature maps extracted from texture landmarks of the convolved images. Our ultimate aim is to identify and reduce potential sources of overfitting in CNN models. We follow a two-pronged investigation: (1) compute the persistent homology (PH) features of the point cloud of image texture landmarks (mainly uniform local binary patterns, ULBPs) in the output feature maps after the different convolutional layers of CNNs, activation functions, normalization, and downsampling; (2) determine the stability and sensitivity of the filters' condition numbers during and after model training over several epochs. The experiments were conducted using several CNN models, including the feedforward AlexNet and VGG16 models, for classification tasks in different image modalities. In particular, we deal with the detection of face-image morphing attacks as well as distinguishing benign masses from malignant ones in ultrasound scans of human bladder and liver organs. We shall demonstrate that the condition number of the pretrained filters influences the discriminatory power of the PH features of ULBP landmark groups after convolution. We shall also discuss ways of exploiting these properties to design filter pruning/selection/replacement strategies that maintain, if not enhance, classification accuracy while improving efficiency.
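
As a small illustration of the filter property studied here, the snippet below draws a convolutional layer's filters from a Gaussian distribution and reports their condition numbers. The layer shape, the initialization scale and the convention of treating each kernel slice as a matrix are assumptions made only for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical convolutional layer: 16 filters, 3 input channels, 5x5 kernels,
    # initialized from a random Gaussian distribution.
    weights = rng.normal(scale=0.1, size=(16, 3, 5, 5))

    # Treat each 5x5 kernel slice as a matrix and take its 2-norm condition number
    # (largest singular value divided by the smallest).
    conds = np.array([[np.linalg.cond(weights[f, c]) for c in range(3)] for f in range(16)])

    print("min / median / max condition number:",
          conds.min(), np.median(conds), conds.max())
    # Badly conditioned filters (very large values) would be candidates for pruning or replacement.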


Vanessa Orani 15:00

CNR, Rome, Italy

Video analytics over 5G

Integrated in-vehicle connectivity enables new services and new user experiences through the introduction of new forms of shared use as opposed to traditional ownership. Moreover, changes in driving support technologies are to be expected soon. In the Drive Alert use case of the Genova 5G project, a machine learning model will be created and trained to infer risk situations on the street, such as stopped vehicles or vulnerable road users (pedestrians, cyclists, etc.), and to send an emergency signal accordingly over the network (from the 5G edge device to the vehicle). A key factor is the availability of ultra-fast and reliable connections enabled by mobile 5G, which provides high connection speeds and very low data transfer delays for real-time applications. The driver of a connected public vehicle will know in advance the potential risks on his/her route that are not visible from his/her angle of vision. This use case is the first one in Italy to include network slicing at TRL 7.


Maurizio Mongelli (remote) 16:00

CNR, Rome, Italy

Standardization activities on Trustworthy AI

Trustworthy AI is one of the open challenges of science and engineering for the near future. Some approaches to the problem deal with eXplainable AI (XAI) and Reliable AI (RAI). XAI makes it possible to understand the reasoning carried out by the AI model to take decisions. RAI describes under what circumstances the system can be considered safe and secure, minimizing the statistical error inherent in machine learning models. The talk introduces the XAI and RAI studies @ CNR-IEIIT and gives some insights into the standardization evolution of Trustworthy AI, with reference to the position taken by the European Union Aviation Safety Agency (EASA).


Friday


Javier Lamar 9:00

IAPR, Universidad de Évora, Évora, Portugal

Simplicial complexes built from the weights of the layers of an already trained transformer network (BERT)

In this work, we present advances in the use of algebraic topology to monitor the training process of large networks such as BERT, with 101M or 340M parameters. For some years, deep learning has been at the top of the state of the art in many areas. However, in general, practical experience and large amounts of training data can be the key to reaching the expected results. Tuning different parameters is common practice during training: the design of the network, the initialization of the weights, the optimization algorithm, the batch size, regularization methods, etc. These parameters are aimed mainly at avoiding overfitting, i.e., at obtaining high accuracy on the evaluation dataset. On the other hand, the trend to improve results with deep learning is based mainly on designing ever larger networks and using more training data. In this sense, algebraic topology can be the right tool to handle large amounts of high-dimensional data. Using topology, we could obtain information about the training process or initialize some of the parameters to reach better results faster. The idea is to find topological patterns in the topological space (simplicial complex) obtained from a point cloud formed by the weights during the training process. A simplicial complex can be obtained every N optimization steps. The Betti numbers obtained using the incremental algorithm (persistent homology) are used to compare the simplicial complexes. In the experiments, we use the weights of the attention layers of BERT. We present first results showing that topological patterns exist in the weights (trainable parameters) of this trained network.
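
A minimal sketch of the kind of computation described above, assuming the GUDHI Python library and using a random point cloud as a stand-in for the attention-layer weights at one optimization step (the real experiments use BERT's trainable parameters). Repeating this at successive checkpoints and comparing the resulting diagrams or Betti numbers gives the monitoring signal the abstract refers to.

    import numpy as np
    import gudhi  # assumes the GUDHI library is installed

    # Stand-in for the point cloud formed by attention-layer weights at one optimization step.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))

    # Vietoris-Rips filtration on the point cloud, with simplices up to dimension 2.
    rips = gudhi.RipsComplex(points=X, max_edge_length=3.0)
    st = rips.create_simplex_tree(max_dimension=2)

    diagram = st.persistence()                   # persistent homology of the filtration
    print("Betti numbers:", st.betti_numbers())  # summary to compare across checkpoints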


Alessia Paglialonga 11:00

CNR, Milan, Italy

Data-driven approaches for disease prediction and prevention

This talk will provide an outline of the data analytics methods @ CNR-IEIIT to extract actionable knowledge from health-related data towards the development of novel digital systems to monitor, predict, and improve human health and wellbeing. Examples of recent and ongoing projects include: unsupervised machine learning for patient segmentation and tailored diabetes intervention; AI-based screening and prediction of hearing impairment and cognitive decline; and analysis of longitudinal health data for chronic disease prediction.


Marta Lenatti 11:30

CNR, Milan, Italy

A framework of eXplainable AI to assess data from adult hearing screening

Despite its prevalence, hearing loss is often overlooked and perceived as a natural consequence of aging, especially by older adults who are reluctant to test their hearing abilities. However, individuals may not fully trust the recommendations provided by a new hearing screening tool if its decision-making process is based on 'black box' models and thus lacks transparency. Integrating a newly developed speech-in-noise test with eXplainable artificial intelligence (XAI) models, delivered as a set of intelligible rules and numerical cut-offs, can provide reliable advice on the subject's hearing condition, ultimately promoting early intervention. In addition, to address data scarcity, the data collected during screening campaigns have been augmented by means of generative adversarial networks (GANs). To ensure that the synthetic data fully represent the observed phenomenon, the quality of the augmented dataset was assessed by evaluating the XAI model built on these data in terms of classification performance and generated rules.


Rocio Gonzalez-Diaz 12:00

Universidad de Sevilla, Sevilla, Spain


Conclusions and future work

We end the workshop with a talk devoted to conclusions and open problems to tackle in the near future.

Funding:

- Agencia Estatal de Investigación/10.13039/501100011033 under grant

PID2019-107339GB-100

- Agencia Andaluza del Conocimiento under grant PY20-01145

- Plan propio de investigación de la Universidad de Sevilla

- Departamento de Matemática Aplicada I de la Universidad de Sevilla

- EIDUS: doctoral activity

- PAIDI group grant FQM-369