On the last day of the Summer School, young researchers from the SAILab (Siena Artificial Intelligence Lab) will hold a series of seminars on the latest research ongoing in Siena, with a special focus on Biomedical Applications of AI.
SIMONE BONECHI
A weakly supervised approach to skin lesion segmentation
Recognizing skin cancer in time can greatly increase patients’ chances of recovery. For this reason, in recent years, numerous decision support systems have been proposed to help dermatologists in this diagnosis. These systems are generally based on Convolutional Neural Networks and are used for both segmentation and classification of lesions. Although their main goal is to correctly recognize the lesion type, the preliminary segmentation step has been shown to increase the performance of the classifier. This is not surprising, because physicians also use information on the shape of the lesion to make a diagnosis. Thanks to the ISIC archive, a huge number of skin lesion images, along with the corresponding metadata (type, position, dimension, etc.), are publicly available to train a deep neural network but, unfortunately, only a small fraction of them is labeled for segmentation. To overcome this limitation, a weakly supervised approach is proposed to extract the segmentation label maps from the entire ISIC archive. To demonstrate the quality of the proposed approach, the generated supervisions were first compared with those available in ISIC and then used to train a segmentation network, whose performance was evaluated against that obtained using only the small set of ISIC label maps.
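The comparison between generated and reference label maps can be illustrated with an overlap score. The abstract does not say which measure was used; the sketch below assumes the Jaccard index (intersection over union), and the function name and toy masks are illustrative only.

```python
def jaccard_index(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as a perfect match (a convention assumed here).
    return intersection / union if union else 1.0


# Toy 3x3 masks: 1 marks lesion pixels, 0 marks background.
generated = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
reference = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]

score = jaccard_index(generated, reference)
print(score)  # 3 shared pixels out of 4 in the union -> 0.75
```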
PAOLO ANDREINI
Towards a Comprehensive Characterization of Arteries and Veins in Retinal Imaging
Retinal fundus imaging is crucial for diagnosing and monitoring eye diseases, which are often linked to systemic health conditions such as diabetes and hypertension. Current deep learning techniques often narrowly focus on segmenting retinal blood vessels, lacking a more comprehensive analysis and characterization of the retinal vascular system. This study fills this gap by proposing a novel, integrated approach that leverages multiple stages to accurately determine vessel paths and extract informative features from them. The proposed approach enables the extraction of critical features at the individual vessel level, such as vessel tortuosity and diameter. This work lays the foundation for a comprehensive retinal image evaluation, going beyond isolated tasks like vessel segmentation, with significant potential for clinical diagnosis.
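As an illustration of a vessel-level feature, the sketch below computes tortuosity for a traced vessel centerline. The abstract does not define the measure; a common definition, assumed here, is the ratio between the length of the path and the straight-line distance between its endpoints. The function names and sample coordinates are hypothetical.

```python
import math


def path_length(points):
    """Sum of the distances between consecutive centerline points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def tortuosity(points):
    """Arc length divided by chord length; 1.0 means a perfectly straight vessel."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("endpoints coincide, so the chord length is zero")
    return path_length(points) / chord


# Hypothetical centerlines in pixel coordinates.
straight_vessel = [(0, 0), (1, 0), (2, 0)]
bent_vessel = [(0, 0), (1, 1), (2, 0)]

print(tortuosity(straight_vessel))
print(tortuosity(bent_vessel))
```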
FILIPPO COSTANTI
A Deep Learning Approach to Analyze NMR Spectra of SH-SY5Y Cells for Alzheimer's Disease Diagnosis
The SH-SY5Y neuroblastoma cell line is often used as an in vitro model of neuronal function and is widely applied to study the molecular events leading to Alzheimer’s Disease (AD). Recently, basic research on SH-SY5Y cells has provided interesting insights for the discovery of new drugs and biomarkers for improved AD treatment and diagnosis. At the same time, untargeted NMR metabolomics is widely applied for metabolic profile analysis and screening for differential metabolites, to discover new biomarkers. In this paper, a compression technique based on convolutional autoencoders is proposed, which can strongly reduce the dimensionality of the spectral signal (by a factor of up to more than 300) while maintaining informative features (guaranteed by a reconstruction error always smaller than 5%). Moreover, before compression, an ad hoc preprocessing method was devised to remedy the scarcity of available data. The compressed spectral data were then used to train SVM classifiers to distinguish diseased from healthy cells, achieving an accuracy close to 78%, significantly better than that obtained with standard PCA-compressed data.
NICCOLÒ PANCINO
GNN for the Prediction of Protein-Protein Interfaces
Binding site identification makes it possible to determine the functionality and the quaternary structure of protein–protein complexes. Various approaches to this problem have been proposed without reaching a viable solution. By representing the interacting peptides as graphs, one can build a correspondence graph describing their interaction. Finding the maximum clique in the correspondence graph makes it possible to identify the secondary structure elements belonging to the interaction site. Although the maximum clique problem is NP-complete, Graph Neural Networks provide an approximation tool that can solve the problem in an affordable time. Our experimental results are promising and suggest that this direction should be explored further.
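To make the correspondence graph idea concrete, here is a minimal sketch of the exact problem that the GNN is meant to approximate. It assumes a standard construction in which a node pairs two secondary structure elements of the same type and an edge links two pairs whose adjacency agrees in both input graphs; the compatibility rule actually used in the talk is not stated. The clique search is plain Bron-Kerbosch with pivoting, which is exponential in the worst case, and all names and toy data are hypothetical.

```python
from itertools import combinations


def correspondence_graph(labels_a, edges_a, labels_b, edges_b):
    """Pair up same-label nodes; connect pairs whose adjacency agrees in both graphs."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    nodes = [(u, v) for u in labels_a for v in labels_b
             if labels_a[u] == labels_b[v]]
    adj = {n: set() for n in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # each element may be matched at most once
        if (frozenset((u1, u2)) in ea) == (frozenset((v1, v2)) in eb):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def max_clique(adj):
    """Exact maximum clique via Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r + [n], p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand([], set(adj), set())
    return best


# Toy peptides: H = helix, E = strand; edges mark spatially close elements.
labels_a = {"a1": "H", "a2": "E", "a3": "H"}
edges_a = [("a1", "a2"), ("a2", "a3")]
labels_b = {"b1": "H", "b2": "E", "b3": "E"}
edges_b = [("b1", "b2")]

graph = correspondence_graph(labels_a, edges_a, labels_b, edges_b)
clique = max_clique(graph)
print(sorted(clique))
```

In this toy case the largest clique contains two matched pairs, one of which is always ("a2", "b2"); which helix of the first peptide is matched to "b1" depends on iteration order, since both choices give a clique of the same size.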
FIAMMA ROMAGNOLI
Protein–Protein Interfaces: A Graph Neural Network Approach
Protein–protein interactions (PPIs) are fundamental processes governing cellular functions, crucial for understanding biological systems at the molecular level. Compared to experimental methods for PPI prediction and site identification, computational deep learning approaches represent an affordable and efficient solution to tackle these problems. Since protein structure can be summarized as a graph, graph neural networks (GNNs) represent the ideal deep learning architecture for the task. In this work, PPI prediction is modeled as a node-focused binary classification task using a GNN to determine whether a generic residue is part of the interface. Biological data were obtained from the Protein Data Bank in Europe (PDBe), leveraging the Protein Interfaces, Surfaces, and Assemblies (PISA) service. To gain a deeper understanding of how proteins interact, the data obtained from PISA were assembled into three datasets: Whole, Interface, and Chain, consisting of data on the whole protein, pairs of interacting chains, and single chains, respectively. These three datasets correspond to three different nuances of the problem: identifying interfaces between protein complexes, between chains of the same protein, and interface regions in general. The results indicate that GNNs are capable of solving each of the three tasks with very good performance levels.
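The node-focused formulation can be pictured with a deliberately small sketch: each residue is a node, information is exchanged with neighbouring residues, and every node then receives its own interface probability. This is not the architecture presented in the talk. The mean aggregation, the hand-set weights, the two features per residue and all names are assumptions chosen only to show the data flow of an untrained model.

```python
import math


def message_passing_step(features, adjacency):
    """Average each node's feature vector with those of its neighbours."""
    updated = {}
    for node, feats in features.items():
        group = [feats] + [features[n] for n in adjacency[node]]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def classify_nodes(features, adjacency, weights, bias, steps=2):
    """Run a few propagation steps, then score every node with a logistic readout."""
    state = features
    for _ in range(steps):
        state = message_passing_step(state, adjacency)
    scores = {}
    for node, hidden in state.items():
        z = sum(w * x for w, x in zip(weights, hidden)) + bias
        scores[node] = 1.0 / (1.0 + math.exp(-z))
    return scores


# Hypothetical chain of four residues with two made-up features each.
features = {
    "r1": [0.9, 0.1],
    "r2": [0.8, 0.3],
    "r3": [0.2, 0.7],
    "r4": [0.1, 0.9],
}
adjacency = {
    "r1": ["r2"],
    "r2": ["r1", "r3"],
    "r3": ["r2", "r4"],
    "r4": ["r3"],
}

# Hand-set parameters standing in for trained ones.
scores = classify_nodes(features, adjacency, weights=[2.0, -1.0], bias=-0.5)
interface = {node: score > 0.5 for node, score in scores.items()}
print(interface)
```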
ELIA GIUSEPPE CERONI
HAGRID – Hybrid Autoencoder with a Generative, Recurrent and Iterative Design
The creation of novel molecules with desirable properties via computational methods presents a significant challenge due to the vast chemical diversity and numerous combinatorial possibilities. Few current state-of-the-art works leverage a graph-based representation of chemicals, instead relying on textual representations of compounds, such as SMILES. The aim of this work is to develop a fully self-supervised Graph Neural Network (GNN)-based Variational AutoEncoder capable of creating new chemical compounds. The encoder part of the network is built using a GNN, which creates an embedding of the input molecular graph. The decoder part of the network is built with an ensemble of GNNs working iteratively to build the output graph: first, a GNN solving a node-focused problem reads the current graph topological structure and the graph embedding to generate a new node and its label. Then, another node-focused GNN creates the connections between the new node and the previously generated ones. Finally, an edge-focused GNN assigns the appropriate labels to the newly created connections. The method is still under active development; current research is focused on the development of an appropriate loss function for the model.
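The three decoder steps described above can be read as a simple generation loop. The skeleton below shows only that control flow: the three GNNs are replaced by plain callables, and the stopping signal (a `None` label), the `max_nodes` cap and every name are assumptions, since the abstract does not say how generation terminates.

```python
def generate_graph(embedding, next_node, link_node, label_edges, max_nodes=10):
    """Grow a labelled graph one node at a time using three pluggable models."""
    nodes = []   # node labels, indexed by creation order
    edges = {}   # (older_node, newer_node) -> edge label
    while len(nodes) < max_nodes:
        # Step 1: propose the label of a new node from the graph so far.
        label = next_node(nodes, edges, embedding)
        if label is None:  # assumed stop signal
            break
        new = len(nodes)
        nodes.append(label)
        # Step 2: choose which existing nodes the new node connects to.
        new_edges = [(target, new) for target in link_node(nodes, edges, new)]
        # Step 3: assign a label to each newly created connection.
        for edge, edge_label in zip(new_edges, label_edges(nodes, new_edges)):
            edges[edge] = edge_label
    return nodes, edges


# Deterministic stand-ins for the three networks, producing a C-C-O chain.
atoms = ["C", "C", "O"]


def stub_next_node(nodes, edges, embedding):
    return atoms[len(nodes)] if len(nodes) < len(atoms) else None


def stub_link_node(nodes, edges, new):
    return [] if new == 0 else [new - 1]


def stub_label_edges(nodes, new_edges):
    return ["single" for _ in new_edges]


nodes, edges = generate_graph(None, stub_next_node, stub_link_node, stub_label_edges)
print(nodes, edges)
```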
PIETRO BONGINI
Graph Neural Networks for Molecular Data
Graphs are a ubiquitous form of structured data. Processing them is fundamental to finding efficient solutions for a wide variety of problems, especially in the biomedical field. Graph Neural Networks can be used for predicting drug side-effects in silico, analyzing metabolic networks, protein structures, and knowledge graphs, and generating molecular graphs, just to name a few applications. In this seminar we will delve into the development of a pipeline for the generation and automatic evaluation of a pool of high-quality novel compounds, from which candidate molecules can later be selected for drug discovery research.