Anaïs Baudot
Multimodal data integration for rare genetic diseases
Recent technological advances and the growing availability of biomedical datasets offer unprecedented opportunities to better understand human diseases. However, translating the sheer volume and heterogeneity of these data into meaningful insights requires appropriate computational strategies. In this talk, I will present different approaches for the integrated exploration of heterogeneous datasets, from random walks on multilayer networks to knowledge graph embedding to joint dimensionality reduction. I will illustrate the application of these algorithms to the analysis of rare genetic diseases, which raise various challenges: many patients remain undiagnosed, phenotypes can be highly heterogeneous, and few treatments exist.
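As an illustration of the first family of methods mentioned above, here is a minimal sketch of a random walk with restart on a two-layer network. It assumes that "walking on multilayer networks" refers to this kind of random walk, which the abstract does not specify. The layer names, gene labels, `jump` weight and `restart` probability are all hypothetical, and the way layers are coupled is a simplification rather than the speaker's algorithm.

```python
def build_multilayer(layers, jump=0.5):
    """Build a supra-adjacency dict over (gene, layer) nodes.

    layers: dict layer_name -> list of undirected (gene, gene) edges.
    jump:   weight of the edge linking a gene to its copy in another layer
            (hypothetical coupling scheme, chosen for simplicity).
    """
    genes = sorted({g for edges in layers.values() for edge in edges for g in edge})
    adj = {(g, name): {} for name in layers for g in genes}
    for name, edges in layers.items():
        for a, b in edges:
            adj[(a, name)][(b, name)] = 1.0
            adj[(b, name)][(a, name)] = 1.0
    for g in genes:
        for i in layers:
            for j in layers:
                if i != j:
                    adj[(g, i)][(g, j)] = jump
    return adj


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    W is the column-normalised adjacency; p0 is uniform over the seed nodes.
    Assumes every node has at least one outgoing edge.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy example: two layers sharing the same (hypothetical) genes A-D.
layers = {
    "ppi": [("A", "B"), ("B", "C")],
    "pathway": [("A", "C"), ("C", "D")],
}
adj = build_multilayer(layers)
seeds = {("A", name) for name in layers}  # seed gene A in every layer
p = random_walk_with_restart(adj, seeds)

# Aggregate the per-layer probabilities into one score per gene.
scores = {}
for (gene, _layer), prob in p.items():
    scores[gene] = scores.get(gene, 0.0) + prob
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes that are close to the seed in any layer end up with higher scores, which is what makes this kind of walk useful for prioritising candidates around a known disease gene.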
Agathe Bugnon
Description of molecular interaction networks involved in chronic rejection in kidney transplantation
Limiting kidney allograft loss is a major challenge in kidney transplantation, especially due to organ scarcity and the growing number of patients on the waiting list. Chronic Antibody-Mediated Rejection (ABMR) represents the main cause of late transplant failure, yet the underlying pathophysiological mechanisms remain poorly understood beyond the role of HLA donor-recipient compatibility. We propose to tackle this issue using a multi-omics approach to describe molecular interaction networks implicated in ABMR following kidney transplantation. Specifically, we aim to decipher genomic–transcriptomic interaction networks, also called expression Quantitative Trait Loci (eQTLs), which define the impact of Single Nucleotide Polymorphism (SNP) genotypes on gene expression levels. Then, we aim to use these eQTLs to perform Transcriptome-Wide Association Studies (TWAS), estimating gene expression levels from genomic data and identifying molecular biomarkers associated with the ABMR phenotype in renal transplantation.
Genomic data (~8 million SNPs) were gathered by genotyping over 2,000 kidney transplant donor-recipient pairs from the KiT-GENIE biobank. Transcriptomic data were collected via total RNA sequencing (~14,000 expressed genes) from a KiT-GENIE subset encompassing 167 samples of peripheral blood mononuclear cells, drawn at the time of biopsy-based diagnosis. We employed the TensorQTL tool to identify direct proximal (cis-eQTL) and indirect distal (trans-eQTL) SNP-gene associations within recipients, and to study the impact of the donor genome on the recipient transcriptome (cross-eQTL). Next, we used the PrediXcan tool to carry out TWAS in order to identify genes associated with ABMR in kidney transplantation.
After quality controls, we successfully cross-referenced genomic and transcriptomic data for 146 samples, sourced from 85 recipients. We tested 62,956,315 proximal associations, revealing 147,825 significant cis-eQTLs, and 9,168,977 distal associations, including 396,644 significant trans-eQTLs. Integration of donor genomic data with transcriptomic data from 124 recipient samples yielded 7,471,370 tested associations, including 387,316 significant cross-eQTLs. The cis-eQTL panel was then used to estimate genetically regulated gene expression levels for the remaining 1,626 European recipients from the KiT-GENIE cohort, which enabled us to study the involvement of 6,423 genes in the ABMR phenotype in renal transplantation. TWAS based on the other eQTL panels are ongoing.
CONCLUSION: We have established the first eQTL repository for kidney transplant patients. Using this database, we applied TWAS to the rest of the KiT-GENIE cohort in order to identify genes implicated in ABMR following kidney transplantation. This analytical framework is designed to be extended to other types of molecular interaction networks, such as genomic-methylomic interaction networks. Overall, these results could enable the identification of blood biomarkers of renal allograft ABMR, thereby supporting predictive research and the discovery of potential therapeutic targets to enhance the prevention and management of this complication.
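The two-step logic of this abstract (learn SNP-to-expression effects in a reference panel, then estimate expression from genotype alone in the rest of the cohort) can be sketched for a single SNP and a single gene. This is a deliberately reduced illustration: it is not the TensorQTL or PrediXcan interface, it ignores covariates, multi-SNP models and multiple-testing correction, and the SNP identifier, dosages and expression values are invented toy numbers.

```python
def fit_cis_eqtl(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0, 1 or 2).

    Returns (intercept, slope); the slope is the per-allele effect on expression.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_expression(weights, genotypes):
    """Genetically regulated expression as a weighted sum of dosages.

    weights:   dict snp_id -> per-allele effect learned from the reference panel.
    genotypes: dict snp_id -> dosage for one individual (missing SNPs count as 0).
    """
    return sum(w * genotypes.get(snp, 0) for snp, w in weights.items())


# Step 1: reference panel with both genotype and expression (toy values).
ref_dosages = [0, 0, 1, 1, 2, 2]
ref_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
intercept, slope = fit_cis_eqtl(ref_dosages, ref_expression)

# Step 2: individual with genotype only; "snp_1" is a hypothetical identifier.
weights = {"snp_1": slope}
predicted = predict_expression(weights, {"snp_1": 2})
```

In a full TWAS, the predicted values for every individual would then be tested for association with the phenotype, here ABMR status.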
Giulia Calia
HIVE: a novel unsupervised ML approach for integrative analysis of single-stress experiments studying plant defence response against multiple stresses
Plants, as sessile organisms, must endure multiple biotic and abiotic stresses that can occur simultaneously. Omics experiments replicating these conditions are difficult to perform; experimental designs therefore usually expose plants to a single stress at a time, which hampers a global understanding of the molecular changes occurring in real-world scenarios. Current methods for integrating unpaired omics data tend to prioritize stress-specific response signatures over common ones. To address this, we developed HIVE (Horizontal Integration analysis using Variational AutoEncoders), a novel computational tool that first applies a variational autoencoder to the integrated unpaired transcriptomics dataset to alleviate batch effects, then couples random forest regression with the SHAP explainer to select the genes contributing most to the patterns captured in the latent space. These genes are the ones modulated in response to one or multiple stresses. We demonstrate HIVE's functionality by integrating microarray or RNA-sequencing data from single-stress experiments on five important crops and the model organism Arabidopsis thaliana. HIVE associates selected genes with specific or multiple stress conditions (biotic and/or abiotic), enabling the discovery of shared defense response signatures through in silico integration of datasets, and outperforms existing methods in studying complex multi-factorial stress responses in plants.
Chiara Damiani
Lactate in, Lactate out: Can we infer metabolism from spatial transcriptomics?
Understanding how tumor cells rewire metabolism in space remains a major challenge. In this talk, I will present a computational framework that leverages spatial transcriptomics and constraint-based modeling to infer directional flux patterns across tissue regions. Applying this method to colorectal cancer and matched liver metastases, we uncover non-canonical patterns of lactate utilization that vary across tumor–stroma interfaces. I will discuss how spatially resolved modeling can generate testable hypotheses on metabolic behavior, and how integration of gene expression with metabolic networks can bridge omics data and functional insight.
Harold Duruflé
Closing the loop between prediction and integration in multi-omics
Complex phenotypic traits are influenced by the interaction of multiple genetic and environmental factors, often regulated by nonlinear interactions. Deciphering these interactions between different endophenotypic spaces, such as epigenomics and transcriptomics, remains a major challenge in biology. In this presentation, I will share the results of integrative and predictive analyses conducted on a large panel of black poplar individuals from natural populations, for which comprehensive multi-omics datasets were generated. By using prediction models integrating multi-omics data, we improve our ability to dissect the relationships between these types of data and clarify the contribution of each omics layer to trait prediction.
Sakshi Khaiwal
Predicting yeast quantitative traits using multi-omics data with machine learning
Most traits that define an individual, from height to disease susceptibility and response to medication, are shaped by complex interactions between genetics and the environment. Predicting these traits is therefore a major goal of modern medicine, as it could pave the way for preventive and personalized healthcare. However, this remains a major challenge due to the complexity of the human genome and the many environmental factors (diet, lifestyle, and social context) that vary between individuals. To tackle this challenge, we used the baker’s yeast Saccharomyces cerevisiae as a model organism, for which both complete genetic information and hundreds of traits are available for 1011 worldwide isolates grown in laboratory-controlled environments.
To investigate phenotype predictions for 223 traits measured across the collection of 1011 yeast isolates, we developed a machine learning (ML) pipeline that integrates genomics (pangenome, SNPs, etc.), transcriptomics, and proteomics data. The pipeline evaluates both linear and non-linear models and benchmarks their performance across diverse input types and the full set of traits. Gradient boosting machines emerged as the best-performing model. Gene function disruption scores and gene presence/absence emerged as the best predictors, suggesting a considerable contribution of the accessory genome in controlling phenotypes. Prediction accuracy varied widely among phenotypes, with stress resistance being easier to predict than growth across nutrients. ML identified relevant genomic features linked to phenotypes, including high-impact variants with established relationships to phenotypes, despite these being rare in the population. Near-perfect accuracies were achieved when other phenomics data, mostly from similar conditions, were used as predictors, suggesting that useful information can be conveyed across phenotypes.
Our study presents the first large-scale comparison of machine learning methods across a broad range of traits and multi-omics data as predictors, highlighting their ability to decipher causative variants at the population level. We believe that our machine learning framework can be extended to other organisms, including humans, and will ultimately support the development of more accurate models for predicting human traits and disease risks.
Wassila Khatir
A multi-omic analysis for a comprehensive understanding of the patho-physiology of Fragile X Syndrome
Fragile X Syndrome (FXS) is a neurodevelopmental disorder caused by mutations in the FMR1 gene, leading to the loss of FMRP, an RNA-binding protein that regulates translation. The Fmr1 knockout (KO) mouse recapitulates this phenotype and has been widely used to assess the perturbations in molecular homeostasis in the FXS brain, notably via omics approaches. These studies showed that the absence of FMRP disrupts coordination between transcriptomic and translatomic layers, impairing neurogenesis and resulting in intellectual disability. However, limited sample availability and inter-dataset heterogeneity hinder the detection of coordinated multi-omic disruptions. To address this, we trained a Multi-Channel Variational Autoencoder (MCVAE) exclusively on wild-type (WT) samples, learning a shared latent representation across transcriptomic and translatomic modalities via cross-modal reconstruction. Testing on Fmr1 KO samples identified anomalies through 20-fold cross-validation, revealing disruptions in both modalities, including Fmr1 itself. Translatomic anomalies aligned with curated FXS-related databases and showed regulatory associations with the transcriptomic anomalies, validated by ChIP-seq data. MCVAE outperformed alternative integration methods in biological coherence, as demonstrated by enrichment of external CLIP-seq targets, providing a robust framework for uncovering coordinated molecular dysregulation in FXS.
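The "train on WT, test on KO" idea can be illustrated with a small anomaly-scoring sketch. It assumes that per-gene reconstruction errors are already available from some model trained on WT samples only; the autoencoder itself is not shown. The z-score rule, the threshold, the gene labels and all numbers are hypothetical, since the abstract does not state how anomalies are scored.

```python
from statistics import mean, stdev


def flag_anomalies(wt_errors, ko_errors, z_threshold=3.0):
    """Flag genes whose KO reconstruction error is unusually high.

    wt_errors: dict gene -> list of reconstruction errors on held-out WT samples.
    ko_errors: dict gene -> list of reconstruction errors on KO samples.
    A gene is flagged when its mean KO error lies more than `z_threshold`
    WT standard deviations above the mean WT error.
    """
    flagged = {}
    for gene, wt in wt_errors.items():
        mu, sd = mean(wt), stdev(wt)
        if sd == 0:
            continue  # no WT variability to compare against; skip the gene
        z = (mean(ko_errors[gene]) - mu) / sd
        if z > z_threshold:
            flagged[gene] = z
    return flagged


# Invented toy errors for two hypothetical genes.
wt_errors = {
    "gene_a": [0.10, 0.12, 0.11, 0.09],
    "gene_b": [0.20, 0.25, 0.22, 0.18],
}
ko_errors = {
    "gene_a": [0.9, 1.1, 1.0],      # reconstructed much worse in KO
    "gene_b": [0.21, 0.20, 0.23],   # reconstructed about as well as in WT
}
flagged = flag_anomalies(wt_errors, ko_errors)
```

Because the model has only ever seen WT samples, a gene it reconstructs poorly in KO samples is a candidate for disrupted regulation; held-out WT errors provide the baseline for what "poorly" means.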
Gabriel Krouk
Data integration for plant gene regulatory network modeling… and beyond
In this short presentation, I will share a few snapshots of our ongoing efforts in data integration. These range from modeling Gene Regulatory Networks (GRNs) to leveraging generative AI for the design of regulatory peptides, and perhaps even a drizzle of a new kind of GWAS.
Daniele Raimondi
Non-linear data fusion through the factorization of Entity-Relation Graphs
Modern bioinformatics faces increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress.
Several data fusion (or data integration) strategies have been proposed so far, but they all have shortcomings. Here I will present NXTfusion, a neural network-based non-linear data fusion framework that generalizes and extends the conventional matrix factorization paradigm, allowing inference over arbitrary Entity-Relation Graphs. I will discuss its application to protein-protein interaction prediction, drug-target interaction prediction, and dose-response prediction for anticancer drugs.
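For context, the conventional matrix factorization paradigm that the abstract describes NXTfusion as generalizing can be sketched as follows. This is the plain bilinear baseline for a single relation, not NXTfusion itself and not its API. The drug-target matrix, rank, learning rate and epoch count are hypothetical toy choices.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, epochs=500, seed=0):
    """Fit Y[i][j] ~ dot(U[i], V[j]) on observed entries by stochastic gradient descent.

    observed: dict (row, col) -> value for the known entries of the relation.
    Returns the row embeddings U and the column embeddings V.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), y in observed.items():
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = y - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * err * v  # gradient step on the row embedding
                V[j][k] += lr * err * u  # gradient step on the column embedding
    return U, V


def mse(observed, U, V):
    """Mean squared reconstruction error over the observed entries."""
    total = 0.0
    for (i, j), y in observed.items():
        pred = sum(u * v for u, v in zip(U[i], V[j]))
        total += (y - pred) ** 2
    return total / len(observed)


# Toy drug x target relation (1 = interaction); entry (1, 1) is held out.
full = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
observed = {(i, j): float(full[i][j])
            for i in range(3) for j in range(3) if (i, j) != (1, 1)}

U0, V0 = factorize(observed, 3, 3, epochs=0)  # untrained embeddings, same seed
U, V = factorize(observed, 3, 3)
initial_mse = mse(observed, U0, V0)
final_mse = mse(observed, U, V)
held_out_prediction = sum(u * v for u, v in zip(U[1], V[1]))
```

Each entity gets one embedding, and a relation is scored by a dot product. Replacing that dot product with a learned non-linear function, and sharing entity embeddings across several relations, is the direction in which the abstract says the framework extends this paradigm.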
Lorenzo Sala
Hybrid data integration with PINNs: mechanistic modeling of biological systems using omics data
Understanding complex biological systems, like microbial communities or plant-pathogen interactions, demands effective ways to integrate omics data with mechanistic models. A key challenge is getting robust parameter estimates from these datasets. In this presentation, I'll share how we're tackling this using a hybrid data integration framework that leverages Physics-Informed Neural Networks (PINNs). We embed various Ordinary Differential Equation (ODE) models into these networks, allowing us to fuse mechanistic knowledge with observational omics data. I'll show how this approach enables more robust and interpretable parameter inferences, even when dealing with noisy and sparse datasets. This strategy holds significant promise for effectively scaling analyses with the increasing volume and complexity of diverse omics data, laying groundwork for future multi-omic integrations.