Abstract
Human diseases are driven by complex, dynamic changes in cellular states. While single-cell transcriptomics enables high-resolution profiling, a critical gap remains in computational tools that can effectively model disease cells and their progression trajectories and enable in silico drug discovery. To address this, we developed novel deep generative AI methods, trained on temporal and spatial single-cell multi-omics data, to construct "virtual" cell models and simulate disease dynamics.
We applied this framework to diverse complex diseases, including idiopathic pulmonary fibrosis (IPF), COVID-19, and multiple cancers. Our approach not only reconstructed disease dynamics with high fidelity but also facilitated virtual drug screening, identifying candidate therapeutic compounds that were experimentally validated. This demonstrates the framework's power to elucidate cellular mechanisms underlying disease progression and to prioritize therapeutic interventions, as well as its broad applicability across distinct diseases. In this talk, I will present the design principles of these generative models, showcase their application to IPF and cancer datasets, and discuss how they empower in silico prediction and prioritization of therapeutic candidates.
Biography
Dr. Jun Ding is a Tenure-track Assistant Professor at McGill University, an affiliated member of RI-MUHC and Mila – Quebec AI Institute, and a Junior 2 FRQS Scholar in AI in health. His research focuses on developing deep generative neural networks to decode cellular dynamics from single-cell omics data, bridging AI and life sciences to uncover disease mechanisms and therapeutic strategies. Dr. Ding has published in leading journals, including Nature Biomedical Engineering, Nature Communications, Genome Research, Cell Stem Cell, and Genome Biology. His work, supported by CIHR and NSERC grants, advances AI-driven solutions for diagnostics and therapeutics in complex diseases.
Summary:
Focus: deep generative models of diseases and their dynamics
Can we simulate diseases?
Can we represent cells in-silico?
Multi-omic variation: Genome, Epigenome, Proteome, Transcriptome, Metabolome
Models:
Early: PCA
Current: Autoencoders, Foundation Models
Next frontier: leverage new under-explored data sources
DOLPHIN
Exons: functional regions of a given gene
Translated into portions of proteins
A single gene can encode multiple proteins depending on which exons are included
DNA->RNA: all exons are transcribed
RNA->mRNA: different exons are spliced into specific mRNAs, which are then translated into proteins
Sequencing the mRNAs reveals which exons were chosen and the junctions between them
Deep generative model: encodes gene and exon data into an embedding
Aggregation of junction reads
Downstream analysis
Cell embedding
Exon-level marker
Alternative splicing
Using exon data makes it possible to detect pancreatic cancer markers missed by gene-count methods
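As a minimal sketch of why exon-level data can reveal markers that gene counts miss, the toy example below compares two conditions whose gene-level counts are identical but whose exon usage differs. The gene, the exon names, the read counts, and the helper functions are all invented for illustration; this is not the model presented in the talk.

```python
# Toy exon-level read counts for one hypothetical gene with three exons.
# All numbers are invented for illustration.
healthy = {"exon1": 50, "exon2": 40, "exon3": 10}
tumor = {"exon1": 50, "exon2": 10, "exon3": 40}


def gene_count(exon_counts):
    """Gene-level count: sum of reads over all exons of the gene."""
    return sum(exon_counts.values())


def exon_usage(exon_counts):
    """Fraction of the gene's reads that fall on each exon."""
    total = gene_count(exon_counts)
    return {exon: n / total for exon, n in exon_counts.items()}


def max_usage_shift(a, b):
    """Largest absolute change in exon usage between two conditions."""
    usage_a, usage_b = exon_usage(a), exon_usage(b)
    return max(abs(usage_a[e] - usage_b[e]) for e in usage_a)


# The gene-level counts are identical, so a gene-count method sees no change.
print(gene_count(healthy), gene_count(tumor))
# The exon-level usage shifts substantially between the two conditions.
print(max_usage_shift(healthy, tumor))
```

In this toy case a gene-count comparison reports no difference, while the exon usage comparison exposes a switch between exon2 and exon3.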
MATES: quantifies locus-specific transposable elements in single-cell data
Multi-omic cell representation learning
Integrated modalities: scRNA-seq, snmC-seq, scATAC-seq
Encoded into a latent space and combined
Enables single-cell cross-modal generation
Given some modalities, generate others
Single-cell genomics is very expensive
$1.5M for 100 samples
vs. bulk sequencing: $18k for 100 samples
Can we generate single-cell from bulk?
Approach: cross-modal generative model
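To make the cost gap concrete, the short calculation below works out per-sample costs and the overall ratio from the two totals quoted in the notes. Only the two totals and the sample count come from the notes; the variable names are illustrative.

```python
# Totals quoted in the notes, each for 100 samples.
SINGLE_CELL_TOTAL_USD = 1_500_000
BULK_TOTAL_USD = 18_000
N_SAMPLES = 100

single_cell_per_sample = SINGLE_CELL_TOTAL_USD / N_SAMPLES
bulk_per_sample = BULK_TOTAL_USD / N_SAMPLES
cost_ratio = SINGLE_CELL_TOTAL_USD / BULK_TOTAL_USD

print(f"single-cell: ${single_cell_per_sample:,.0f} per sample")
print(f"bulk:        ${bulk_per_sample:,.0f} per sample")
print(f"ratio:       {cost_ratio:.1f}x")
```

At these figures, single-cell profiling costs roughly 83 times more than bulk sequencing, which is the motivation for generating single-cell profiles from bulk data.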
How to represent disease progression in-silico?
Time series with sparse snapshots make it hard to understand how the disease evolves
Need fine temporal resolution
Trying to represent changes in gene expression over time under different conditions
Given a virtual disease model, evaluate the impact of virtual drugs to find ways to bring diseased cells back to a healthy state
Prior methods are based on public data, which is limited, and require supervised labeling
Want a disease-specific unsupervised model
UNAGI model: https://github.com/mcgilldinglab/UNAGI
Data:
10 healthy donors
9 diseased
231,544 cells
Virtual cell: deep generative model learns cell embedding
Virtual disease: dynamics graph of disease progression in embedding space
Identifies the genes that drive disease progression
Impact of virtual drugs on cells
Model the impact of drugs on changes in gene expression
Apply these changes to the disease progression model
Model based on single-cell data from patients with IPF (idiopathic pulmonary fibrosis)
Model validated using experimental perturbations
Applied model to predict which drugs are most likely to treat disease
Identified several drugs that are likely to be effective
Tested one candidate (effective and cheap) by applying the drug to diseased cells
The cells showed a reduction in disease symptoms, close to what the model predicted
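The idea of ranking virtual drugs by how far they move diseased cells toward a healthy state can be sketched in a few lines. The example below is an assumption-laden toy, not UNAGI's method: it works on raw expression centroids rather than a learned embedding or dynamics graph, models a drug as a simple additive shift in gene expression, and uses invented genes, cells, drug names, and numbers throughout.

```python
import math

# Toy expression profiles over three hypothetical genes (invented numbers).
healthy_cells = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]]
diseased_cells = [[3.0, 0.5, 2.5], [2.8, 0.7, 2.3]]

# Hypothetical drug effects: an additive shift applied to each gene.
drugs = {
    "drug_A": [-1.5, 1.0, -1.5],  # pushes toward the healthy profile
    "drug_B": [0.5, -0.2, 0.5],   # pushes further away from it
}


def centroid(cells):
    """Mean expression per gene across a group of cells."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def distance(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apply_drug(cells, shift):
    """Apply the drug's per-gene shift to every cell."""
    return [[x + s for x, s in zip(cell, shift)] for cell in cells]


def drug_score(diseased, healthy, shift):
    """Fraction of the diseased-to-healthy distance closed by the drug.

    Positive values mean the treated cells end up closer to healthy.
    """
    target = centroid(healthy)
    before = distance(centroid(diseased), target)
    after = distance(centroid(apply_drug(diseased, shift)), target)
    return (before - after) / before


scores = {
    name: drug_score(diseased_cells, healthy_cells, shift)
    for name, shift in drugs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A real pipeline would replace the additive shift with a learned drug-response model and would measure the change along the inferred disease trajectory rather than against a single centroid.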