Learning and Predicting Transcriptome Architecture

Abstract:
Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNA-SH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture: the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNA-SH handles extremely long sequence inputs (~65 kilobase pairs) at base-pair resolution, allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and the accompanying frontier model will accelerate many aspects of RNA biotechnology. More broadly, we envision LoRNA-SH serving as a foundation for fine-tuning on any transcriptome-related downstream prediction task, including cell-type-specific gene expression, splicing, and general RNA processing.


Bio:
Dr. Hani Goodarzi is an Arc Core Investigator and Associate Professor at UCSF. His research combines novel discovery platforms and frontier AI models to reveal molecular mechanisms of cancer progression. Dr. Goodarzi has made key scientific contributions at the intersection of AI, RNA biology, and cancer research, spanning both diagnostics and therapeutics. He has also co-founded several companies in this field, including Exai Bio, a next-generation liquid biopsy company, and Vevo Tx, a drug discovery platform company leveraging AI to design better drugs. His work has been recognized with prestigious awards, including the Vilcek Prize for Creative Promise and the AACR-MPM Transformative Cancer Research Award. Dr. Goodarzi was previously honored with the Martin and Rose Wachtel Award in Cancer Research and named an American Cancer Society scholar.

Summary: