A draft program with the papers and abstracts accepted at RECOMB CCB is available!

RECOMB CCB

The 15th RECOMB Satellite Workshop on Computational Cancer Biology

April 14-15, 2023

Istanbul, Turkey

This is the DRAFT program of RECOMB CCB 2023 (last update 04/01/2023)

The official RECOMB CCB program is available in the conference Whova app associated with the main RECOMB conference, and is also provided here as a Google Sheet for quick reference.

RECOMB CCB 2023 program

Keynote and invited talks

Titles and abstracts are available on the RECOMB CCB homepage.

Accepted full papers

The following papers have been accepted for full presentation at RECOMB CCB 2023. For each paper, a link to the bioRxiv preprint is made available prior to the conference.

Pathway-informed deep learning model for survival analysis and pathological classification of gliomas. Gun Kaynar (Bilkent University), Doruk Cakmakci (McGill University), Caroline Bund (Department of Nuclear Medicine and Molecular Imaging, ICANS), Julien Todeschi (University Hospital "Hautepierre"), Izzie Jacques Namer (Department of Nuclear Medicine and Molecular Imaging, ICANS) and A. Ercument Cicek (Bilkent University). https://www.biorxiv.org/content/10.1101/2022.10.21.513161v2

Abstract. Online assessment of tumor characteristics during surgery is important and has the potential to establish an intra-operative surgeon feedback mechanism. With the availability of such feedback, surgeons could decide to be more liberal or conservative regarding the resection of the tumor. While there are methods to perform metabolomics-based online tumor pathology prediction, their model complexity and, in turn, the predictive performance is limited by the small dataset sizes. Furthermore, the information conveyed by the feedback provided on the tumor tissue could be improved both in terms of content and accuracy. In this study, we propose a metabolic pathway-informed deep learning model, PiDeeL, to perform survival analysis and pathology assessment based on metabolite concentrations. We show that incorporating pathway information into the model architecture substantially reduces parameter complexity and achieves better survival analysis and pathological classification performance. With these design decisions, we show that PiDeeL improves tumor pathology prediction performance of the state-of-the-art in terms of the Area Under the ROC Curve (AUC-ROC) by 3.38% and the Area Under the Precision-Recall Curve (AUC-PR) by 4.06%. Similarly, with respect to the time-dependent concordance index (c-index), we observe that PiDeeL achieves better survival analysis performance (improvement up to 4.3%) when compared to the state-of-the-art. Moreover, we show that importance analyses performed on input metabolite features as well as pathway-specific hidden-layer neurons of PiDeeL provide insights into tumor metabolism. We foresee that the use of this model in the surgery room will help surgeons adjust the surgery plan on the fly and will result in better prognosis estimates tailored to surgical procedures. The code is released at https://github.com/ciceklab/PiDeeL. The data used in this study is released at https://zenodo.org/record/7228791.

SHARE-Topic: Bayesian Interpretable Modelling of Single-Cell Multi-Omic Data. Nour El Kazwini (Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati) and Guido Sanguinetti (Theoretical and Scientific Data Science, Scuola Internazionale Superiore di Studi Avanzati). https://www.biorxiv.org/content/10.1101/2023.02.02.526696v1

Abstract. Single-cell sequencing technologies are providing unprecedented insights into the molecular biology of individual cells. More recently, multi-omic technologies have emerged which can simultaneously measure gene expression and the epigenomic state of the same cell, holding the promise to unlock our understanding of the epigenetic mechanisms of gene regulation. However, the sparsity and noisy nature of the data poses fundamental statistical challenges which hinder our ability to extract biological knowledge from these complex data sets. Here we propose SHARE-Topic, a Bayesian generative model of multi-omic single cell data which addresses these challenges from the point of view of topic models. SHARE-Topic identifies common patterns of co-variation between different ‘omic layers, providing interpretable explanations for the complexity of the data. Tested on joint ATAC and expression data, SHARE-Topic was able to provide low dimensional representations that recapitulate known biology, and to define in a principled way associations between genes and distal regulators in individual cells. We illustrate SHARE-Topic in a case study of B-cell lymphoma, studying the usage of alternative promoters in the regulation of the FOXP1 transcription factors.

Phertilizer: Growing a Clonal Tree from Ultra-low Coverage Single-cell DNA Sequencing of Tumors. Leah Weber (University of Illinois Urbana-Champaign), Chuanyi Zhang (University of Illinois Urbana-Champaign), Idoia Ochoa (University of Illinois at Urbana-Champaign) and Mohammed El-Kebir (University of Illinois at Urbana-Champaign). https://www.biorxiv.org/content/10.1101/2022.04.18.488655v3

Abstract. Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.

Availability: https://github.com/elkebir-group/phertilizer

MOViDA: Multi-Omics Visible Drug Activity Prediction with a Biologically Informed Neural Network Model. Luigi Ferraro (Federico II, University of Naples), Giovanni Scala (Federico II, University of Naples), Luigi Cerulo (University of Sannio), Emanuele Carosati (University of Trieste) and Michele Ceccarelli (Federico II, University of Naples). https://www.biorxiv.org/content/10.1101/2023.04.07.535998v1

Abstract. Drug discovery is a challenging task, characterized by a protracted period of time between initial development and market release, with a high rate of attrition at each stage. Computational virtual screening, powered by machine learning algorithms, has emerged as a promising approach for predicting therapeutic efficacy. However, the complex relationships between features learned by these algorithms can be challenging to decipher. We have devised a neural network model for the prediction of drug sensitivity, which employs a biologically-informed visible neural network (VNN), enabling a greater level of interpretability. The trained model can be scrutinized to investigate the biological pathways that play a fundamental role in prediction, as well as the chemical properties of drugs that influence sensitivity. The model leverages multi-omics data obtained from diverse tumor tissue sources and molecular descriptors that encode drug properties. We have extended the model to predict drug synergy, resulting in favorable outcomes while retaining interpretability. Given the often unbalanced nature of publicly available drug screening datasets, our model demonstrates superior performance compared to state-of-the-art visible machine learning algorithms.

Exploring tumor-normal cross-talk with TranNet: role of the environment in tumor progression. Bayarbaatar Amgalan (National Center of Biotechnology Information, NLM, NIH), Chi-Ping Day (Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH) and Teresa Przytycka (National Center of Biotechnology Information, NLM, NIH). https://www.biorxiv.org/content/10.1101/2023.02.24.529899v2

Abstract. There is a growing understanding that normal tissue samples used as controls in cancer studies do not represent fully healthy tissue but are, instead, intermediate between healthy and cancer. Several factors can contribute to the deviation of solid tumor control samples from healthy state including exposure of the control sample to the same environmental and genetic tumor-promoting factors as the tumor, changes due to tumor related immune response and other aspects of tumor microenvironment. Characterizing the relation between gene expression in control samples and tumor progression is fundamental for understanding regulatory roles of tumor environment and for devising cancer prognostic markers and treatment. We developed and validated TranNet, a computational approach to utilize gene expression in matched tumor and control samples to study the relation between the expression of genes in adjacent normal control and tumor progression. TranNet infers a sparse weighted bipartite graph that defines a transition map from gene expression in control samples to gene expression in tumor together with predictors (potential regulators) of this transition. To our knowledge, TranNet is the first computational method to infer such regulation. We applied TranNet to The Cancer Genome Atlas (TCGA) data from several solid tumors. Our results demonstrate that many predictors identified by TranNet correspond to known marker genes associated with regulation by the tumor microenvironment and are enriched in G-protein coupled receptor signaling, cell-to-cell communication, immune processes, and cell adhesion. In addition, targets of inferred predictors are enriched in pathways related to tissue remodelling (including the Epithelial-Mesenchymal Transition (EMT)), immune response, and cell proliferation. This indicates that genes identified by TranNet as predictors are markers and potential facilitators of environment-related regulation of tumor progression.
The results obtained by TranNet provide a proof of the principle that control samples can be used to identify genes involved in the environment-mediated regulation of tumor progression and offer new insights into the relationship between tumor, normal, and the tumor environment. In addition, the set of predictors identified by TranNet will provide a valuable resource for future investigations. TranNet is freely available at https://github.com/ncbi/TranNet.

A Bayesian method to infer copy number clones from single-cell RNA and ATAC sequencing. Lucrezia Patruno (Dept. of Informatics, Systems and Communication, University of Milan-Bicocca, Italy), Salvatore Milite (Centre for Computational Biology, Human Technopole, Italy), Riccardo Bergamin (Dept. of Mathematics and Geosciences, University of Trieste, Italy), Nicola Calonaci (Dept. of Mathematics and Geosciences, University of Trieste, Italy), Marco Antoniotti (Dept. of Informatics, Systems and Communication, University of Milan-Bicocca, Italy), Alex Graudenzi (Dept. of Informatics, Systems and Communication, University of Milan-Bicocca, Italy) and Giulio Caravagna (Dept. of Mathematics and Geosciences, University of Trieste, Italy). https://www.biorxiv.org/content/10.1101/2023.04.01.535197v1

Abstract. Single-cell RNA and ATAC sequencing technologies allow one to probe expression and chromatin accessibility states as a proxy for cellular phenotypes at the resolution of individual cells. A key challenge of cancer research is to consistently map such states on genetic clones, within an evolutionary framework. To this end we introduce CONGAS+, a Bayesian model to map single-cell RNA and ATAC profiles generated from independent or multimodal assays on the latent space of copy number clones. CONGAS+ can detect tumour subclones associated with aneuploidy by clustering cells with the same ploidy profile. The framework is implemented in a probabilistic language that can scale to analyse thousands of cells thanks to GPU deployment. Our tool exhibits robust performance on simulations and real data, highlighting the advantage of detecting aneuploidy from two distinct molecules as opposed to other single-molecule models, and also leveraging real multi-omic data. In the application to prostate cancer, lymphoma and basal cell carcinoma, CONGAS+ did retrieve complex subclonal architectures while providing a coherent mapping among ATAC and RNA, facilitating the study of genotype-phenotype mapping, and their relation to tumour aneuploidy.

The zero-agnostic copy number transformation model. Henri Schmidt (Princeton University), Palash Sashittal (Princeton University) and Benjamin Raphael (Princeton University). https://biorxiv.org/cgi/content/short/2023.04.10.536302v1

Abstract. Motivation: New low-coverage single-cell DNA sequencing technologies enable the measurement of copy number profiles from thousands of individual cells within tumors. From this data, one can infer the evolutionary history of the tumor by modeling transformations of the genome via copy number aberrations. A widely used model to infer such copy number phylogenies is the copy number transformation (CNT) model in which copy number aberrations are represented by events that alter the number of copies of contiguous segments of the genome. While the CNT distance between copy number profiles, i.e. the minimum number of events needed to transform one profile to another, can be computed efficiently, current methods rely on heuristics to compute copy number phylogenies under the CNT model.

Results: We introduce the resurrecting copy number transformation (RCNT) model, a simplification of the CNT model that allows the resurrection of zero copy number regions. We derive a closed form expression for the RCNT distance between two copy number profiles and show that, unlike the CNT distance, the RCNT distance forms a metric. We leverage the closed-form expression for the RCNT distance and a characterization of copy number profiles to derive polynomial time algorithms for two natural relaxations of the small parsimony problem on copy number profiles. While the resurrection allowed in the RCNT model is not biologically realistic, we show on both simulated and real datasets that the RCNT distance is a close approximation to the CNT distance. Using our polynomial time algorithm for the (relaxed) small parsimony problem, we develop an algorithm, Breaked, for solving the large parsimony problem on copy number profiles. Finally, we demonstrate that Breaked outperforms existing methods for inferring copy number phylogenies on both simulated and real data.

MOGAT: An Improved Multi-Omics Integration Framework Using Graph Attention Networks. Raihanul Bari Tanvir (Florida International University), Md Mezbahul Islam (Florida International University), Masrur Sobhan (Florida International University), Dongsheng Luo (Florida International University) and Ananda Mohan Mondal (Florida International University). https://www.biorxiv.org/content/10.1101/2023.04.01.535195v1

Abstract. Integration of multi-omics data holds great promise for understanding the complex biology of diseases, particularly Alzheimer’s, Parkinson’s, and cancer. However, the integration is challenging due to the high dimensionality and complexity of the data. Traditional machine learning methods are not well-suited for handling the complex relationships between different types of omics data. Many models were proposed that utilize graph-based learning models to extract hidden representations and network structures from different omics data to enhance cancer prediction, patient categorization, etc. The existing graph neural network-based (GNN-based) multi-omics approaches for cancer subtype prediction have three shortcomings: (a) Do not consider all types of omics data, (b) Fail to determine the relative significance of the neighboring nodes (in this case, samples or patients) when it comes to downstream analyses, such as subtype classification, patient stratification, etc., and (c) Use the same approach for generating initial graphs for different omics data. To overcome these shortcomings, we present MOGAT, a novel multi-omics integration approach, leveraging a graph attention network (GAT) model that incorporates graph-based learning with an attention mechanism. MOGAT utilizes a multi-head attention mechanism that can more efficiently extract information for a specific sample by assigning unique attention coefficients to its neighboring samples. To evaluate the performance of MOGAT, we explored its capability via a case study of predicting breast cancer subtypes. Our results showed that MOGAT performs better than the state-of-the-art multi-omics integration approaches.

From Cell-Lines to Cancer Patients: Personalized Drug Synergy Prediction. Halil Ibrahim Kuru (Bilkent University), A. Ercument Cicek (Bilkent University) and Oznur Tastan (Sabanci University). https://www.biorxiv.org/content/10.1101/2023.02.13.528276v2

Abstract. Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. PDSP is built and available at https://github.com/hikuru/PDSP.

Accepted abstracts

The following abstracts have been accepted for poster presentation at RECOMB CCB 2023, and some have also been scheduled for a short talk (see the program for more details). Every poster should be formatted according to the main RECOMB guidelines.

Yifeng Tao, Xiaojun Ma, Drake Palmer, Russell Schwartz, Xinghua Lu and Hatice Osmanbeyoglu. Interpretable deep learning for chromatin-informed inference of transcriptional programs driven by somatic alterations across cancers
Yuchao Jiang. Canopy2: Tumor Phylogeny Inference Using Bulk DNA and Single-Cell RNA Sequencing
Pritika Ramharack, Kwandile Mbhele, Oelfah Patel and Rabia Johnson. Characterizing the Pharmacophoric Features of Bufadienolides from Drimia species as Na+, K+ ATPase Inhibitors: Advancing Cancer Therapeutics through Molecular Dynamic Simulations
Matteo Serra, Nicola Occelli, Frederic Lifrange, Mattia Rediti, Xiaoxiao Wang, Delphine Vincent, Ghizlane Rouas, Ligia Craciun, Denis Larsimont, Miikka Vikkula, François Duhoux, David Venet, Françoise Rothé and Christos Sotiriou. Deciphering microenvironment heterogeneity in luminal breast cancer by combining spatial transcriptomics, single cell RNA sequencing and image analysis
Yuichi Shiraishi. Precise characterization of somatic complex structural variations from paired long-read sequencing data with nanomonsv
Xin Lai. A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer
Valentina Giansanti, Oronzina Botrugno, Francesca Giannese, Dejan Lazarevic, Giovanni Tonon, Marco Antoniotti and Davide Cittaro. MOWGAN: a multiomic single cell data integration framework to dissect deep biology
Daria Ostroverkhova, Kathrin Tyryshkin, Igor Rogozin, Konstantin Shaitan, Polina Shcherbakova and Anna Panchenko. Mutated DNA Polymerase Epsilon Generates Distinct Mutational Landscape in Endometrial Cancer Genomes
Tobias Schmidt, Rainer Spang, Wolfram Gronwald and Michael Altenbuchinger. Explainable AI for the molecular subclassification of DLBCL
Luigi Laezza, Luigi Ferraro and Michele Ceccarelli. Embedding protein interactions with graph attention network: Proteins behaviours in anticancer drugs synergy
Lena Buck, Tobias Schmid, Wolfram Gronwald and Rainer Spang. Anomaly detection in mixed high dimensional molecular data
Anna A. Lobas, Amir Ata Saei, Roman A. Zubarev and Mikhail Gorshkov. Revealing the mechanisms of drug actions by computational proteomics: the everolimus case
Jacob Househam, Riccardo Bergamin, Salvatore Milite, Nicola Calonaci, Alice Antonello, Marc Williams, Vasavi Sundaram, Alona Sosinsky, Will Cross and Giulio Caravagna. Integrated quality control of allele-specific copy numbers, mutations and tumour purity from cancer whole genome sequencing assays
Pierre Martinez. Plasticity of rare breast cancers: detection pitfalls and opportunities via spatial transcriptomics
Aurora Maurizio, Anna Sofia Tascini and Marco Jacopo Morelli. SurfR: Surfing the cells' surfaceome
Ammar Naqvi, Brian Ennis, Ryan Corbett, Aditya Lahiri, Zhuangzhuang Geng, Run Jin, Komal Rathi, Karina Conkrite, Krutika Gaonkar, Priyanka Seghal, Katharina Hayer, Adam Kraya, Jessica Foster, Peter Madsen, Andrei Thomas-Tikhonenko, Phillip Storm, Adam Resnick and Jo Lynne Rokita. Recurrent splicing aberrations in pediatric high-grade gliomas target known functional sites in known oncogenic factor CLK1
Mustafa Kaya and Dilek Colak. Integrated transcriptomic analysis reveals blood-based gene signature with diagnostic and prognostic potential for patients with breast cancer
Andreas Lösch and Y. Linda Hu. Mutual Hazard Networks: Application of Graphical Models for Cancer Progression Modelling
Nisha Chaudhary, Md Imam Faizan, Aakash Rao, Arpita Rai, Jeyaseelan Augustine, Akhilanand Chaurasia, Deepika Mishra, Akhilesh Chandra, Rintu Kutum and Tanveer Ahmad. Grade-level classification of oral squamous cell carcinoma (OSCC) from digital pathology using ensemble deep learning algorithms
Carla Castignani, Jonas Demeulemeester, Elizabeth Larose Cadieux, Nnennaya Kanu, Robert E. Hynds, David R. Pearce, Charles Swanton and Peter Van Loo. CREDAC: Copy number-based Reference-free Expression Deconvolution Analysis of Cancers
William Yashar, Garth Kong, Jake Vancampen, Brittany Curtiss, Daniel Coleman, Lucia Carbone, Galip Yardimci, Julia Maxson and Theodore Braun. GoPeaks: histone modification peak calling for CUT&Tag
Joseph Estabrook, William Yashar, Hannah Holly, Julia Somers, Olga Nikolova, Ozgun Babur, Theodore Braun and Emek Demir. Predicting transcription factor activity using prior biological information
Nicola Calonaci, Salvatore Milite, Stefano Scalera, Marcello Maugeri-Saccà and Giulio Caravagna. Measuring the epistatic effect of mutations and aneuploidy from targeted sequencing panels
Giovanni Santacatterina, Nicola Calonaci, Riccardo Bergamin, Leonardo Egidi and Giulio Caravagna. biPOD: a Bayesian inference based method for population dynamics
Riccardo Bergamin, Salvatore Milite, Elena Buscaroli, Arianna Tasciotti, Nicola Calonaci, Azad Sadr, Irene Baravelli, Edith Natalia Garcia Villegas, Fabio Anselmi, Alberto D'Onofrio and Giulio Caravagna. A Bayesian model to deconvolve mutational signatures in a semi-supervised way
Intekhab Hossain, John Quackenbush, Viola Fanfani and Rebekka Burkholz. Biologically informed NeuralODEs for genome-wide regulatory dynamics in breast cancer progression
Ziyun Guang, Matthew Smith-Erb and Layla Oesper. A Weighted Distance-based Approach for Deriving Consensus Tumor Evolutionary Trees
Seyma Unsal Beyge and Nurcan Tuncbag. Functional Stratification of Cancer Drugs Through Integrated Network Similarity
Michael Huttner. Mechanical Turk for digital pathology
April Sagan and Hatice Osmanbeyoglu. STAN, a computational framework for inferring spatially informed transcription factor activity networks
Bengi Ruken Yavuz, Chung-Jung Tsai, Ruth Nussinov and Nurcan Tuncbag. Pan-Cancer Clinical Impact of Latent Drivers from Double Mutations
Mg Hirsch, Soumitra Pal, Cenk Sahinalp, Erin Molloy, Chi-Ping Day and Teresa Przytycka. Gene Expression Evolution in Tumors