Anshul Kundaje, Stanford University
Abstract:
We present interpretable deep learning approaches to address three key challenges in the integrative analysis of functional genomic data. (1) Data denoising: The quality of functional genomic data is affected by a variety of experimental parameters, which makes accurate inference from chromatin profiling experiments challenging. We introduce a simple convolutional denoising algorithm that learns a mapping from suboptimal to high-quality datasets. It overcomes various sources of noise and variability, substantially enhancing and recovering signal when applied to low-quality chromatin profiling datasets across individuals, cell types, and species. Our method has the potential to improve data quality at reduced cost. (2) Data imputation: It is largely infeasible to perform hundreds of genome-wide assays targeting diverse transcription factors (TFs) and epigenomic marks in hundreds of cellular contexts due to cost and material constraints. We have developed multi-task, multi-modal deep neural networks that predict chromatin marks and in vivo binding events of hundreds of TFs by integrating regulatory DNA sequence with just two assays, namely ATAC-seq (or DNase-seq) and RNA-seq, performed in a target cell type of interest. We train our models on large reference compendia from ENCODE/Roadmap Epigenomics and obtain high prediction accuracy in new cellular contexts, thereby significantly expanding the context-specific annotation of the non-coding genome. (3) Decoding the context-specific regulatory architecture of the genome: Finally, we develop efficient interpretation engines for extracting predictive and biologically meaningful patterns from integrative deep learning models of TF binding and chromatin accessibility. We obtain new insights into TF binding sequence affinity models (e.g. significance of flanking sequences and fusion motifs), infer high-resolution point binding events of TFs, dissect higher-order cis-regulatory sequence grammars (including density and spatial constraints), learn chromatin architectural features correlated with chromatin marks, unravel the dynamic regulatory drivers of cellular differentiation, and score the regulatory influence of non-coding genetic variants.
Suchi Saria, Johns Hopkins University
Abstract: TBD
Cheng Soon Ong, Data61, Canberra
Abstract:
Spearman's correlation measures the association between two ranked lists. Given a set of ranked lists, we study two tasks: aggregating the lists into a single ranked list, and computing the agreement of the lists as we traverse them. Applications include analysing the stability of feature selection and integrating various sources of information. We illustrate each with an example. In the first, we study the stability of identifying variations in GWAS by considering replication studies. In the second, we aggregate genomic distance, 3D associations, and literature information to find promising disease-associated variations. It turns out that both problems can be tackled by considering a multivariate Spearman's correlation.
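To make the two tasks concrete, here is a minimal sketch in Python. It is not the multivariate Spearman's correlation the abstract refers to. As stand-ins, it uses the classical pairwise Spearman's rho (assuming no ties), averages it over all pairs of lists as a simple agreement score, and aggregates lists by mean rank (a Borda-style rule). The function names and the toy rankings are hypothetical and chosen only for illustration.

```python
import numpy as np

def spearman(rank_a, rank_b):
    """Spearman's rho for two rankings of the same n items (assumes no ties)."""
    a = np.asarray(rank_a, dtype=float)
    b = np.asarray(rank_b, dtype=float)
    n = a.size
    d2 = np.sum((a - b) ** 2)  # sum of squared rank differences
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def mean_pairwise_spearman(ranks):
    """Average rho over all pairs of lists; rows are lists, columns are items."""
    ranks = np.asarray(ranks, dtype=float)
    m = ranks.shape[0]
    vals = [spearman(ranks[i], ranks[j])
            for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(vals))

def borda_aggregate(ranks):
    """Aggregate by mean rank (Borda-style); returns item indices, best first."""
    mean_rank = np.asarray(ranks, dtype=float).mean(axis=0)
    return np.argsort(mean_rank, kind="stable")

# Toy data (hypothetical): three lists, each ranking the same five items.
# Entry [i, j] is the rank that list i assigns to item j (1 = best).
ranks = np.array([
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
    [1, 3, 2, 5, 4],
])

agreement = mean_pairwise_spearman(ranks)
consensus = borda_aggregate(ranks)
```

Here `agreement` summarises how consistent the lists are with one another, and `consensus` gives the order of items in the aggregated list. Restricting the columns to the top-k items of the consensus and recomputing ranks within that subset would give one simple way to track agreement while traversing the list.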