12/6/2013

Post date: Dec 9, 2013 9:05:09 PM

Speaker: Yingying Wei, PhD Candidate, Dept of Biostatistics, JHU

Topic: Integrative statistical models for genomic signal detection and ultra-high-dimensional prediction (Practice Job Talk)

Big data pose challenges in terms of legitimate inference, robust models, scalable computation, and scientific interpretability. Importantly, they also provide unprecedented opportunities for data-driven scientific discovery. In this talk, we use two examples to illustrate how statistics can help ferret out scientific discoveries from big data. In the first, we propose a correlation motif approach for integrative analysis of multiple high-dimensional genomic datasets. The approach adopts a flexible Bayesian hierarchical mixture model to capture the latent correlation structures embedded in the data, and it substantially improves signal detection for data with a low signal-to-noise ratio. We illustrate it with two applications: detection of allele-specific protein-DNA binding from ChIP-seq data, which often suffers from low statistical power because few sequence reads map to heterozygous SNPs, and detection of differential gene expression with only a small number of replicates. In the second, we discuss a new challenge arising from high-throughput biology: predicting one type of ultra-high-dimensional genomic profile from another. We demonstrate the approach by predicting the whole-genome DNA methylation landscape from exon array data. While these methods were developed for applications in genomics, they can also be applied to a broader class of big data problems.