Seunggeun (Shawn) Lee

John G. Searle Assistant Professor
Department of Biostatistics 
University of Michigan
1415 Washington Heights 
Ann Arbor, Michigan 48109-2029 

E-mail : 
Google Scholar : link 

Our group focuses on developing statistical and computational methods for large-scale genetic and genomic data analysis. We are currently working in the following areas:

  • Rare-variant association analysis and meta-analysis

With the development of sequencing technologies, sequencing-based association studies are increasingly being conducted to identify rare variants associated with complex traits. Successful applications of sequencing technology to studies of complex traits require powerful and efficient statistical and computational methods. We developed methods for rare-variant association analysis, including the SKAT and SKAT-O tests. These methods are highly cited and have been established as an industry standard. We extended these methods to meta-analysis, family-based designs, longitudinal studies, and gene-environment (GxE) interaction tests. We also developed practical moment-based and resampling-based adjustment methods for binary traits to obtain accurate p-values when the case-control ratios are unbalanced.
  • Phenome-wide association studies 
PheWAS uses electronic health record (EHR) data to define case status for thousands of diseases and carries out genetic association analysis for all of these phenotypes. We have developed fast and accurate computational methods for binary phenotypes. We are currently developing a method that can adjust for kinship even when the sample size is very large.
  • High-dimensional data analysis

Principal component analysis (PCA) is a powerful tool for exploring the characteristics of high-dimensional data. In genome-wide association studies (GWAS), it is widely used to adjust for the confounding effect of population stratification. We have developed practical tools for GWAS and studied the theoretical properties of PCA in high-dimensional settings. We are extending these results to other high-dimensional methods, including surrogate variable analysis and partial least squares.
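
To illustrate how PCA can adjust for population stratification, the following is a minimal, generic sketch on simulated data. It is not our group's software. The two-population setup, the allele frequencies, the effect sizes, and the helper names `top_pcs` and `snp_effect` are all assumptions made for this example. The trait depends only on population membership, so any SNP association is pure confounding, and including the top principal components as covariates should shrink it toward zero.

```python
import numpy as np

def top_pcs(genotypes, k=2):
    """Top-k principal component scores of an (n_samples, n_snps) genotype matrix."""
    # Standardize each SNP column; drop monomorphic SNPs to avoid dividing by zero.
    sd = genotypes.std(axis=0)
    keep = sd > 0
    z = (genotypes[:, keep] - genotypes[:, keep].mean(axis=0)) / sd[keep]
    # Thin SVD: left singular vectors scaled by singular values are the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

def snp_effect(y, snp, covariates=None):
    """Least-squares coefficient of the SNP, optionally adjusting for covariates."""
    cols = [np.ones_like(y), snp]
    if covariates is not None:
        cols.extend(covariates.T)  # one column per covariate
    x = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta[1]

# Simulated data (all values below are hypothetical choices for illustration).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 200, 500
pop = np.repeat([0, 1], n_per_pop)

# Allele frequencies drift between the two populations.
freq0 = rng.uniform(0.1, 0.9, n_snps)
freq1 = np.clip(freq0 + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
freq = np.vstack([freq0, freq1])
genotypes = rng.binomial(2, freq[pop]).astype(float)

# The trait depends on population only: no SNP has a true effect.
y = 2.0 * pop + rng.normal(0, 1, 2 * n_per_pop)

pcs = top_pcs(genotypes, k=2)

# Test the SNP whose frequency differs most between populations,
# which is the one most exposed to confounding.
j = int(np.argmax(np.abs(freq0 - freq1)))
unadjusted = snp_effect(y, genotypes[:, j])
adjusted = snp_effect(y, genotypes[:, j], covariates=pcs)

print(f"unadjusted effect: {unadjusted:.3f}")
print(f"PC-adjusted effect: {adjusted:.3f}")
```

In this simulation the leading principal component tracks population membership closely, which is why conditioning on it removes most of the spurious association. Real GWAS data typically call for more components, linkage-disequilibrium pruning, and scalable PCA algorithms, none of which are shown here.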