Kyu Ha Lee, Ph.D.

Assistant Professor of Integrative Genomic Epidemiology


Harvard T.H. Chan School of Public Health

Department of Nutrition

Department of Epidemiology

Department of Biostatistics


655 Huntington Avenue

Building II, Room 211A

Boston, MA 02115

klee@hsph.harvard.edu

Research Interests

Bayesian analysis, Multivariate analysis, Survival analysis, Microbiome data analysis, High-dimensional data analysis




Education and Training


Ph.D., Statistics, University of Missouri, Columbia, MO, 8/2006 - 7/2011


Postdoctoral Research Fellow, Biostatistics, Harvard School of Public Health, MA, 8/2011 - 12/2013




Academic Appointments


1/2014 - 6/2015 Research Associate, Department of Biostatistics, Harvard School of Public Health, MA


7/2015 - 6/2019 Assistant Investigator, The Forsyth Institute, MA


7/2015 - 6/2019 Biostatistics Consultant, Harvard Catalyst, MA


1/2016 - 6/2019 Instructor, Harvard School of Dental Medicine, MA


7/2019 - present Assistant Professor, Department of Nutrition, Harvard T.H. Chan School of Public Health, MA




Honors and Distinctions


2009-2011 Graduate Student Travel Award, Department of Statistics, University of Missouri-Columbia


2010 Graduate Student Association Travel Award, University of Missouri-Columbia


2010 Graduate Student Travel Award, Southern Regional Council on Statistics Summer Research Conference


2011 Graduate Professional Council Travel Scholarship, University of Missouri-Columbia


2011 Winter Workshop Junior Researcher Travel Award, University of Florida


2013 David P. Byar Young Investigator Award (1st place), American Statistical Association: Biometrics Section




Research Support


Active Grants


NIH R01GM126257 (Starr / Lee): Bayesian Multivariate 3D Spatial Modeling for Microbiome Image Analysis

Role: Principal Investigator


NIH R03DE027486 (Lee): Multivariate Bayesian Variable Selection for High-Dimensional Oral Microbiome Data

Role: Principal Investigator


NIH R21DE026872 (Starr / Lee): Bayesian Multivariate Image Analysis for Studying Oral Microbiome Biogeography

Role: Principal Investigator


HSPH/CAP-CVD (Sun / Walther): Interdisciplinary Projects on Metabolic Risk Factors of Cardiovascular Disease

Role: Co-Investigator


NIH R01ES022981 (Sun): Environmental Obesogens and Weight Change in the POUNDS LOST Trial

Role: Co-Investigator


NIH P01HD103133 (Seage / Chadwick): Pediatric HIV/AIDS Cohort Study (PHACS) 2020

Role: Co-Investigator


NIH R01DK126698 (Sun): Human Gut Microbiome and Incident Diabetes Risks in U.S. Populations

Role: Co-Investigator


NIH R01DK119268 (Sun): Metabolomics Signatures Underlying Diet, Lifestyle and Gut Microbiota for Diabetes

Role: Co-Investigator



Completed Grants


NIH UL1TR000170 (Nadler): Harvard Clinical and Translational Science Center

Role: Biostatistician (7/2015 – 2019)


NIH R01CA181360 (Haneuse): Clustered Semi-competing Risks Analysis in Quality of End-of-Life Care Studies

Role: Site Principal Investigator (7/2015 – 1/2018); Research Associate (1/2014 – 6/2015)


NIH U01HD052102 (Seage): Pediatric HIV/AIDS Cohort Study DOC: Oral Health in AMP Subjects (Microbiome)

Role: Biostatistician (7/2015 – 1/2016)


NIH P01CA134294 (Lin / Dominici): Statistical Informatics for Cancer Research

Role: Research Associate (1/2014-6/2015); Postdoctoral Research Fellow (8/2011-12/2013)


EPA RD83479801 (Koutrakis): Air Pollution Mixtures: Health Effects Across Life Stages

Role: Research Associate (1/2014-6/2015); Postdoctoral Research Fellow (8/2011-12/2013)


NIH R01ES012044 (Coull): Analysis of High-Dimensional Environmental Health Data

Role: Postdoctoral Research Fellow (8/2011-7/2012)


NIH R01ES012054 (Dominici): Statistical Methods for Population Health Research on Chemical Mixtures

Role: Postdoctoral Research Fellow (8/2011-7/2012)




Publications (Google Scholar)


1. Lee KH, Chakraborty S, and Sun J (2011) Bayesian variable selection in semiparametric proportional hazards model for high dimensional survival data. The International Journal of Biostatistics, Volume 7, Issue 1, pages 1-32.


2. Lee KH, Haneuse S, Schrag D, and Dominici F (2014) Bayesian semi-parametric analysis of semi-competing risks data: investigating hospital readmission after a pancreatic cancer diagnosis. Journal of the Royal Statistical Society: Series C, Volume 64, Issue 2, pages 253-273. (This paper won the 2013 David P. Byar Young Investigator Award.)



3. Lee KH, Chakraborty S, and Sun J (2015) Survival prediction and variable selection with simultaneous shrinkage and grouping priors. Statistical Analysis and Data Mining, Volume 8, Issue 2, pages 114-127.



4. Tanuma J, Lee KH, Haneuse S, Matsumoto S, Nguyen TD, Nguyen THD, Do DC, Pham TTT, Nguyen VK, and Oka S (2016) Incidence of AIDS-defining opportunistic infections and mortality during antiretroviral therapy in a cohort of adult HIV-infected individuals in Hanoi 2007-2014. PLoS ONE 11(3): e0150781.


5. Lee KH, Dominici F, Schrag D, Haneuse S (2016) Hierarchical models for semi-competing risks data with application to quality of end-of-life care for pancreatic cancer. Journal of the American Statistical Association, Volume 111, Issue 515, pages 1075-1095.



6. Haneuse S and Lee KH (2016) Semi-competing risks data analysis: accounting for death as a competing risk when the outcome of interest is non-terminal event. Circulation: Cardiovascular Quality and Outcomes, Volume 9, Issue 3, pages 322-331.


7. Abreu MH, Lee KH, Luquetti D, Starr JR (2016) Temporal trend in the birth prevalence of cleft lip and/or cleft palate in Brazil, 2000-2013. Birth Defects Research (Part A), Volume 106, Issue 9, pages 789-792.


8. Lee KH, Tadesse MG, Baccarelli AA, Schwartz J, and Coull BA (2017) Multivariate Bayesian variable selection exploiting dependence structure among outcomes: application to air pollution effects on DNA methylation. Biometrics, Volume 73, Issue 1, pages 232-241.



9. Abreu MH, Resende VLS, Lee KH, Matta-Machado ATG, Starr JR (2017) Regional differences in infection control conditions in a sample of primary health care services in Brazil. Cadernos de Saúde Pública (Reports in Public Health), 33(11): e00072416.


10. Lee KH, Chakraborty S, and Sun J (2017) Variable selection for high-dimensional genomic data with censored outcomes using group lasso prior. Computational Statistics and Data Analysis, Volume 112, pages 1-13.


11. Lee KH, Rondeau V, Haneuse S (2017) Accelerated failure time models for semi-competing risks data in the presence of complex censoring. Biometrics, Volume 73, Issue 4, pages 1401-1412.



12. Atia L, Bi D, Sharma Y, Mitchel JA, Gweon B, Koehler S, DeCamp S, Lan B, Kim JH, Hirsch R, Pegoraro A, Lee KH, Starr JR, Weitz DA, Martin A, Park J-A, Butler JP, Fredberg JJ (2018) Geometrical constraints during epithelial jamming. Nature Physics, Volume 14, Issue 6, pages 613-620.


13. Starr JR, Huang Y, Lee KH, Murphy CM, Moscicki A, Shiboski CH, Ryder MI, Yao TJ, Faller L, Van Dyke RB, Paster BJ (2018) Oral microbiota in youth with perinatally acquired HIV infection. Microbiome, 6(1):100.


14. Liu S, Bobb J, Lee KH, Gennings C, Claus Henn B, Bellinger D, Austin C, Schnaas L, Tellez-Rojo M, Hu H, Wright RO, Arora M, and Coull BA (2018) Lagged kernel machine regression for identifying time windows of susceptibility to exposures of complex metal mixtures. Biostatistics, Volume 19, Issue 3, pages 325-341.



15. Bassir SH, Kholy KE, Chen C-Y, Lee KH, Intini G (2019) Outcome of early dental implant placement versus other dental implant placement protocols: a systematic review and meta-analysis. Journal of Periodontology, Volume 90, Issue 5, pages 493-506.


16. Koch G, Hamilton A, Wang K, Herschdorfer L, Lee KH, Gallucci G, Friedland B (2019) Dimensional accuracy of cone beam computed tomography with varying angulation of the jaw to the X-ray beam. Dentomaxillofacial Radiology, Volume 48, Issue 4.


17. Alvares D, Haneuse S, Lee C, Lee KH (2019) SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data. R Journal, Volume 11, Issue 1, pages 376-400.



18. Green DR, Schulte F, Lee KH, Pugach MK, Hardt M, Bidlack FB (2019) Mapping the tooth enamel proteome and amelogenin phosphorylation onto mineralizing porcine tooth crowns. Frontiers in Physiology, Volume 10, article 925.


19. Goldstein JM, Valido A, Lewandowski J, Walker RG, Mills MJ, Messemer KA, Besseling P, Lee KH, Lee RT, Wagers AJ (2019) Variation in zygotic CRISPR/Cas9 gene editing outcomes generates novel reporter and deletion alleles at the Gdf11 locus. Scientific Reports, Volume 9, Issue 1, 18613.


20. Lee KH, Coull BA, Moscicki AB, Paster BJ, Starr JR (2020) Bayesian variable selection for multivariate zero-inflated models: application to microbiome count data. Biostatistics, Volume 21, Issue 3, pages 499-517.



21. Kim JH, Pegoraro AF, Das A, Koehler S, Ujwary SA, Lan B, Mitchel JA, Atia L, He S, Wang K, Bi D, Zaman M, Park J-A, Butler JP, Lee KH, Starr JR, Fredberg JJ (2020) Unjamming and collective migration in MCF10A breast cancer cell lines. Biochemical and Biophysical Research Communications, Volume 521, Issue 3, pages 706-715.


22. Li Y, Seo S, Lee KH (2021) Bayesian analysis of grouped survival data with adaptive time partition. Journal of Statistical Computation and Simulation, Volume 91, Issue 14, pages 2937-2952.


23. Li J, Li Y, Ivey KL, Wang D, Wilkinson JE, Franke A, Lee KH, Chan AT, Huttenhower C, Hu FB, Rimm EB, Sun Q (2022) Interplay between diet and gut microbiome, and circulating concentrations of trimethylamine N-oxide: findings from a longitudinal cohort of U.S. men. Gut, Volume 71, Issue 4, pages 724-733.

 

24. Haneuse S, Schrag D, Dominici F, Normand S-L, Lee KH (2022) Measuring performance for end-of-life care: A Bayesian decision-theoretic approach. Annals of Applied Statistics, Volume 16, Issue 3, pages 1586-1607.


25. Reeder H, Lee KH, Haneuse S (2023) Characterizing quantile-varying covariate effects under the accelerated failure time model. Biostatistics, kxac052.


26. Wang F, Tessier A-J, Liang L, Wittenbecher C, Haslam D, Eliassen AH, Rexrode KM, Tobias DK, Li J, Zeleznik O, Stampfer MJ, Grodstein F, Martínez-González MA, Salas-Salvadó J, Clish C, Lee KH, Sun Q, Hu FB, Guasch-Ferré M (2023) Plasma metabolomic profiles associated with mortality and longevity in a prospective analysis of 13,512 individuals. Nature Communications, Volume 14, Issue 1, page 5744.


27. Yang H, Li J, Zhu L, Wang B, Li Y, Ivey KL, Lee KH, Eliassen H, Qi Q, Chan AT, Huttenhower C, Rimm EB, Hu FB, Sun Q. (2023) The interplay between diet, circulating indolepropionate level, and cardiometabolic health in US populations. Gut, Volume 72, Issue 12, pages 2260-2271. 


28. Bui LP, Pham TT, Wang F, Chai B, Sun Q, Hu FB, Lee KH, Guasch-Ferré M, Willett WC (2024) Planetary health diet index and risk of total and cause-specific mortality in three prospective cohorts. The American Journal of Clinical Nutrition, in press.


29. Reeder H, Haneuse S, Lee KH. (2024) Group lasso priors for Bayesian accelerated failure time models with left-truncated and interval-censored data. Statistical Methods in Medical Research, in press.


Majumder S, Coull BA, Mark Welch J, La Riviere PJ, Dewhirst FE, *Starr JR, *Lee KH. Multivariate cluster point process to quantify and explore multi-entity configurations: Application to biofilm image data. arXiv


*Co-senior authors 




Software


"SemiCompRisks" v3.4

• R-package for hierarchical models for parametric and semi-parametric analyses of semi-competing risks data

• Parametric and semi-parametric analyses of semi-competing risks/univariate survival data. The package contains implementations of hierarchical proportional hazards and accelerated failure time models for independent data and cluster-correlated data.

Download (CRAN)

Reference manual, Vignettes, Manuscript


"mBvs" v1.5

• R-package for Bayesian variable selection methods for multivariate data

• Bayesian variable selection methods for data with multivariate responses and multiple covariates. The package contains implementations of multivariate Bayesian variable selection methods for continuous data and zero-inflated count data.

Download (CRAN)

Reference manual, Examples (GitHub)


"psbcGroup" v1.7

• R-package for penalized parametric and semi-parametric Bayesian survival models with shrinkage and grouping priors

• Penalized parametric/semi-parametric Bayesian proportional hazards and accelerated failure time models with shrinkage lasso priors (ordinary lasso, elastic-net, fused lasso, group lasso) can be implemented to analyze survival data with high-dimensional covariates.

Download (CRAN)

Reference manual


"MCPP" v0.9

• R-package for multivariate cluster point process models

Download (GitHub)

Manuscript


Journal Reviewer

• Annals of Applied Statistics

• Biometrical Journal

• Bioinformatics

• Biometrics

• Biostatistics

• BMC Genomics

• BMJ Open

• Clinical Epidemiology

• Communications in Statistics

• Computational Statistics and Data Analysis

• Epidemiology

• Journal of Applied Statistics

• Journal of Multivariate Analysis

• Journal of Oral Microbiology

• Journal of Statistical Computation and Simulation

• Journal of Statistical Planning and Inference

• Journal of the Royal Statistical Society

• Lifetime Data Analysis

• Nature

• Stat

• Statistics and Probability Letters

• Statistical Methods in Medical Research

• Statistics in Medicine

• The Indian Journal of Statistics