Methodology in Statistical Genetics/Genomics, Causal Inference, and Machine/Deep Learning
Note: # denotes (co-)senior/corresponding author, and * denotes (co-)first author.
Chen, Y., Chen, Y., Liang, J., Zhang, X., Wang, C., Zhang, X., Zhang, S., Zhou, X., Ye, W., Feng, R., Cai, Y., Lin, X., Zhang, Z., Ji, M., Cui, Q., Zhou, Y., Li, J., Xu, J., Ye, X., Feng, Q., Tang, M., Zeng, M.-S., Zeng, Y.-X., Liu, Z.#, Zhai, W.#, Liu, J.#, & Xu, M.# (2026). EBV strain interacts with host HLA to drive nasopharyngeal carcinoma risk. Nature (In Press).
Yao, M., Wang, A., Li, X., & Liu, Z.# (2026). Mendelian randomization methods for causal inference: Estimands, identification, and inference. Statistics in Medicine, 45(1-2), e70394. https://doi.org/10.1002/sim.70394. Also available at: https://arxiv.org/abs/2509.11519. (Invited Tutorial in Biostatistics).
We coined the term "causal genomics" in this paper.
Yao, M., Tian, P., Li, X., Bian, S., Wang, G., Gu, Y., Navas-Acien, A., Vardarajan, B. N., Belsky, D. W., Miller, G. W., Baccarelli, A. A., & Liu, Z.#, the Alzheimer’s Disease Neuroimaging Initiative (2026). CoxMDS: Multiple Data Splitting for High-dimensional Mediation Analysis with Survival Outcomes in Epigenome-wide Studies. Briefings in Bioinformatics, 27(1), bbaf730. https://doi.org/10.1093/bib/bbaf730
Meng Y, Yao M, Han T, Zhao H, Liu Z.#, Ma B#. (2026). Inferring directed gene regulatory networks from single-cell ribonucleic acid sequencing data via multi-view contrastive learning. Engineering Applications of Artificial Intelligence. 2026;172:114350. https://doi.org/10.1016/j.engappai.2026.114350.
Yao, M. and Liu, Z. (2025) An introduction to causal inference methods with multi-omics data. Current Protocols. DOI:10.1002/cpz1.70168.
Huang, T.-J., Liu, Z., McKeague, I. W. (2025). Post-selection inference for high-dimensional mediation analysis with survival outcomes. Scandinavian Journal of Statistics, 52(2), 756–776.
Wang, X., Liu, J., Hu, S. S., Liu, Z., Lu, H., & Liu, L., for the Alzheimer’s Disease Neuroimaging Initiative. (2025). HILAMA: High-dimensional multi-omics mediation analysis with latent confounding. BMC Medical Research Methodology, 25(1), 239.
Yang, H., Liu, Z., Wang, R., Lai, E., Schwartz, J., Baccarelli, A., Huang, Y., Lin, X. (2025). Causal mediation analysis for integrating exposure, genomic and phenotype data. Annual Review of Statistics and Its Application (Invited Review Paper), https://doi.org/10.1146/annurev-statistics-040622-031653.
Zhang, S.*, Zhou, Y.*, Liu, Z.*, Wang, Y.*, Zhou, X., Chen, H., Zhang, X., Chen, Y., Feng, Q., Ye, X., Xie, S., Zeng, M.-S., Zhai, W., Zeng, Y.-X., Cao, S., Li, G., & Xu, M. (2025) Immunosequencing identifies signatures of anti-tumor T cell responses for early detection of nasopharyngeal carcinoma. Cancer Cell. https://doi.org/10.1016/j.ccell.2025.04.009. (co-first author with equal contributions)
Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SLR, Kelly T, Konigsberg I, Kooperberg C, Kral BG, Li C, Loos RJF, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari H, Vasan RS, Wang Z, Yanek LR, Yu B; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium; Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z#, Liu Z#, Lin X# (2025). A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. Nature Computational Science. https://doi.org/10.1038/s43588-024-00764-8
Wei, Q., Cui, M., Liu, Z., Liu, Z., Feng, G., Li, Y., Christiani, D., Li, L., Wang, J., Hao, Y., & Wei, Y. (2025). Integrating statistical design and inference: A roadmap for robust and trustworthy medical AI. The Innovation Medicine. https://doi.org/10.59717/j.xinn-med.2025.100145.
Kang, H., Guo, Z., Liu, Z., Small, D. (2025) Identification and inference with invalid instruments. Annual Review of Statistics and Its Application (Invited Review Paper), Journal Link
Liu, Q.*, Wang, Z., Li, X., Ji, X., Zhang, L., Liu, L.#, and Liu, Z.# (2024). DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation. International Conference on Machine Learning (ICML). (Acceptance rate: 27.5%.) Python Code.
Chuwdhury, G., Guo, Y.*, Cheung, C., Lam, K., Kam, N., Liu, Z.#, Dai, W.# (2024). ImmuneMirror: A Machine Learning-based Integrative Pipeline and Web Server for Neoantigen Prediction. Briefings in Bioinformatics, 25(2), bbae024. Web Server.
Chen, Y., Lam, K. F., Liu, Z. (2024). High-dimensional Feature Screening for Nonlinear Associations With Survival Outcome Using Restricted Mean Survival Time. Stat, 13(2), e673.
Wang, L., Babushkin, N., Liu, Z., Liu, X.# (2024). Trans-eQTL mapping in gene sets identifies network effects of genetic variants. Cell Genomics, 4(4).
Xu, M.*, Feng, R.*, Liu, Z.*, Zhou, X.*, Chen, Y., Cao, Y., Valeri, L., Li, Z., Liu, Z., Cao, S., Liu, Q., Xie, S., Chang, E., Jia, W., Shen, J., Yao, Y., Cai, Y., Zheng, Y., Zhang, Z., Huang, G., Ernberg, I., Tang, M., Ye, W., Adami, H., Zeng, Y., Lin, X. (2024). Host genetic variants, Epstein-Barr virus subtypes and the risk of nasopharyngeal carcinoma: Assessment of interaction and mediation. Cell Genomics. https://doi.org/10.1016/j.xgen.2023.100474. (co-first author with equal contributions) (Cover paper on Feb. 14, 2024).
Ye, T., Liu, Z., Sun, B., Tchetgen Tchetgen, E. (2024). GENIUS-MAWII: For Robust Mendelian Randomization with Many Weak Invalid Instruments. Journal of the Royal Statistical Society: Series B (Statistical Methodology), qkae024.
Yao, M.*, Miller, G.W., Vardarajan, B. N., Baccarelli, A. A., Guo, Z.#, and Liu, Z.# (2024). Deciphering proteins in Alzheimer's disease: A new Mendelian randomization method integrated with AlphaFold3 for 3D structure prediction. Cell Genomics, 4(12), 100700.
"All valid instruments are alike; each invalid instrument is invalid in its own way"-- Anna Karenina Principle
We also coined the term "xMR" (omics MR) in this paper.
Media Coverage:
https://medicalxpress.com/news/2024-12-pipeline-insights-alzheimer-mechanisms-potential.html
https://www.miragenews.com/new-tool-finds-alzheimers-protein-biomarkers-3d-1371125/
https://www.sciencedaily.com/releases/2024/12/241204113637.htm
https://magazine.columbia.edu/article/how-artificial-intelligence-changing-biomedical-research
Sun, B., Liu, Z., Tchetgen Tchetgen, E. (2023). Semiparametric Efficient G-estimation with Invalid Instrumental Variables. Biometrika, 110(4), 953-971.
Zhou, Y.*, Wang, W.*, Hu, T., Tong, J., Liu, Z.# (2023). Causal mediation analysis for an ordinal outcome with multiple mediators. Structural Equation Modeling: A Multidisciplinary Journal, 31(2), 205-216.
Liu, Y., Liu, Z., Lin, X. (2023). Ensemble methods for testing a global null. Journal of the Royal Statistical Society: Series B (Statistical Methodology), qkad131, https://doi.org/10.1093/jrsssb/qkad131
Yang, J.*, Xu, Y.*, Yao, M.*, Wang G., Liu, Z.#. (2023). ERStruct: A Python Package for Inferring the Number of Top Principal Components from Whole Genome Sequencing Data. BMC Bioinformatics. (Yang, J. was a summer RA as an undergrad). Python code.
Tian, P*, Yao, M*, Huang T, Liu Z#. (2022). CoxMKF: A knockoff filter for high-dimensional mediation analysis with a survival outcome in epigenetic studies. Bioinformatics. 38(23), 5229-5235.
Liu, Z.#, Shen, J., Barfield, R., Schwartz, J., Baccarelli, A., Lin, X. (2022). Large-Scale hypothesis testing for causal mediation effects with applications in genome-wide epigenetic studies. Journal of the American Statistical Association, 117(537), 67-81, DOI: 10.1080/01621459.2021.1914634. Top 10 Most Cited Papers in JASA in 2023.
Xu, S.*, Liu, L.#, and Liu, Z.# (2022). DeepMed: Semiparametric causal mediation analysis with debiased deep learning. The Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), 35, pp. 28238-28251. (Acceptance rate: 25.6% with a total of 10,411 full paper submissions.) R package. arXiv link.
Wang W.W.*, Lu, J., Tong, T., Liu, Z. (2022). Debiased Learning and Forecasting of First Derivative. Knowledge-Based Systems. DOI: https://doi.org/10.1016/j.knosys.2021.107781. (IF=8.038, Computer Science-Artificial Intelligence 16 out of 139).
Xu, Y.*, Liu, Z.#, Yao, J., (2022). ERStruct: An eigenvalue ratio approach to inferring population structure from whole genome sequencing data. Biometrics, 79, 891–902. https://doi.org/10.1111/biom.13691
Tian, P*, Hu, Y, Liu, Z.# and Zhang, Y.# (2022). Grace-AKO: A Novel and Stable Knockoff Filter for Variable Selection Incorporating Gene Network Structures. BMC Bioinformatics 23, 478.
Liu, Z., Ye, T., Sun, B., Schooling, M., Tchetgen Tchetgen, E., (2022). Mendelian randomization mixed-scale treatment effect robust identification and estimation for causal inference. Biometrics, 79, 2208–2219.
Xu, S.*, Wang P., Fung, W.K., Liu, Z.#, (2022). A Novel Penalized Inverse-Variance Weighted Estimator for Mendelian Randomization with Applications to COVID-19 Outcomes. Biometrics, 79, 2184–2195.
Wang, A.*, Liu, W.*, Liu, Z.# (2022). A Two-Sample Robust Bayesian Mendelian Randomization Method Accounting for Linkage Disequilibrium and Idiosyncratic Pleiotropy with Applications to the COVID-19 Outcome. Genetic Epidemiology, 46, 159–169. https://doi.org/10.1002/gepi.22445. (One of the top 10 most-cited papers among work published between 1 January 2022 and 31 December 2023 in Genetic Epidemiology.)
Wang, W.W.*, Yu, P., Zhou, Y., Tong, T., Liu, Z., (2021). Equivalence of two least-squares estimators for indirect effects. Current Psychology. DOI: https://doi.org/10.1007/s12144-021-02034-6.
Wang, W.W.*, Xu, J., Schwartz, J., Baccarelli, A., Liu, Z.# (2021). Causal mediation analysis with latent subgroups. Statistics in Medicine, 40(25): 5628–5641. DOI: https://doi.org/10.1002/sim.9144.
Liu, W.*, Xu, Y.*, Wang, A.*, Huang, T.#, Liu, Z.# (2021). The Eigen Higher Criticism and Eigen Berk-Jones Tests for Multiple Trait Association Studies based on GWAS Summary Statistics. Genetic Epidemiology, 46, 89–104. https://doi.org/10.1002/gepi.22439
(One of the top 10 most-cited papers among work published between 1 January 2022 and 31 December 2023 in Genetic Epidemiology.)
Liu, Z., Barnett, I., Lin, X., 2020. A comparison of principal component methods between multiple phenotype regression and multiple SNP regression in genetic association studies, The Annals of Applied Statistics, 14(1), pp.433-451.
Luo, X., Schwartz, J., Baccarelli, A., Liu, Z.#, 2020. Testing cell-type-specific mediation effects in genome-wide epigenetic studies, Briefings in Bioinformatics, 22(3), bbaa131.
Liu, Z. and Lin, X., 2019. A geometric perspective on the power of principal component association tests in multiple phenotype studies, Journal of the American Statistical Association, 114(527), pp.975-990.
Liu, Z. and Lin, X., 2018. Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics, 74(1), pp.165-175. (One of the top 20 most downloaded papers in Biometrics in 2017-2018.)