I am a final-year PhD candidate in the Department of Statistics at the University of Toronto, supervised by Dr. Jessica Gronsbell and Dr. Lei Sun. My thesis focuses on developing semi-supervised methods for various sources of health data, including electronic health records and genetic data.
Previously, I spent a wonderful summer as a Data Science and Machine Learning Intern at insitro in South San Francisco. I was also a CANSSI STAGE trainee and a participant in the Toronto General Hospital Multi-Organ Transplant Student Research Training Program.
Gao, J., Gronsbell, J. Reliable fairness auditing with semi-supervised inference. [arXiv][R-package][Code]
Gronsbell, J., Gao, J., Shi, Y., McCaw, Z.R., Cheng, D. Another look at inference after prediction. [arXiv][Code]
McCaw, Z.R., Gao, J., Dey, R., Tucker, S., Zhang, Y., insitro Research Team, Gronsbell, J., Li, X., Fox, E., O’Dushlaine, C., Soare, T.W. (2025). A Scalable Framework for Identifying Allelic Series from Summary Statistics. American Journal of Human Genetics. [Paper][R-package]
McCaw, Z.R., Gao, J., Lin, X., Gronsbell, J. (2024). Synthetic surrogates improve power for genome-wide association studies of partially missing phenotypes in population biobanks. Nature Genetics. [Interview][Paper][R-package][Code]
Gao, J.#, Bonzel, C.#, Hong, C., Varghese, P., Zakir, K., Gronsbell, J. (2023). Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. Journal of the American Medical Informatics Association. [Paper][R-package]
Gronsbell, J., Panickan, V. A., Lin, C., Charlon, T., Hong, C., Zhou, D., Wang, L., Gao, J., Zhou, S., Tian, Y., Shi, Y., Gan, Z., Cai, T. PEHRT: A Common Pipeline for Harmonizing Electronic Health Record data for Translational Research. [arXiv][Tutorial]
Gao, J., Chou, B., McCaw, Z.R., Thurston, H., Varghese, P., Hong, C., Gronsbell, J. (2025). What is fair? Defining fairness in machine learning for health. Statistics in Medicine. [Paper]
Smith, B., Gao, J., Chou, B., Gronsbell, J. fairmetrics: An R package for group fairness evaluation. Journal of Open Source Software. [Paper][R-package]
Gao, J., Espin-Garcia, O., Paterson, A., Sun, L. (2022). Integrating variant functional annotation scores have varied abilities to improve power of genome-wide association studies. Scientific Reports. [Paper][Code]
Gronsbell, J., Wang, J., Thurston, H., Gao, J., Shi, Y., Train, A.D., Butt, D., Gershon, A., O'Neill, B., Tu, K. (2025). Severe outcomes and length of stay among people with schizophrenia hospitalized for COVID-19: A population-based retrospective cohort study. Schizophrenia Bulletin. [Paper]
Choi, B.*, Gao, J.*, Haslhofer, R.*, Sigal, D.* (2022). Heat flow on time-dependent manifolds. Journal of Geometric Analysis. [Paper]
Ge, E., Gao, J., Wei, X., Ren, Z., Wei, J., Liu, X., Wang, X., Zhong, J., Lu, J., Tian, X., Fei, F., Chen, B., Wang, X., Peng, Y., Luo, M., Lei, J. (2021). Effect modification of greenness on PM2.5 associated with all-cause mortality in a multidrug resistant tuberculosis cohort. Thorax. [Paper]
Ge, E., Gao, J., Ren, Z., Liu, X., Luo, M., Zhong, J., Fei, F., Chen, B., Wang, X., Wei, X., Peng, Y. (2021). Greenness exposure and all-cause mortality during multi-drug resistant tuberculosis treatment: a population-based cohort study. Science of the Total Environment. [Paper]
Gao, J., Panjwani, A., Li, M., Espin-Garcia, O. Latent class growth analysis for ordinal response data in the Distress Assessment and Response Tool: an evaluation of state-of-the-art implementations. [arXiv][Code]
#: The authors contributed equally to this work.
*: Authors are listed in alphabetical order.