About Me
I am a Postdoctoral Research Fellow in Biostatistics at Harvard University, working with Prof. Tianxi Cai. I received my Ph.D. in Statistics from the University of California, Davis (UC Davis) in August 2022, where I was fortunate to be advised by Prof. Hao Chen. Before that, I received my B.S. in Statistics and B.E. in Computer Science (dual degree) from the University of Science and Technology of China (USTC) in 2019.
I am interested in developing statistical methodology and theory for electronic health records (EHR) data analysis. I'm also developing practical tools for analyzing high-dimensional and non-Euclidean data. Specifically, I am interested in reinforcement learning, transfer learning, high-dimensional statistics, network analysis, and graph- and rank-based methods.
Please see my new homepage at https://doudouzhou.github.io/.
Preprints/Publications
Statistical Method
Contrastive Learning on Multimodal Analysis of Electronic Health Records. [arXiv]
Cai, T., Huang, F., Nakada, R., Zhang, L., Zhou, D. (alphabetic order)
Preprint, 2024.
The Wreaths of Coherence: Uniform Graph Feature Selection with False Discovery Rate Control. [arXiv]
Liang, J.*, Liu, Y.*, Zhou, D., Zhang, S., Lu, J.
Preprint, 2024.
Inference of Dependency Knowledge Graph for Electronic Health Records. [arXiv]
Xu, Z., Gan, Z., Zhou, D., Shen, S., Lu, J., Cai, T.
Preprint, 2023.
Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model. [arXiv]
Cai, T., Xia, D., Zhang, L., Zhou, D. (alphabetic order)
Under revision, 2023.
RING-CPD: Asymptotic Distribution-free Change-point Detection for Multivariate and Non-Euclidean Data. [arXiv]
Zhou, D., Chen, H.
Under revision, 2022.
Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features. [arXiv]
Zhou, D.*, Liu, M.*, Li, M., Cai, T. (*: contributed equally)
Journal of the American Statistical Association: Theory and Methods, 2024.
Federated Offline Reinforcement Learning. [arXiv] [paper] [code]
Zhou, D.*, Zhang, Y.*, Sonabend-W, A., Wang, Z., Lu, J., Cai, T. (*: contributed equally)
Journal of the American Statistical Association: Theory and Methods, 2024.
Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices. [link] [code]
Zhou, D., Cai, T., Lu, J.
Journal of Machine Learning Research, 2023. (2022 Best Student Paper Award from ASA Statistical Learning and Data Science.)
A New Ranking Scheme for Modern Data and Its Application to Two-sample Hypothesis Testing. [arXiv, poster]
Zhou, D., Chen, H.
Conference on Learning Theory (COLT), 2023. (2022 ICSA Student Poster Award.)
Double/Debiased Machine Learning for Logistic Partially Linear Model. [link]
Liu, M., Zhang, Y., Zhou, D. (alphabetic order)
The Econometrics Journal, 2021.
Statistical Application
DOME: Directional Medical Embedding Vectors from Electronic Health Records.
Wen, J., et al.
Submitted, 2024.
ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis. [medRxiv] [code]
Gan, Z.*, Zhou, D.*, et al.
Submitted, 2023.
SONAR: Enabling Robust Automated Harmonization of Heterogeneous Data through Ensemble Machine Learning. [preprint]
Yang, D., Zhou, D., et al.
Submitted, 2023.
Knowledge-Driven Online Multimodal Automated Phenotyping System. [medRxiv]
Xiong, X., et al.
Submitted, 2023.
Two-stream Feature Extraction for Self-supervised Image Quality Assessment. [paper]
Lou, Y., Chen, Y., Huang, Y., Zhou, D., Cao, Y., Wang, H.
IEEE International Conference on Data Mining (ICDM), 2023.
Hierarchical Pretraining for Biomedical Term Embeddings. [link]
Cai, B., Zeng, S., Lin, Y., Yuan, Z., Zhou, D., Tian, L.
Proceedings of the 18th Conference on Computational Intelligence Methods for Bioinformatics & Biostatistics (CIBB 2023).
Multimodal Representation Learning for Predicting Molecule-Disease Relations. [link]
Wen, J., et al.
Bioinformatics, 2023.
Multiview Incomplete Knowledge Graph Integration with Application to Cross-institutional EHR Data Harmonization. [link]
Zhou, D., et al.
Journal of Biomedical Informatics, 2022.
Semi-Supervised Calibration of Risk with Noisy Event Times (SCORNET) Using Electronic Health Record Data. [link]
Ahuja, Y., Liang, L., Zhou, D., Huang, S., Cai, T.
Biostatistics, 2022.
Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data. [link]
Hong, C., et al.
NPJ Digital Medicine, 2021.
sureLDA: A Multidisease Automated Phenotyping Method for the Electronic Health Record. [link]
Ahuja, Y., Zhou, D., He, Z., Sun, J., Castro, V., Gainer, V., Murphy, S., Hong, C., Cai, T.
Journal of the American Medical Informatics Association, Volume 27, Issue 8, August 2020.
Experience
Visiting Graduate Student, Advisors: Prof. Tianxi Cai and Junwei Lu.
Department of Biomedical Informatics, Harvard University (June 2021 -- Sep. 2021).
Visiting Graduate Student, Advisors: Prof. Tianxi Cai and Junwei Lu.
Department of Biomedical Informatics, Harvard University (June 2020 -- Sep. 2020).
Research Assistant, Advisor: Prof. Yong Chen.
Department of Biostatistics, the University of Pennsylvania (Aug. 2018 -- Oct. 2018).