About Me

I am a Postdoctoral Research Fellow in Biostatistics at Harvard University, working with Prof. Tianxi Cai. I received my Ph.D. in Statistics from the University of California, Davis (UC Davis) in August 2022, where I was fortunate to be advised by Prof. Hao Chen. Before that, I received dual degrees, a B.S. in Statistics and a B.E. in Computer Science, from the University of Science and Technology of China (USTC) in 2019.

My research focuses on developing statistical methodology and theory for electronic health records (EHR) data analysis. I am also developing practical tools for analyzing high-dimensional and non-Euclidean data. My specific interests include reinforcement learning, transfer learning, high-dimensional statistics, network analysis, and graph- and rank-based methods.

Please see my new homepage at https://doudouzhou.github.io/.


Statistical Methods

Contrastive Learning on Multimodal Analysis of Electronic Health Records. [arXiv]

Cai, T., Huang, F., Nakada, R., Zhang, L., Zhou, D. (alphabetic order)

Preprint, 2024.

The Wreaths of Coherence: Uniform Graph Feature Selection with False Discovery Rate Control. [arXiv]

Liang, J.*, Liu, Y.*, Zhou, D., Zhang, S., Lu, J.

Preprint, 2024.

Inference of Dependency Knowledge Graph for Electronic Health Records. [arXiv]

Xu, Z., Gan, Z., Zhou, D., Shen, S., Lu, J., Cai, T.

Preprint, 2023.

Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model. [arXiv]

Cai, T., Xia, D., Zhang, L., Zhou, D. (alphabetic order)

Under revision, 2023.

RING-CPD: Asymptotic Distribution-free Change-point Detection for Multivariate and Non-Euclidean Data. [arXiv]

Zhou, D., Chen, H.

Under revision, 2022.

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features. [arXiv]

Zhou, D.*, Liu, M.*, Li, M., Cai, T. (*: contributed equally)

Journal of the American Statistical Association: Theory and Methods, 2024.

Federated Offline Reinforcement Learning. [arXiv] [paper] [code]

Zhou, D.*, Zhang, Y.*, Sonabend-W, A., Wang, Z., Lu, J., Cai, T. (*: contributed equally)

Journal of the American Statistical Association: Theory and Methods, 2024.

Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices. [link] [code]

Zhou, D., Cai, T., Lu, J. 

Journal of Machine Learning Research, 2023. (2022 Best Student Paper Award from ASA Statistical Learning and Data Science.) 

A New Ranking Scheme for Modern Data and Its Application to Two-sample Hypothesis Testing. [arXiv] [poster]

Zhou, D., Chen, H.  

Conference on Learning Theory (COLT), 2023. (2022 ICSA Student Poster Award.)

Double/Debiased Machine Learning for Logistic Partially Linear Model. [link]

Liu, M., Zhang, Y., Zhou, D. (alphabetic order)

The Econometrics Journal, 2021.

Statistical Applications

DOME: Directional Medical Embedding Vectors from Electronic Health Records. 

Wen, J., et al.

Submitted, 2024.

ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis. [medRxiv] [code]

Gan, Z.*, Zhou, D.*, et al.

Submitted, 2023.

SONAR: Enabling Robust Automated Harmonization of Heterogeneous Data through Ensemble Machine Learning. [preprint]

Yang, D., Zhou, D., et al.

Submitted, 2023.

Knowledge-Driven Online Multimodal Automated Phenotyping System. [medRxiv]

Xiong, X., et al. 

Submitted, 2023.

Two-stream Feature Extraction for Self-supervised Image Quality Assessment. [paper]

Lou, Y., Chen, Y., Huang, Y., Zhou, D., Cao, Y., Wang, H.   

IEEE International Conference on Data Mining (ICDM), 2023. 

Hierarchical Pretraining for Biomedical Term Embeddings. [link]

Cai, B., Zeng, S., Lin, Y., Yuan, Z., Zhou, D., Tian, L.   

Proceedings of the 18th Conference on Computational Intelligence Methods for Bioinformatics & Biostatistics (CIBB 2023). 

Multimodal Representation Learning for Predicting Molecule-Disease Relations. [link]

Wen, J., et al.

Bioinformatics, 2023.

Multiview Incomplete Knowledge Graph Integration with Application to Cross-institutional EHR Data Harmonization. [link]

Zhou, D., et al.

Journal of Biomedical Informatics, 2022.

Semi-Supervised Calibration of Risk with Noisy Event Times (SCORNET) Using Electronic Health Record Data. [link]

Ahuja, Y., Liang, L., Zhou, D., Huang, S., Cai, T. 

Biostatistics, 2022. 

Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data. [link]

Hong, C., et al. 

npj Digital Medicine, 2021.

sureLDA: A Multidisease Automated Phenotyping Method for the Electronic Health Record. [link]

Ahuja, Y., Zhou, D., He, Z., Sun, J., Castro, V., Gainer, V., Murphy, S., Hong, C., Cai, T. 

Journal of the American Medical Informatics Association, Volume 27, Issue 8, August 2020. 


  Department of Biomedical Informatics, Harvard University (June 2021 -- Sep. 2021).

  Department of Biomedical Informatics, Harvard University (June 2020 -- Sep. 2020). 

  Department of Biostatistics, University of Pennsylvania (Aug. 2018 -- Oct. 2018).