I study causal inference and machine learning to advance pharmacoepidemiology, enable precision medicine, and drive innovation in healthcare.
Heterogeneity of causal effects
Synthetic data generation
Unmeasured confounding
Complex longitudinal data
Natural language processing
Probabilistic bias analysis
Comparative treatment safety and effectiveness
Oncology treatment
Hormone therapy in transgender populations
Precision medicine
Developed a two-stage natural language processing (NLP) pipeline for automated ascertainment of a large de-identified transgender and gender diverse (TGD) cohort, using keyword-containing free text from electronic health records (EHRs) across multiple Kaiser Permanente institutions. The first stage identified transgender identity, and the second determined natal sex to distinguish transfeminine and transmasculine individuals. The pipeline offers an efficient, scalable approach to TGD identification, with transformer-based models and support vector machines achieving the best performance in the first and second stages, respectively. Models were validated using an adaptive stratified design, demonstrating strong performance and potential transportability to external populations.
papers under review | stage 1-repo | stage 2-repo coming soon | stage 1-poster | stage 2-poster
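As a minimal illustration of the second-stage classification step, the sketch below trains a TF-IDF + linear support vector machine on toy EHR-style snippets. All snippets, labels, and hyperparameters here are invented placeholders, not the study's actual features, keywords, or data:

```python
# Hypothetical sketch of a stage-2-style natal-sex classifier:
# TF-IDF features + linear SVM. Toy data only; real EHR notes,
# labels, and tuning are far richer than this.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

notes = [
    "history of ovarian cyst noted in chart",
    "patient underwent hysterectomy in 2015",
    "uterine fibroids documented at prior visit",
    "prostate exam documented at annual visit",
    "history of testicular pain reported",
    "vasectomy performed per surgical record",
]
labels = ["natal female"] * 3 + ["natal male"] * 3

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(notes, labels)
print(model.predict(["prior prostate screening noted"])[0])
```

In the actual pipeline, a held-out validation set with the adaptive stratified design (not training accuracy) would be used to assess performance.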
Developed CausalMix, a synthetic data generator that combines conditional variational autoencoders (VAEs) with Bayesian mixture models to simulate samples with potential outcomes under predefined conditional average treatment effects and unmeasured confounding. Implemented in PyTorch Lightning, the framework was evaluated with SDMetrics and the Wasserstein distance to assess distributional similarity. The project highlights CausalMix’s potential for validating causal inference methods and for bias-aware data augmentation to improve conditional treatment effect estimation.
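The distributional-similarity check can be sketched with SciPy's 1-D Wasserstein distance, compared per feature between real and synthetic marginals. The arrays below are simulated stand-ins, not CausalMix output:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=5000)       # stand-in "real" feature
synthetic = rng.normal(loc=0.1, scale=1.1, size=5000)  # stand-in "synthetic" feature

# 1-D Wasserstein (earth mover's) distance between the empirical marginals;
# values near 0 mean the generator reproduces this feature's distribution well.
d = wasserstein_distance(real, synthetic)
print(f"marginal Wasserstein distance: {d:.3f}")
```

In practice this would be computed for each feature (and complemented by SDMetrics' multivariate checks), since matching marginals alone does not guarantee matching joint distributions.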
Designed a simulation study to improve methods for adjusting for exposure misclassification in epidemiologic research. Compared five approaches to assigning prior distributions to the positive and negative predictive values derived from small validation studies. Results showed that a uniform beta prior substantially improved validity when validation data were sparse, while all methods performed similarly with sufficient data. This work highlights practical strategies for enhancing the validity of bias-adjusted estimates in real-world studies with limited sample sizes.
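The uniform-beta-prior approach can be sketched as a small Monte Carlo bias analysis: draw the predictive values from the Beta posteriors implied by a Beta(1, 1) prior and the validation counts, reclassify the observed 2x2 table, and summarize the distribution of bias-adjusted odds ratios. All counts below are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative observed 2x2 table (exposure classified with error):
a, b = 120, 80    # cases: classified exposed / classified unexposed
c, d = 380, 420   # noncases: classified exposed / classified unexposed

# Illustrative validation-study counts for the exposure classifier:
tp, fp = 45, 5    # classified exposed: truly exposed / truly unexposed
tn, fn = 47, 3    # classified unexposed: truly unexposed / truly exposed

n_sims = 10_000
# Uniform Beta(1, 1) prior + validation counts -> Beta posteriors for PPV/NPV.
ppv = rng.beta(tp + 1, fp + 1, n_sims)
npv = rng.beta(tn + 1, fn + 1, n_sims)

# Expected true-exposure counts, reclassified within cases and noncases.
A = a * ppv + b * (1 - npv)   # truly exposed cases
B = (a + b) - A               # truly unexposed cases
C = c * ppv + d * (1 - npv)   # truly exposed noncases
D = (c + d) - C               # truly unexposed noncases

or_adj = (A * D) / (B * C)    # bias-adjusted odds ratio draws
lo, med, hi = np.percentile(or_adj, [2.5, 50, 97.5])
print(f"adjusted OR median {med:.2f} (95% simulation interval {lo:.2f}-{hi:.2f})")
```

This sketch applies a single PPV/NPV to cases and noncases; a fuller analysis would allow the predictive values to differ by outcome status and propagate random error as well.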
Conducted a simulation study evaluating two widely used methods for estimating conditional average treatment effects (CATEs): the doubly robust (DR) learner and the causal forest algorithm. Both achieved strong confidence interval coverage overall, but the DR learner outperformed causal forests in scenarios with strong treatment effects and low heterogeneity. The study also revealed that identifying effect modifiers remains challenging, especially in smaller samples. These findings offer practical guidance for method selection when estimating CATEs in empirical research.
paper and repo coming soon
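A minimal DR-learner can be sketched on simulated randomized data with a known propensity score of 0.5. The data-generating process and linear nuisance models below are illustrative assumptions, not the study's actual simulation design:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(0, 1, n)                  # single covariate
t = rng.integers(0, 2, n).astype(float)   # randomized treatment, e(x) = 0.5
tau = 1.0 + x                             # true CATE: tau(x) = 1 + x
y = 2.0 * x + tau * t + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x])      # design matrix [1, x]

def ols_predict(X_fit, y_fit, X_new):
    """Least-squares fit and prediction (stand-in for the nuisance models)."""
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return X_new @ beta

# Stage 1: outcome regressions mu1, mu0 fit on the treated / control arms.
mu1 = ols_predict(X[t == 1], y[t == 1], X)
mu0 = ols_predict(X[t == 0], y[t == 0], X)
mu_t = np.where(t == 1, mu1, mu0)

# Stage 2: doubly robust pseudo-outcome, then regress it on the covariates.
e = 0.5                                   # known propensity (randomized design)
phi = (t - e) / (e * (1 - e)) * (y - mu_t) + mu1 - mu0
beta_cate, *_ = np.linalg.lstsq(X, phi, rcond=None)
print(f"estimated CATE: {beta_cate[0]:.2f} + {beta_cate[1]:.2f} * x (truth: 1 + x)")
```

For brevity this sketch skips cross-fitting; a proper DR-learner estimates the nuisance models on held-out folds, and in observational settings the propensity score must also be estimated.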