I develop statistical foundations for causal inference in complex biological systems, with the goal of enabling intervention-level understanding in genomics and biomedical science.
I develop the conceptual and methodological framework of causal genomics, integrating causal inference with statistical genetics to identify and quantify causal structure in complex genomic systems.
This includes:
Genome-wide causal interaction
Multi-omic mediation analysis
Causal target identification for intervention
Mendelian randomization for multi-omic data
My work focuses on identification and inference in complex settings, with contributions in:
Semiparametric theory
Causal mediation analysis
Latent/hidden outcomes
Inference under weak identification
These contributions address fundamental challenges in moving from association to causation.
I develop robust Mendelian randomization (MR) methods for high-dimensional and heterogeneous genomic data, addressing:
Pleiotropy and weak or invalid instruments
Inference for complex biological systems
Recent work integrates MR with protein 3D structure prediction (e.g., AlphaFold).
I study non-standard inference problems arising in modern data regimes, including:
Composite null hypothesis testing, where approximations based on the classical Central Limit Theorem (CLT) fail
Non-standard asymptotics under weak identification
I also develop the statistical foundations of deep learning and its integration with scientific discovery in genomics and biomedicine, including:
Optimal training time of deep neural networks
Deep learning for semiparametric estimation and inference
Deep learning for causal inference
AI for proteomics data