“The formulation of a problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination and marks real advances in science.”
— Albert Einstein & Léopold Infeld, The Evolution of Physics (1938).
I develop the conceptual and methodological framework of causal genomics, integrating causal inference with statistical genetics to identify and quantify causal structure in complex genomic systems.
This includes:
Genome-wide causal interaction analysis
Multi-omic mediation analysis
Causal target identification for intervention
Mendelian randomization for multi-omic data
My work on identification and inference in complex settings includes:
Semiparametric theory
Causal mediation analysis
Latent/hidden outcomes
Inference under weak identification
These contributions address fundamental challenges in moving from association to causation.
I develop robust Mendelian randomization (MR) methods for high-dimensional and heterogeneous genomic data, including:
Pleiotropy and weak or invalid instruments
Inference of complex biological systems
Recent work integrates MR with protein 3D structure prediction (e.g., AlphaFold).
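To illustrate why pleiotropy matters in MR, here is a minimal Python sketch that compares the standard inverse-variance weighted (IVW) estimator with MR-Egger regression, one established pleiotropy-robust alternative, on GWAS-style summary statistics. The function names `ivw_estimate` and `mr_egger` and the toy data are hypothetical. This sketches textbook estimators and is not a description of the methods developed in this work.

```python
import numpy as np


def ivw_estimate(bx, by, se_y):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate and its SE.

    bx: SNP-exposure effects; by: SNP-outcome effects; se_y: SEs of by.
    Assumes every SNP is a valid instrument (no pleiotropy).
    """
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    w = 1.0 / se_y**2
    est = np.sum(w * bx * by) / np.sum(w * bx**2)
    se = 1.0 / np.sqrt(np.sum(w * bx**2))
    return est, se


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with a free intercept.

    SNPs are oriented so that bx > 0. The intercept estimates average
    directional pleiotropy; the slope is the pleiotropy-adjusted effect
    (valid under the InSIDE assumption).
    """
    bx, by, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_y))
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sqrt_w = 1.0 / se_y  # weighted least squares via sqrt-weight rescaling
    design = np.column_stack([np.ones_like(bx), bx]) * sqrt_w[:, None]
    coef, *_ = np.linalg.lstsq(design, by * sqrt_w, rcond=None)
    intercept, slope = coef
    return intercept, slope


# Hypothetical toy data: true causal effect 0.5 plus a uniform
# pleiotropic shift of 0.1 on every SNP-outcome association.
bx = np.array([0.10, 0.20, 0.15, 0.30, 0.25])
se_y = np.array([0.02, 0.03, 0.02, 0.04, 0.03])
by = 0.5 * bx + 0.1

ivw, ivw_se = ivw_estimate(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
print(f"IVW: {ivw:.3f}  MR-Egger slope: {egger_slope:.3f}  intercept: {egger_intercept:.3f}")
```

On these noise-free toy data MR-Egger recovers the slope 0.5 and the intercept 0.1 exactly, while IVW, which forces the line through the origin, is pulled well above 0.5 by the pleiotropic shift.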
I study non-standard inference problems arising in modern data regimes, including:
Composite null hypothesis testing, where classical Central Limit Theorem (CLT) approximations break down
Non-standard asymptotics under weak identification
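A familiar instance of the composite null problem is testing a mediated effect, H0: αβ = 0, whose null set is the union of {α = 0} and {β = 0}. The minimal Python sketch below uses a hypothetical setup in which the two path estimates are independent and normally distributed. It simulates the rejection rates of the Sobel (delta-method) test and the joint-significance (MaxP) test. At α = β = 0 the Sobel statistic is not N(0, 1) but, by a known result, N(0, 1/4), so the nominal 5% test almost never rejects there, while away from the origin its size is close to nominal. The function `rejection_rates` is illustrative only.

```python
import numpy as np

Z_CRIT = 1.959963984540054  # two-sided 5% standard-normal critical value


def rejection_rates(alpha, beta, se=1.0, n_sim=200_000, seed=0):
    """Monte Carlo rejection rates of two tests of H0: alpha * beta = 0.

    Hypothetical setup: a_hat ~ N(alpha, se^2) and b_hat ~ N(beta, se^2) are
    independent estimates of the exposure->mediator and mediator->outcome paths.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(alpha, se, n_sim)
    b = rng.normal(beta, se, n_sim)
    # Sobel (delta-method) z-statistic for the product a * b.
    z_sobel = a * b / np.sqrt(a**2 * se**2 + b**2 * se**2)
    sobel = np.mean(np.abs(z_sobel) > Z_CRIT)
    # Joint significance (MaxP): reject only if both paths are significant.
    maxp = np.mean((np.abs(a / se) > Z_CRIT) & (np.abs(b / se) > Z_CRIT))
    return sobel, maxp


for alpha, beta in [(0.0, 0.0), (0.0, 10.0)]:
    sobel, maxp = rejection_rates(alpha, beta)
    print(f"alpha={alpha}, beta={beta}: Sobel {sobel:.4f}, MaxP {maxp:.4f}")
```

Both tests are conservative at the origin: MaxP rejects about 0.05² = 0.25% of the time and Sobel far less. A single fixed reference distribution cannot calibrate every point of a composite null, and this is the kind of gap that non-standard composite-null procedures are designed to close.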
I also develop the statistical foundations of deep learning and its integration with scientific discovery in genomics and biomedicine, including:
Optimal training time of deep neural networks
Deep learning for semiparametric estimation and inference
Deep learning for causal inference
AI for proteomics data
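The question of optimal training time can be illustrated in simplified form with validation-based early stopping. In the minimal Python sketch below, an overparameterized linear model trained by full-batch gradient descent stands in for a deep network, and the "optimal" time is taken to be the iteration with the smallest held-out loss. The function `gd_with_early_stopping`, the simulated data, and the step size are hypothetical. This is an empirical proxy, not a theoretical characterization of training time.

```python
import numpy as np


def gd_with_early_stopping(X_tr, y_tr, X_val, y_val, lr=0.05, max_iter=500):
    """Full-batch gradient descent on squared loss, monitoring validation loss.

    Returns the iteration with the lowest validation loss (the early-stopping
    time), the weights at that iteration, and the full validation-loss curve.
    """
    n, p = X_tr.shape
    w = np.zeros(p)
    val_curve = np.empty(max_iter + 1)
    best_t, best_w = 0, w.copy()
    for t in range(max_iter + 1):
        val_curve[t] = np.mean((X_val @ w - y_val) ** 2)
        if val_curve[t] < val_curve[best_t]:
            best_t, best_w = t, w.copy()
        if t < max_iter:
            grad = (2.0 / n) * X_tr.T @ (X_tr @ w - y_tr)
            w = w - lr * grad
    return best_t, best_w, val_curve


# Hypothetical data: more parameters (p) than training samples (n), noisy labels.
rng = np.random.default_rng(1)
n, p = 40, 100
w_true = rng.normal(size=p) / np.sqrt(p)
X_tr, X_val = rng.normal(size=(n, p)), rng.normal(size=(200, p))
y_tr = X_tr @ w_true + rng.normal(size=n)
y_val = X_val @ w_true + rng.normal(size=200)

t_star, w_star, curve = gd_with_early_stopping(X_tr, y_tr, X_val, y_val)
print(f"early-stopping iteration: {t_star}, val MSE {curve[t_star]:.3f} vs final {curve[-1]:.3f}")
```

Training longer keeps lowering the training loss but can raise the held-out loss once the model starts fitting noise. Recording the full curve makes that trade-off visible rather than assuming a fixed number of iterations.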