We develop new methods to integrate experimental and observational data in causal inference. While randomized controlled trials offer strong internal validity, they are often costly and therefore limited in sample size. Observational data, though cheaper and often with larger sample sizes, are prone to biases due to unmeasured confounders. To harness their complementary strengths, we propose a systematic framework that formulates causal estimation as an empirical risk minimization (ERM) problem. A full model containing the causal parameter is obtained by minimizing a weighted combination of experimental and observational losses—capturing the causal parameter's validity and the full model's fit, respectively. The weight is chosen through cross-validation on the causal parameter across experimental folds. Our experiments on real and synthetic data show the efficacy and reliability of our method. We also provide theoretical non-asymptotic error bounds.
Note: Co-author Licong Lin is on the academic job market this year. He works on theoretical machine learning and statistics. If you have suitable opportunities, please contact him.
Figure: Cross-Validated Causal Inference (CVCI) using λ. Top panels: selection of λ via the cross-validation objective CV(λ). The curve shows the average of CV(λ) over 5000 runs, and the blue dashed line shows the average selected λ. Bottom panels: ATE estimates for different λ.
(This corresponds to Figure 1 in the paper.)
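To make the λ selection described above concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes a toy linear model `y = a + tau * w` with squared-error losses, a weighting of the form `(1 - lam) * L_exp + lam * L_obs`, and a cross-validation criterion that compares the fitted `tau` with the difference in means on each held-out experimental fold. The function names (`fit_weighted`, `cv_select`, and so on), the synthetic data, and the exact form of the criterion are all illustrative assumptions.

```python
import numpy as np

def design(w):
    """Design matrix with an intercept column and the treatment indicator."""
    return np.column_stack([np.ones_like(w, dtype=float), w])

def diff_in_means(w, y):
    """Difference in mean outcomes between treated and control units."""
    return y[w == 1].mean() - y[w == 0].mean()

def fit_weighted(Xe, ye, Xo, yo, lam):
    """Minimize (1 - lam) * mean experimental squared loss
    + lam * mean observational squared loss (assumed weighting)."""
    A = (1 - lam) * Xe.T @ Xe / len(ye) + lam * Xo.T @ Xo / len(yo)
    b = (1 - lam) * Xe.T @ ye / len(ye) + lam * Xo.T @ yo / len(yo)
    return np.linalg.solve(A, b)  # theta = (intercept, tau)

def cv_select(we, ye, wo, yo, lambdas, n_folds=5, seed=0):
    """Pick lam by cross-validating the causal parameter over experimental folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(ye)), n_folds)
    scores = np.zeros(len(lambdas))
    for held in folds:
        train = np.setdiff1d(np.arange(len(ye)), held)
        # Held-out experimental estimate of the treatment effect.
        tau_held = diff_in_means(we[held], ye[held])
        for j, lam in enumerate(lambdas):
            theta = fit_weighted(design(we[train]), ye[train], design(wo), yo, lam)
            scores[j] += (theta[1] - tau_held) ** 2 / n_folds
    best = lambdas[int(np.argmin(scores))]
    tau_hat = fit_weighted(design(we), ye, design(wo), yo, best)[1]
    return best, tau_hat, scores

# Synthetic data (assumed for illustration): true effect is 2.
rng = np.random.default_rng(1)
n_e, n_o = 200, 5000
we = rng.binomial(1, 0.5, size=n_e)            # randomized treatment
ye = 1 + 2 * we + rng.normal(size=n_e)
u = rng.normal(size=n_o)                       # unmeasured confounder
wo = (u + rng.normal(size=n_o) > 0).astype(int)
yo = 1 + 2 * wo + 2 * u + rng.normal(size=n_o)  # confounded outcome

lambdas = np.linspace(0.0, 1.0, 11)
best, tau_hat, scores = cv_select(we, ye, wo, yo, lambdas)
print(f"selected lambda = {best:.1f}, ATE estimate = {tau_hat:.3f}")
```

In this sketch, `lam = 0` reproduces the experimental difference in means and `lam = 1` reproduces the (confounded) observational one, so the cross-validation step is choosing how far to lean on the larger but biased sample.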
@article{yang25cross,
title={Cross-validated causal inference: a modern method to combine experimental and observational data},
author={Yang, Xuelin and Lin, Licong and Athey, Susan and Jordan, Michael I. and Imbens, Guido W.},
journal={arXiv preprint arXiv:2511.00727},
year={2025}
}