We focus on the problem of estimating the average treatment effect (ATE) in causal inference.
Over the past decades, a wide range of statistical methods have been developed to draw causal conclusions from either experimental or observational studies.
Experimental data, collected from randomized controlled trials (RCTs), offer high internal validity. However, such data can be costly to obtain.
Observational data are often cheaper, but their internal validity is questionable: ATE estimates from observational data rely on the assumption of unconfoundedness and may be biased by unobserved confounders.
To illustrate the basic ideas, consider a setting with no covariates.
Based on the widely used LaLonde data, we have an experimental sample, in which we observe both treated and control units, and an observational sample, in which we observe only control units.
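Because treated units are observed only in the experimental sample, estimating the average treated outcome raises no question: we use the experimental treated mean. The question is how to estimate the average control outcome for the experimental population. Let $\bar{Y}^{\mathrm{exp}}_{t}$ and $\bar{Y}^{\mathrm{exp}}_{c}$ be the average treated and control outcomes in the experimental sample, and let $\bar{Y}^{\mathrm{obs}}_{c}$ be the average control outcome in the observational sample.
We consider a weighted average of the observational and experimental control means, with weights λ ∈ [0, 1] and 1 − λ respectively, which gives the ATE estimate
$$\hat{\tau}(\lambda) = \bar{Y}^{\mathrm{exp}}_{t} - \left[\lambda\,\bar{Y}^{\mathrm{obs}}_{c} + (1-\lambda)\,\bar{Y}^{\mathrm{exp}}_{c}\right].$$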
What properties would we like λ to have?
If the experimental sample is large, its estimate is already precise, so as long as the observational sample has some bias, even a very small one, we would like λ to be close to zero.
If, on the other hand, the bias in the observational sample is negligible, we would like λ to be close to one, so that the observational data add precision.
In other words, we would like to shrink the experimental estimate towards the observational estimate, but in a data-adaptive fashion, that is, with a data-driven λ.
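As a minimal sketch of the estimator in this no-covariate example, the function below computes τ̂(λ) = (experimental treated mean) − [λ · (observational control mean) + (1 − λ) · (experimental control mean)]. The function name `combined_ate` and the toy outcome values are hypothetical, made up for illustration rather than taken from the LaLonde data.

```python
from statistics import fmean


def combined_ate(y_treated_exp, y_control_exp, y_control_obs, lam):
    """Hypothetical helper: ATE estimate tau_hat(lam) without covariates.

    The treated mean comes from the experiment only; the control mean is a
    lam-weighted average of the observational and experimental control means.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    control_mean = lam * fmean(y_control_obs) + (1.0 - lam) * fmean(y_control_exp)
    return fmean(y_treated_exp) - control_mean


# Toy outcomes, made up for illustration (not LaLonde data).
y_t = [6.0, 7.0, 8.0]         # experimental treated, mean 7
y_c = [4.0, 5.0, 6.0]         # experimental control, mean 5
y_obs = [2.0, 3.0, 4.0, 3.0]  # observational control, mean 3

for lam in (0.0, 0.5, 1.0):
    print(lam, combined_ate(y_t, y_c, y_obs, lam))
# prints: 0.0 2.0 / 0.5 3.0 / 1.0 4.0
```

Moving λ from 0 to 1 shifts the estimate linearly from the experimental answer to the one built on the observational control mean; a bias of size b in that mean shifts τ̂(λ) by −λb, which is exactly the trade-off the choice of λ has to manage.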
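In this simple no-covariate case, where the target is the expected control outcome in the experimental population, we implement this objective by selecting λ through cross-validation on the experimental data:
$$\hat{\lambda} = \arg\min_{\lambda \in [0,1]} \mathrm{CV}(\lambda), \qquad \mathrm{CV}(\lambda) = \sum_{k=1}^{K} \left( \bar{Y}^{\mathrm{exp}}_{c,B_k} - \left[\lambda\,\bar{Y}^{\mathrm{obs}}_{c} + (1-\lambda)\,\bar{Y}^{\mathrm{exp}}_{c,B_{-k}}\right] \right)^2,$$
where $B_k$ and $B_{-k}$ denote complementary subsets of the experimental sample in K-fold cross-validation (the k-th fold and the remaining folds), $\bar{Y}^{\mathrm{exp}}_{c,B}$ is the average experimental control outcome in subset $B$, and $\bar{Y}^{\mathrm{obs}}_{c}$ is the average observational control outcome. In the paper we extend this to more general models for the observational data involving covariates.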
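The cross-validation step can be sketched as follows, under the assumption that CV(λ) sums, over the K folds, the squared difference between the experimental control mean in the held-out fold B_k and the λ-weighted estimate built from the remaining folds B_{-k} and the observational control mean. The helper names (`make_folds`, `cv_objective`, `select_lambda`), the grid search over λ, and the simulated outcomes are illustrative choices, not the paper's code.

```python
import random
from statistics import fmean


def make_folds(values, k, rng):
    """Shuffle the values and deal them into k folds of (nearly) equal size."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cv_objective(lam, folds, obs_control_mean):
    """Assumed CV(lam): squared gap between each held-out fold's control mean
    and the lam-weighted estimate built from the other folds plus the
    observational control mean, summed over folds."""
    total = 0.0
    for k, held_out in enumerate(folds):
        train = [y for j, fold in enumerate(folds) if j != k for y in fold]
        estimate = lam * obs_control_mean + (1.0 - lam) * fmean(train)
        total += (fmean(held_out) - estimate) ** 2
    return total


def select_lambda(y_control_exp, y_control_obs, k=5, grid_size=101, seed=0):
    """Pick lam on an evenly spaced grid over [0, 1] by minimizing CV(lam)."""
    folds = make_folds(y_control_exp, k, random.Random(seed))
    obs_mean = fmean(y_control_obs)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda lam: cv_objective(lam, folds, obs_mean))


# Simulated control outcomes (hypothetical, not LaLonde data).
rng = random.Random(1)
y_exp = [rng.gauss(5.0, 1.0) for _ in range(200)]
y_obs_biased = [rng.gauss(3.0, 1.0) for _ in range(2000)]    # mean shifted by -2
y_obs_unbiased = [rng.gauss(5.0, 1.0) for _ in range(2000)]  # same mean as y_exp

print("biased obs:  ", select_lambda(y_exp, y_obs_biased))
print("unbiased obs:", select_lambda(y_exp, y_obs_unbiased))
```

Because each fold's error is linear in λ, this assumed CV(λ) is a convex quadratic, so the grid minimizer lies within half a grid step of the exact minimizer clipped to [0, 1]. With the strongly biased observational sample, the selected λ comes out at or near zero.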
Figure: Cross-Validated Causal Inference (CVCI) using λ. Top panels: selection of λ via the cross-validation objective CV(λ). The curve shows the average of CV(λ) over 5000 runs, and the blue dashed line shows the average selected λ. Bottom panels: ATE estimates for different λ.
(This corresponds to Figure 1 in the paper.)
The figure above presents results for this example based on the LaLonde data. Each of the two bottom panels shows three estimates of the ATE.
In both panels, the first estimate is based on the experimental data alone (λ = 0).
The second, also shown in both panels, is based on the observational data alone (λ = 1).
These two estimates set the stage for our preferred one, based on the cross-validated λ.
The third estimate is our proposed method. The cross-validation uses five-fold splits, and each split yields a single selected λ; repeating the random split many times gives a distribution of selected λ.
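Repeating the random five-fold split yields a distribution of selected λ, which can be sketched as below. This assumes a squared-error objective CV(λ) = Σ_k (a_k − λ b_k)², where a_k is the held-out fold's experimental control mean minus the mean of the remaining folds, and b_k is the observational control mean minus that same training mean. Under this assumption the minimizer over [0, 1] has the closed form clip(Σ_k a_k b_k / Σ_k b_k², 0, 1), so no grid search is needed. The function names, the number of repeats, and the simulated data are hypothetical.

```python
import random
from statistics import fmean


def lambda_hat(folds, obs_control_mean):
    """Closed-form minimizer over [0, 1] of the assumed
    CV(lam) = sum_k (a_k - lam * b_k)^2, where
    a_k = held-out fold mean - training mean and
    b_k = observational control mean - training mean."""
    num = den = 0.0
    for k, held_out in enumerate(folds):
        train_mean = fmean([y for j, fold in enumerate(folds) if j != k for y in fold])
        a = fmean(held_out) - train_mean
        b = obs_control_mean - train_mean
        num += a * b
        den += b * b
    if den == 0.0:
        return 0.0  # CV(lam) is flat; fall back to the experimental estimate
    return min(1.0, max(0.0, num / den))


def selected_lambdas(y_control_exp, y_control_obs, n_repeats=500, k=5, seed=0):
    """Redraw the random k-fold split n_repeats times; return each selected lam."""
    rng = random.Random(seed)
    obs_mean = fmean(y_control_obs)
    draws = []
    for _ in range(n_repeats):
        shuffled = list(y_control_exp)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        draws.append(lambda_hat(folds, obs_mean))
    return draws


# Simulated control outcomes (hypothetical, not LaLonde data).
rng = random.Random(2)
y_exp = [rng.gauss(5.0, 1.0) for _ in range(200)]
y_obs = [rng.gauss(3.0, 1.0) for _ in range(2000)]  # biased observational controls

draws = selected_lambdas(y_exp, y_obs)
print("mean selected lambda:", fmean(draws))
print("share of draws below 0.1:", sum(d < 0.1 for d in draws) / len(draws))
```

The closed form follows from setting the derivative of the quadratic to zero and then clipping to [0, 1], which is valid because the objective is convex in λ.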
Findings: In the case without covariates, the selected λ is always close to or exactly equal to 0, so the estimate essentially coincides with the experimental one. Cross-validation shows that the data themselves reveal the observational sample to be of little value here. For a covariate-adjusted version of the observational estimator, the cross-validated λ is much closer to 1: averaged over many random choices of the five folds, the selected λ is 0.77. Here the data indicate that the observational sample is valuable. Together, the two sets of results show that, in this example, our method detects when the observational sample is valuable and when it is not, in a fully data-driven way.
Figure: Illustration of Cross-Validated Causal Inference (CVCI) for the general case. A full model (denoted by θ) containing the causal parameter (denoted by β) is obtained by minimizing a weighted combination of experimental and observational losses. The weight (denoted by λ) is chosen through cross-validation on the causal parameter across experimental folds.
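For the general case, here is a minimal sketch under strong simplifying assumptions. The full model is a linear regression y = α + βt + γx with θ = (α, β, γ). Both losses are mean squared errors. The observational sample contains treated as well as control units, and its treated outcomes carry an additive bias standing in for unobserved confounding. θ(λ) minimizes λ·L_obs(θ) + (1 − λ)·L_exp(θ), and λ is chosen by comparing β from the combined fit on B_{-k} with an experiment-only β on the held-out fold B_k. All function names and the simulation design are hypothetical; the paper's models and losses may differ.

```python
import random


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def moments(rows):
    """Average x x^T and x y over rows of (x, y): sufficient statistics for squared loss."""
    n, p = len(rows), len(rows[0][0])
    xx = [[sum(x[i] * x[j] for x, _ in rows) / n for j in range(p)] for i in range(p)]
    xy = [sum(x[i] * y for x, y in rows) / n for i in range(p)]
    return xx, xy


def fit_theta(exp_mom, obs_mom, lam):
    """theta(lam) = argmin lam * L_obs + (1 - lam) * L_exp with mean squared losses;
    setting the gradient to zero gives these weighted normal equations."""
    (exx, exy), (oxx, oxy) = exp_mom, obs_mom
    p = len(exy)
    A = [[(1 - lam) * exx[i][j] + lam * oxx[i][j] for j in range(p)] for i in range(p)]
    b = [(1 - lam) * exy[i] + lam * oxy[i] for i in range(p)]
    return solve(A, b)


def select_lambda(exp_rows, obs_rows, k=5, grid_size=21, seed=0):
    """Choose lam by cross-validating the causal parameter beta = theta[1]."""
    shuffled = list(exp_rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    obs_mom = moments(obs_rows)
    splits = []  # (moments of held-out fold B_k, moments of its complement B_-k)
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((moments(held_out), moments(train)))

    def cv(lam):
        total = 0.0
        for held_mom, train_mom in splits:
            beta_held = fit_theta(held_mom, obs_mom, 0.0)[1]   # experiment-only beta on B_k
            beta_comb = fit_theta(train_mom, obs_mom, lam)[1]  # combined beta on B_-k + obs
            total += (beta_held - beta_comb) ** 2
        return total

    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=cv)


def simulate(n, treated_bias, rng):
    """Hypothetical data: y = 1 + 2 t + 0.5 x + noise, with `treated_bias` added
    to treated outcomes as a stand-in for unobserved confounding."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1.0 if rng.random() < 0.5 else 0.0
        y = 1.0 + (2.0 + treated_bias) * t + 0.5 * x + rng.gauss(0.0, 1.0)
        rows.append(((1.0, t, x), y))
    return rows


rng = random.Random(3)
exp_rows = simulate(200, 0.0, rng)
print("biased obs:  ", select_lambda(exp_rows, simulate(1000, 5.0, rng)))
print("unbiased obs:", select_lambda(exp_rows, simulate(1000, 0.0, rng)))
```

Because the weighted squared loss depends on the data only through the averaged x xᵀ and x y moments, precomputing them once per fold means each λ on the grid costs just a few 3 × 3 solves.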