Here, I will explain causal inference in plain language.
Suppose a pharmaceutical company has developed a new drug but lacks sufficient finances to conduct a randomized controlled trial (RCT).
As a result, the company will estimate the treatment effect based on observational data from four patients.
Yi(1): Potential outcome if the patient received the new drug
Yi(0): Potential outcome if the patient received the existing drug
Wi: Treatment assignment (Wi=1: new drug, Wi=0: existing drug)
The goal is to estimate the average treatment effect (ATE): E[Yi(1)−Yi(0)].
Estimate of ATE: 2=(6−1+4−1)/2.
Thus, in this hypothetical example, it appears that the new drug is effective.
The issue here is that we observe either Yi(1) or Yi(0), but not both, for each patient.
Thus, the actual data used for the analysis will look as follows:
Now, how can we estimate ATE using the left table?
A naive thinking is that we may just estimate ATE by (7+5)/2 - (6+8)/2 = -1
Note that the ATE was estimated to be 2.
Thus, this naive estimation seems wrong!!!
What is wrong with the naive method?
The issue with the naive method is that we ignored a possible confounder (denoted as X) in the estimation of the ATE.
If there is a confounder, we must account for it properly to estimate the ATE.
If we do not, the estimation may be biased.
Overall, there are two ways to estimate the ATE properly.
The simplest way is to randomize the treatment assignment and conduct a randomized controlled trial. In this case, confounders will be eliminated. This approach requires minimal assumptions for causal inference and is straightforward, but we know that randomized controlled trials are very expensive.
The second way is to collect as many high-quality covariates as possible, perform the analysis by "conditioning on the covariates," and conduct causal analysis without randomizing the treatment assignment. This approach requires certain assumptions for causal inference.
Eventually, the causal effect measures the difference between two potential outcomes in parallel "what-if" scenarios—one occurring on Earth and the other on Mars—for each patient, allowing us to estimate the individual treatment effect. Although this is idealistic, it may not be feasible at the individual level (unless additional assumptions are made). Therefore, we are often more interested in the average treatment effect.
To estimate the ATE in observational studies, we must have a sufficient amount of data and make certain assumptions.
Eventually, we divide patients into two groups—the treatment group and the control group—and separately estimate E[Yi(1)] and E[Yi(0)] using the observed data (tuple (Y, W, X)).
Then, we subtract E[Yi(1)] - E[Yi(0)] to obtain ATE = E[Yi(1) - Yi(0)].
This forms the foundation of causal inference.
Read & See
Causal Inference Is Just Bayesian Decision Theory [Link]