In this paper, we propose a method that infers the bias of observational methods using data from "intention to treat" Randomized Controlled Trials, in which self-selection occurs after random assignment of an offer or an encouragement to take the treatment. Contrary to earlier methods, our method does not require the collection of additional data on non-participants and does not suffer from the bias due to using different survey instruments for participants and non-participants. We also propose a new decomposition of the bias of observational methods into a component due to unobserved variables, which might decrease as we observe more covariates, and a component due to a failure of common support, which will not decrease even if we observe more covariates. We estimate our decomposition for seven recently published papers with data available online. Most of these papers are evaluations of development programs in education, finance and health. Our current set of results is rather dispiriting: for most programs, the bias after conditioning on the covariates available in the datasets is as large as the bias before conditioning on anything. Our results suggest that the covariates we observe in development programs are poor predictors of selection bias. Not everything is grim though: we also find that the component of the bias of observational methods due to a failure of common support is generally small. We interpret these results as a call for more research on the types of confounders that matter for a given type of program and outcome.
Our main empirical result can be summarized using the graph below. The x-axis shows the bias of an observational method that does not adjust for any covariate, measured in units of the standard deviation of the outcome in the control group. The y-axis shows the bias after adjusting for all available covariates in the dataset. The very dispiriting result is that the observations are aligned on the 45° line, meaning that the bias of the observational method is as large after adjusting for the covariates as it was before. Put differently, we do not observe the critical confounders for the interventions that we study.
The graph below shows the bias B of a simple comparison between agents exposed to an educational intervention and agents that are not. The bias is decomposed into four terms. The first two (By and Bx) can be removed by adjusting for observed covariates, and thus do not affect observational methods. The last two are due to either unobserved confounders (Bu) or a failure to find unexposed individuals similar to the exposed ones (Btts). Of these two, the latter is more severe, as it cannot be solved by observing more data. The sum of these last two sources of bias is the bias of the observational method of interest (Bne = Bu + Btts). In this particular application, the bias of the observational method is small.
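For reference, the decomposition described above can be written compactly. This is a sketch only: it assumes that the four terms combine additively, which the text states explicitly only for the last two, and it does not reproduce the formal definitions of the individual terms.

```latex
% Assumed additive form of the decomposition (not stated explicitly in the text)
\begin{align*}
  B      &= B_y + B_x + B_u + B_{tts} \\
  B_{ne} &= B_u + B_{tts}
  % B_y, B_x : removed by adjusting for observed covariates
  % B_u      : unobserved confounders
  % B_{tts}  : failure of common support
\end{align*}
```

Under this reading, B equals By + Bx + Bne, so the bias of the simple comparison matches the bias of the observational method only when the two covariate-adjustable terms sum to zero.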