How Much Should We Trust Observational Estimates? Accumulating Evidence Using Randomised Controlled Trials with Imperfect Compliance (2025),
with David Rhys Bernard, Gharad Bryan, Sylvain Chabé-Ferret, Jon de Quidt and Roland Rathelot.
[latest version]
Associated CEDIL Research Project Paper 9, available at DOI: https://doi.org/10.51744/CRPP9.
Financial support from IPA and CEDIL.
The use of observational methods remains common in program evaluation. How much should we trust these studies, which lack clear identifying variation? We propose adjusting confidence intervals to incorporate the uncertainty due to observational bias. Using data from 53 development RCTs with imperfect compliance (ICRCTs), we estimate the parameters required to construct our confidence intervals and illustrate their use. Our confidence intervals allow observational estimates to be used even when there are doubts about identification, have close to nominal coverage, lead to power gains in meta-analysis, and enable researchers to choose between RCT and observational methods based on power. A key takeaway is that observational methods have significantly lower power than conventional confidence intervals suggest.
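To fix ideas, the adjustment can be sketched as inflating the usual sampling variance by a term for the dispersion of observational bias (the notation here is illustrative, not the paper's):

\[ \hat{\theta}_{\mathrm{obs}} \;\pm\; z_{1-\alpha/2}\,\sqrt{\widehat{\mathrm{se}}^{\,2} + \hat{\sigma}_b^{\,2}}, \]

where \(\widehat{\mathrm{se}}\) is the conventional standard error of the observational estimate and \(\hat{\sigma}_b\) is an estimate of the standard deviation of observational bias, obtained here from the 53 ICRCTs. Because \(\hat{\sigma}_b\) does not shrink with the sample size, the adjusted intervals imply lower power than conventional ones.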
Large Sample Inference for a Class of Estimators Based on Unconfoundedness (2026)
[latest version]
Matching-type methods estimate causal treatment effects under unconfoundedness. This paper proposes an inference procedure for a broad class of estimators that impute the missing counterfactual outcomes as a weighted sum of outcomes from the opposite treatment group, encompassing the vast majority of matching-type estimators. I propose a marginal variance estimator for the population average treatment effect and the population average treatment effect on the treated. Monte Carlo simulations for local linear kernel matching show accurate standard errors and coverage, matching or outperforming the naive bootstrap. Two empirical applications illustrate the procedure's practical performance.
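As a stylised sketch of the estimator class covered (the notation is mine, not the paper's): with treatment indicator \(W_i\) and outcome \(Y_i\), the missing potential outcome is imputed as a weighted sum over units in the opposite treatment group, and the average treatment effect estimator averages the imputed contrasts,

\[ \hat{Y}_i(1-W_i) \;=\; \sum_{j:\,W_j = 1-W_i} w_{ij} Y_j, \qquad \hat{\tau}_{\mathrm{ATE}} \;=\; \frac{1}{n}\sum_{i=1}^{n} (2W_i - 1)\bigl(Y_i - \hat{Y}_i(1-W_i)\bigr). \]

Nearest-neighbour, kernel and local linear matching estimators all take this form for suitable choices of the weights \(w_{ij}\).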
A Metadata Schema for Data from Experiments in the Social Sciences (2023),
with Jack Cavanagh, Sarah Kopper and Anja Sautmann.
World Bank Policy Research Working Paper WPS10296.
Supported by J-PAL Global.
[latest version] [GitHub repository with latest version and related materials]
The use of randomized controlled trials (RCTs) in the social sciences has greatly expanded, resulting in newly abundant, high-quality data that can be reused for methods research in program evaluation, for systematizing evidence for policymakers, and for replication and training purposes. However, potential users of RCT data often face significant barriers to discovery and reuse. We propose a metadata schema that standardizes RCT data documentation and can serve as the basis for one, or several interoperable, data catalogs that make such data easily findable, searchable, and comparable, and thus more readily reusable for secondary research. The schema is designed to document the unique properties of RCT data. Its set of fields and associated encoding schemes (acceptable formats and values) can be used to describe any dataset associated with a social science RCT. We also make recommendations for implementing a catalog or database based on this metadata schema.
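As a purely illustrative sketch of what a catalog record following such a schema could look like (the field names below are hypothetical and chosen for illustration; the authoritative field definitions and encoding schemes are in the GitHub repository):

    # Hypothetical catalog record; field names are illustrative only,
    # not the schema's actual fields.
    example_record = {
        "study_title": "Cash transfers and household consumption",
        "unit_of_randomization": "household",
        "treatment_arms": ["control", "unconditional cash transfer"],
        "country": "KE",
        "data_access_conditions": "restricted",
    }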
Can Observational Methods Reproduce the Results of Randomized Controlled Trials? A Meta-analysis (2026), with Sylvain Chabé-Ferret.
Despite the continued practical relevance of observational methods, we know very little about the properties of their bias. How large is it on average? Does it go to zero under some conditions? In this paper, we gather a large set of estimates of observational bias to answer these questions. Our estimates come from the literature on within-study comparisons, spearheaded by LaLonde (1986), which estimates the bias of observational methods by comparing them to an experimental benchmark. Striking regularities emerge: (i) observational methods are, on average, unbiased; (ii) covariates matter: using pre-treatment outcomes (either as a control variable or to form a DID estimator) is instrumental to the good performance of observational methods; (iii) observational methods are inconsistent: as sample size increases, an incompressible level of uncertainty about observational bias remains, which, even if centered around zero, makes observational methods severely imprecise. The standard deviation of this systematic component is 0.22σ in our sample; (iv) classical confidence intervals, which do not account for this component, badly undercover the true effect and vastly overestimate the real power of observational methods: our estimates imply that observational methods are unable to detect effects smaller than 0.43σ, even with an infinite sample size; (v) the correction to observational confidence intervals proposed by Bernard et al. (2024) restores their nominal level of coverage; (vi) Bernard et al. (2024)'s bias- and variance-corrected FGLS meta-analytic estimator enables an efficient combination of observational and experimental estimates, tightening the combined estimates around the true experimental effect.
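As a back-of-the-envelope check (my reading, not a statement from the paper), these numbers fit together: with a bias component of standard deviation 0.22σ that does not vanish as the sample grows, the half-width of a 95% confidence interval that accounts for it is bounded below by roughly

\[ 1.96 \times 0.22\sigma \;\approx\; 0.43\sigma, \]

which is the minimum detectable effect quoted above.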
Demand-Side Interventions for Economic Integration of Refugees, with David Rhys Bernard, Yvonne Giesing, Jakob Hennig and Sekou Keita.
Funded by J-PAL's European Social Inclusion Initiative.
J-PAL Research Resources: Data Analysis, by Maya Duru and Sarah Kopper,
with contributors Jack Cavanagh, Jasmin Claire Fliegner and Anja Sautmann.