"Torture the data, and it will confess to anything." - Ronald Coase
This lesson assumes that you are familiar with the theory and methods of Statistical Inference and Power Calculations. Basic knowledge of Stata syntax and writing code in .do files (Intro to Stata, Stata Best Practices) is also required.
This lesson focuses on how to specify and run the correct analytical model to estimate causal effects.
Linear regression is the backbone of quantitative analysis when estimating causal effects.
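Choose your control variables carefully, even if you're running an RCT: controls should be measured before treatment or be unaffected by it, because conditioning on a variable that the treatment itself changes can bias your estimate.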
If treatment was assigned at the cluster level (for example, by village or school), then you must cluster your standard errors at that level.
In two-period impact evaluations, controlling for the baseline value of the outcome with an ANCOVA model usually gives more statistical power than fixed effects or first-differences.
If you stratified treatment assignment, then you must include strata fixed effects in your regression model.
Estimating subgroup effects can help you identify who benefits more or less from the intervention, but it can do little to explain the mechanisms of impact.
A pre-analysis plan helps prevent selective reporting of results and makes your findings more credible.
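To make the regression-related points above concrete, here is a minimal Stata sketch of how they might combine in a single specification. It is an illustration under stated assumptions, not the lesson's prescribed code: the data are simulated, and every variable name (village_id, stratum, treatment, female, y_baseline, y_endline) and every parameter value is hypothetical.

```stata
* Illustrative sketch only: simulated data, hypothetical variable names.
clear
set seed 12345

* 40 hypothetical clusters (villages) spread evenly across 4 strata
set obs 40
gen village_id = _n
gen stratum = mod(_n, 4) + 1

* Randomize treatment at the village level within each stratum (half treated)
gen u = runiform()
bysort stratum (u): gen treatment = _n <= _N/2

* Village-level shock shared by everyone in the same cluster
gen v = rnormal()

* 25 individuals per village
expand 25
gen female = runiform() < 0.5
gen y_baseline = rnormal() + v
gen y_endline = 0.5*y_baseline + 0.3*treatment + 0.2*stratum + v + rnormal()

* ANCOVA: endline outcome on treatment, controlling for the baseline outcome,
* with strata fixed effects and standard errors clustered by village
regress y_endline i.treatment y_baseline i.stratum, vce(cluster village_id)

* Subgroup effects: interact treatment with a pre-specified baseline characteristic
regress y_endline i.treatment##i.female y_baseline i.stratum, vce(cluster village_id)
```

In the first regression, the coefficient on 1.treatment is the estimated treatment effect. In the second, the coefficient on the 1.treatment#1.female interaction estimates how the effect differs for the hypothetical female subgroup.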
Banner photo: John Snow's cholera map, which helped to trace the source of the cholera outbreak in London in 1854. Accessed from https://commons.wikimedia.org/wiki/File:Snow-cholera-map-1.jpg.