15. Data Analysis
"Torture the data, and it will confess to anything." - Ronald Coase
Lesson Prerequisites
This lesson assumes that you are familiar with the theory and methods of Statistical Inference and Power Calculations. Basic knowledge of Stata syntax and writing code in .do files (Intro to Stata, Stata Best Practices) is also required.
0. Intro to the lesson
This lesson focuses on how to specify and run the correct analytical model to estimate causal effects.
1. Linear regression
Linear regression is the backbone of quantitative analysis when estimating causal effects.
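As a minimal illustration, here is a Python sketch with made-up data; the lesson's own examples use Stata. In a randomized evaluation, regressing the outcome on an intercept and a binary treatment dummy gives an OLS coefficient on the dummy that equals the difference in mean outcomes between the treatment and control groups. The helper name `ols_slope` and the data are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and one regressor x."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x

# Hypothetical data: treat = 1 for treated units, 0 for control units.
treat = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [10.0, 12.0, 11.0, 9.0, 14.0, 15.0, 13.0, 16.0]

beta = ols_slope(treat, outcome)

# Difference in mean outcomes between treated and control units.
diff_in_means = (mean(y for t, y in zip(treat, outcome) if t == 1)
                 - mean(y for t, y in zip(treat, outcome) if t == 0))

print(beta, diff_in_means)
```

The equivalence is why a simple regression on a treatment dummy is the natural starting point: it reproduces the difference in means, and it can be extended with controls, fixed effects, and clustered standard errors.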
2. Control variables
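You need to be careful about which control variables you include in your model, even if you're running an RCT. For example, controlling for a variable that the treatment itself affects can bias the estimated treatment effect.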
3. Clustered standard errors
If treatment was assigned at the cluster level (for example, by village or school), then you must cluster your standard errors at that level.
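One way to see why clustering matters: with equal-sized clusters of m units and an intracluster correlation ρ in the outcome, the variance of a mean is inflated by the design effect 1 + (m - 1)ρ relative to simple random sampling. Standard errors that ignore clustering are therefore too small by roughly a factor of √(1 + (m - 1)ρ). The Python sketch below assumes equal cluster sizes, and the cluster size and ρ are hypothetical values.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def naive_se_understatement(cluster_size, icc):
    """Approximate factor by which an unclustered SE understates the correct SE."""
    return math.sqrt(design_effect(cluster_size, icc))

# Hypothetical design: 20 households per village, intracluster correlation 0.05.
deff = design_effect(20, 0.05)
factor = naive_se_understatement(20, 0.05)
print(f"design effect = {deff:.2f}, SE understated by a factor of {factor:.2f}")
```

Even a modest ρ of 0.05 nearly doubles the variance here, which is enough to make an unclustered test badly over-reject. In Stata, the clustered adjustment is requested with the `vce(cluster clustvar)` option of `regress`.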
4. Panel data methods
The ANCOVA model is usually superior to fixed effects or first-differences when controlling for baseline values in two-period impact evaluations.
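A minimal sketch of why. The simplifying assumptions are ours, not the lesson's: one baseline and one follow-up round, equal outcome variance in both rounds, correlation ρ between baseline and follow-up outcomes, and a large randomized sample. With two periods, fixed effects and first-differences give the same difference-in-differences estimator. Relative to a follow-up-only comparison, its variance is 2(1 - ρ), while ANCOVA's is 1 - ρ². The Python function name is hypothetical.

```python
def relative_variances(rho):
    """Estimator variances relative to a follow-up-only comparison of means,
    assuming equal outcome variance in both rounds and baseline/follow-up
    correlation rho."""
    return {
        "post_only": 1.0,
        # With two periods, first-differences and fixed effects coincide.
        "diff_in_diff": 2 * (1 - rho),
        # ANCOVA: regress the follow-up outcome on treatment and the baseline outcome.
        "ancova": 1 - rho ** 2,
    }

for rho in (0.0, 0.25, 0.5, 0.75, 0.9):
    v = relative_variances(rho)
    print(f"rho={rho:.2f}  post={v['post_only']:.3f}  "
          f"DiD={v['diff_in_diff']:.3f}  ANCOVA={v['ancova']:.3f}")
```

Under these assumptions, difference-in-differences beats the follow-up-only comparison only when ρ > 0.5. ANCOVA is never worse than either, because 1 - ρ² ≤ 2(1 - ρ) whenever ρ ≤ 1.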
5. Strata fixed effects
If you stratified treatment assignment, then you must include strata fixed effects in your regression model.
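To show what the strata fixed effects do to the estimate, here is a Python sketch on made-up data. By the Frisch-Waugh-Lovell theorem, the treatment coefficient from a regression of the outcome on treatment plus a full set of strata dummies equals the coefficient from demeaning both variables within each stratum. That coefficient is also a weighted average of within-stratum differences in means, with weights n_s p_s (1 - p_s), where n_s is the stratum size and p_s is its treatment share. The function names and data are hypothetical. In Stata, the corresponding regression would be written with factor-variable notation, such as `regress y treat i.strata`.

```python
from collections import defaultdict

def group_by_stratum(strata, treat, y):
    """Collect (treatment, outcome) pairs for each stratum."""
    groups = defaultdict(list)
    for s, d, yi in zip(strata, treat, y):
        groups[s].append((d, yi))
    return groups

def strata_fe_estimate(strata, treat, y):
    """Treatment coefficient from OLS of y on treat plus strata dummies,
    computed by demeaning y and treat within each stratum (Frisch-Waugh-Lovell)."""
    num = den = 0.0
    for rows in group_by_stratum(strata, treat, y).values():
        d_bar = sum(d for d, _ in rows) / len(rows)
        y_bar = sum(yi for _, yi in rows) / len(rows)
        for d, yi in rows:
            num += (d - d_bar) * (yi - y_bar)
            den += (d - d_bar) ** 2
    return num / den

def weighted_within_strata_diff(strata, treat, y):
    """Weighted average of within-stratum differences in means,
    with weights n_s * p_s * (1 - p_s)."""
    total = weight_sum = 0.0
    for rows in group_by_stratum(strata, treat, y).values():
        treated = [yi for d, yi in rows if d == 1]
        control = [yi for d, yi in rows if d == 0]
        n = len(rows)
        p = len(treated) / n
        weight = n * p * (1 - p)
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += weight * diff
        weight_sum += weight
    return total / weight_sum

# Hypothetical data: stratum B has higher outcomes and a lower treatment share.
strata = ["A"] * 4 + ["B"] * 6
treat = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1]
y = [5.0, 3.0, 6.0, 2.0, 20.0, 17.0, 18.0, 16.0, 19.0, 22.0]

fe = strata_fe_estimate(strata, treat, y)
weighted = weighted_within_strata_diff(strata, treat, y)

# Difference in means that ignores the strata entirely.
pooled = (sum(v for d, v in zip(treat, y) if d == 1) / treat.count(1)
          - sum(v for d, v in zip(treat, y) if d == 0) / treat.count(0))
print(fe, weighted, pooled)
```

In this made-up example, the treatment share differs across strata, so the pooled comparison (0.75) is confounded by stratum. The fixed-effects estimate compares units only within their own stratum.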
6. Heterogeneous treatment effects
Estimating subgroup effects can show you who benefits more or less from the intervention, but it is of limited use for explaining the mechanisms behind the impact.
7. Pre-analysis plans
A pre-analysis plan helps to prevent selective reporting of results, and makes your findings more credible.
Banner photo: John Snow's cholera map, which helped to trace the source of the cholera outbreak in London in 1854. Accessed from https://commons.wikimedia.org/wiki/File:Snow-cholera-map-1.jpg.