15. Data Analysis

"Torture the data, and it will confess to anything." - Ronald Coase

Lesson Prerequisites

This lesson assumes that you are familiar with the theory and methods of Statistical Inference and Power Calculations. Basic knowledge of Stata syntax and writing code in .do files (Intro to Stata, Stata Best Practices) is also required.

0. Intro to the lesson

This lesson focuses on how to specify and run the correct analytical model to estimate causal effects.

1. Linear regression

Linear regression is the backbone of quantitative analysis when estimating causal effects.

2. Control variables

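You need to be careful about which control variables you include in your model, even if you're running an RCT: baseline covariates can improve precision, but controlling for variables that were themselves affected by treatment can bias your estimate.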

3. Clustered standard errors

If your evaluation design was clustered (for example, treatment was assigned by village rather than by household), then you must cluster your standard errors at that level.
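
To see why this matters, here is a minimal sketch in plain Python that compares the conventional standard error with a cluster-robust one for a simple regression of an outcome on a treatment dummy. The data, the variable names (village, treat, y) and the helper functions are all hypothetical, and the cluster-robust formula below leaves out the finite-sample correction that statistical packages such as Stata apply, so treat it as an illustration rather than a replacement for your package's clustering option.

```python
import math

# Hypothetical data: 4 villages (clusters) of 3 households each. Treatment is
# assigned by village, and outcomes are identical within a village.
village = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
treat = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y = [5, 5, 5, 9, 9, 9, 2, 2, 2, 4, 4, 4]


def slope_and_residuals(y, t):
    """OLS of y on an intercept and t: returns slope, residuals, mean of t, sum of squares of t."""
    n = len(y)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    a = y_bar - b * t_bar
    resid = [yi - a - b * ti for yi, ti in zip(y, t)]
    return b, resid, t_bar, sxx


def naive_se(y, t):
    """Conventional standard error of the slope (assumes independent errors)."""
    _, resid, _, sxx = slope_and_residuals(y, t)
    sigma2 = sum(e ** 2 for e in resid) / (len(y) - 2)
    return math.sqrt(sigma2 / sxx)


def clustered_se(y, t, cluster):
    """Cluster-robust standard error of the slope, without finite-sample correction."""
    _, resid, t_bar, sxx = slope_and_residuals(y, t)
    score = {}
    for ti, ei, g in zip(t, resid, cluster):
        # Sum (demeaned regressor x residual) within each cluster before squaring.
        score[g] = score.get(g, 0.0) + (ti - t_bar) * ei
    return math.sqrt(sum(s ** 2 for s in score.values())) / sxx


effect, _, _, _ = slope_and_residuals(y, treat)
se_naive = naive_se(y, treat)
se_cluster = clustered_se(y, treat, village)
print(effect, se_naive, se_cluster)
```

In this constructed example the clustered standard error is larger than the conventional one, because the twelve households carry only four villages' worth of independent information.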

4. Panel data methods

The ANCOVA model is usually superior to fixed effects or first-differences when controlling for baseline values in two-period impact evaluations.
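
The difference between the two specifications can be sketched in plain Python. ANCOVA regresses the endline outcome on treatment and the baseline value, so the coefficient on baseline is estimated; first-differences regresses the change in the outcome on treatment, which fixes that coefficient at 1. The data and names below (treat, y0, y1) are hypothetical and noiseless, built with a baseline coefficient of 0.5 and a small chance imbalance in baseline values, and the ANCOVA coefficient is obtained by partialling out the baseline value from both treatment and outcome.

```python
# Hypothetical two-period data, built so that y1 = 1 + 2*treat + 0.5*y0 exactly.
treat = [1, 1, 1, 1, 0, 0, 0, 0]
y0 = [2, 4, 6, 8, 1, 3, 5, 7]  # baseline outcome (slightly higher among treated)
y1 = [1 + 2 * t + 0.5 * b for t, b in zip(treat, y0)]  # endline outcome


def residualize(v, x):
    """Residuals from an OLS regression of v on an intercept and x."""
    n = len(v)
    x_bar = sum(x) / n
    v_bar = sum(v) / n
    slope = (sum((xi - x_bar) * (vi - v_bar) for xi, vi in zip(x, v))
             / sum((xi - x_bar) ** 2 for xi in x))
    return [(vi - v_bar) - slope * (xi - x_bar) for xi, vi in zip(x, v)]


def ancova_effect(y1, treat, y0):
    """Coefficient on treat in a regression of y1 on treat and y0."""
    t_res = residualize(treat, y0)
    y_res = residualize(y1, y0)
    return sum(t * v for t, v in zip(t_res, y_res)) / sum(t * t for t in t_res)


def first_difference_effect(y1, treat, y0):
    """Coefficient on treat in a regression of (y1 - y0) on treat."""
    change = [a - b for a, b in zip(y1, y0)]
    treated = [c for c, t in zip(change, treat) if t == 1]
    control = [c for c, t in zip(change, treat) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


ancova = ancova_effect(y1, treat, y0)
first_diff = first_difference_effect(y1, treat, y0)
print(ancova, first_diff)
```

Here ANCOVA recovers the effect of 2 used to build the data, while first-differences does not, because it subtracts the full baseline gap even though only half of it persists to endline. Real data contain noise, so the comparison will not be this clean.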

5. Strata fixed effects

If you stratified treatment assignment, then you must include strata fixed effects in your regression model.
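
As a rough illustration, the sketch below computes the treatment coefficient with strata fixed effects by demeaning the outcome and the treatment dummy within each stratum, which gives the same coefficient as including a dummy for every stratum. The data and names (stratum, treat, y) are hypothetical. The example assumes that the share of units treated differs across strata, which is the case where leaving out strata fixed effects biases the point estimate; when the share is the same in every stratum, the main gain is in precision.

```python
from collections import defaultdict

# Hypothetical data: stratum A has high outcomes and 3 of 4 units treated,
# stratum B has low outcomes and 1 of 4 units treated. The true effect is 2.
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
treat = [1, 1, 1, 0, 1, 0, 0, 0]
y = [12, 12, 12, 10, 2, 0, 0, 0]


def demean_within(values, groups):
    """Subtract each group's mean from the values in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def effect_with_strata_fe(y, treat, stratum):
    """Coefficient on treat from a regression of y on treat and strata dummies."""
    t_dm = demean_within(treat, stratum)
    y_dm = demean_within(y, stratum)
    return sum(t * v for t, v in zip(t_dm, y_dm)) / sum(t * t for t in t_dm)


def pooled_difference(y, treat):
    """Simple difference in means, ignoring strata."""
    treated = [v for v, t in zip(y, treat) if t == 1]
    control = [v for v, t in zip(y, treat) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


fe_effect = effect_with_strata_fe(y, treat, stratum)
pooled = pooled_difference(y, treat)
print(fe_effect, pooled)
```

The pooled comparison mixes the treatment effect with the difference in outcome levels between strata, while the within-stratum comparison recovers the effect of 2 used to build the data.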

6. Heterogeneous treatment effects

Estimating subgroup effects can show you who is benefitting more or less from the intervention, but it does little to explain the mechanisms of impact.
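
A subgroup comparison can be sketched as follows, using hypothetical data and names (female, treat, y). With a binary treatment and a binary subgroup, the coefficient on the interaction term in a regression of the outcome on treatment, subgroup, and their interaction equals the difference between the two subgroup-specific differences in means, which is what this plain-Python example computes.

```python
# Hypothetical data: outcome y, treatment dummy, and a female dummy.
female = [1, 1, 1, 1, 0, 0, 0, 0]
treat = [1, 1, 0, 0, 1, 1, 0, 0]
y = [8, 10, 4, 6, 5, 7, 4, 6]


def mean(values):
    return sum(values) / len(values)


def subgroup_effect(y, treat, subgroup, value):
    """Treated-minus-control difference in means among units with subgroup == value."""
    treated = [v for v, t, s in zip(y, treat, subgroup) if s == value and t == 1]
    control = [v for v, t, s in zip(y, treat, subgroup) if s == value and t == 0]
    return mean(treated) - mean(control)


effect_female = subgroup_effect(y, treat, female, 1)
effect_male = subgroup_effect(y, treat, female, 0)

# Equals the interaction coefficient in the fully interacted regression.
interaction = effect_female - effect_male
print(effect_female, effect_male, interaction)
```

The interaction tells you how much larger the effect is for one subgroup than the other. It does not tell you why, and a real analysis would also need a standard error for the interaction before reading anything into it.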

7. Pre-analysis plans

A pre-analysis plan helps to prevent selective reporting of results and makes your findings more credible.

Additional Resources

  • IDinsight Impact Evaluation Design and Pre Analysis Plan Template (link)

  • IDinsight Technical Report Template (link)

  • IDinsight Technical Report Checklist (link)

  • DeclareDesign blog on when to use a diff-in-diff or ANCOVA model (link)

Banner photo: John Snow's cholera map, which helped to trace the source of the cholera outbreak in London in 1854. Accessed from https://commons.wikimedia.org/wiki/File:Snow-cholera-map-1.jpg.