7. Sampling for Surveys

"Statistics in the hands of activists have power." - Ela Bhatt

Lesson Prerequisites

This lesson assumes familiarity with all of the concepts covered in the lessons on Intro to Stata, Stata Best Practices, and Statistical Inference, especially hypothesis testing and doing hypothesis tests in Stata. If you have not taken statistics before, it is strongly recommended that you take the free Udacity courses in descriptive statistics and inferential statistics and complete the Statistical Inference lesson before attempting this lesson.

0. Intro to the lesson

The main objective of sampling design and sample size calculations is to maximize the precision of your survey estimates, subject to budgetary and design constraints.

1. Simple random sampling

Simple random sampling is a good default sampling design and provides a useful reference point for more complicated sampling designs.


2. Simple random sampling in Stata

There are multiple ways to take a random sample in Stata. Here we cover two of the most common approaches: a manual approach involving a random number variable, and a more direct approach involving the user-written command gsample.
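As a minimal sketch of the two approaches, the do-file below uses Stata's built-in auto dataset as a stand-in sampling frame. The seed, the sample size of 20, and the variable names u, sampled, and insample are illustrative assumptions, and the gsample options shown should be checked against help gsample after installation.

```stata
* Stand-in sampling frame: Stata's built-in auto dataset (74 cars)
sysuse auto, clear
isid make                        // confirm that make uniquely identifies units
sort make                        // fix the sort order so the draw is reproducible
set seed 12345                   // illustrative seed

* Manual approach: random number, sort, keep the first n
generate double u = runiform()
sort u
generate byte sampled = (_n <= 20)
tabulate sampled

* Direct approach: user-written gsample (requires the moremata package)
* ssc install moremata
* ssc install gsample
gsample 20, wor generate(insample)   // wor = without replacement
tabulate insample
```

Sorting on a unique identifier before setting the seed matters: without a stable starting order, the same seed can select different units on different runs.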


3. Sample size calculations

Sample size calculations are most often used to determine the sample size needed to achieve a desired margin of error.
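For a proportion estimated from a simple random sample, the standard large-sample formula is n = z^2 * p * (1 - p) / e^2, where e is the margin of error and z is the critical value for the chosen confidence level. The sketch below assumes a 95% confidence level, a conservative guess of p = 0.5, and a margin of error of 5 percentage points; all three values are illustrative.

```stata
* Illustrative inputs (assumptions, not values from the lesson)
scalar z = invnormal(0.975)      // critical value for 95% confidence
scalar p = 0.5                   // conservative guess for the proportion
scalar e = 0.05                  // desired margin of error

* Required sample size, rounded up to a whole respondent
display ceil(z^2 * p * (1 - p) / e^2)
* displays 385
```

Because p * (1 - p) is largest at p = 0.5, using 0.5 gives the largest, and therefore safest, sample size when you have no prior estimate of the proportion.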


4. Finite population correction

A finite population correction (FPC) adjusts your standard error to account for the fact that your sample represents a non-negligible fraction of your population. In many cases you can ignore the FPC. But when you sample a large fraction of the population, applying the FPC shrinks your standard errors to reflect the extra precision you actually have.
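Under simple random sampling without replacement, the correction multiplies the usual standard error by sqrt(1 - n/N), where n is the sample size and N is the population size. The sketch below treats the 74 cars in the built-in auto dataset as a sample from a hypothetical population of 200; the population size and the variable name popsize are assumptions made for illustration.

```stata
sysuse auto, clear

* Hypothetical population size for the sampling frame
generate popsize = 200

* Size of the correction factor by hand: sqrt(1 - n/N)
display sqrt(1 - 74/200)         // roughly 0.79

* Let Stata apply the correction through the survey settings
svyset _n, fpc(popsize)
svy: mean price

* Uncorrected standard error for comparison
mean price
```

With n/N = 0.37 the corrected standard error is about a fifth smaller than the uncorrected one. When n/N is only a few percent, the factor is close to 1 and the correction makes little difference.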


5. Stratified random sampling

Stratification improves precision by reducing the chances of 'unlucky' samples. You almost always want to stratify if you have any data on individuals in your sampling frame in advance.


6. Stratified random sampling in Stata

Stratified random sampling in Stata is straightforward. But remember to account for stratification during estimation: if individuals from different strata have different probabilities of being sampled, then you need to include sampling weights to recover unbiased estimates.
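The sketch below illustrates both halves of that point, using the built-in auto dataset as a stand-in frame and foreign as the stratum variable. Drawing 10 cars per stratum is an arbitrary choice that makes the sampling probabilities unequal (the strata contain 52 and 22 cars), so weights are needed. The variable names u, sampled, stratum_N, and pw are illustrative.

```stata
sysuse auto, clear
sort make                        // stable order before drawing random numbers
set seed 12345                   // illustrative seed

* Draw 10 units at random within each stratum
generate double u = runiform()
bysort foreign (u): generate byte sampled = (_n <= 10)

* Sampling weight = inverse probability of selection = N_h / n_h
bysort foreign: generate stratum_N = _N
generate double pw = stratum_N / 10

* Keep the sample and declare the design before estimating
keep if sampled
svyset _n [pweight = pw], strata(foreign)
svy: mean price
```

If you instead sampled the same fraction from every stratum, all weights would be equal and the weighted and unweighted means would coincide, although declaring the strata would still affect the standard errors.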

7. Clustered random sampling

Clustering is often necessary for logistical or cost reasons. However, clustering can greatly increase the sample size requirements, depending on the intracluster correlation of your key outcomes.
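A common way to quantify this is the design effect for equal-sized clusters, DEFF = 1 + (m - 1) * rho, where m is the number of respondents per cluster and rho is the intracluster correlation. The required sample size is roughly the simple random sampling sample size multiplied by DEFF. The inputs below (20 respondents per cluster, rho = 0.05, and a simple random sampling requirement of 385) are illustrative assumptions.

```stata
* Illustrative inputs (assumptions, not values from the lesson)
scalar m     = 20                // respondents per cluster
scalar rho   = 0.05              // intracluster correlation
scalar n_srs = 385               // sample size needed under simple random sampling

* Design effect and inflated sample size
scalar deff = 1 + (m - 1) * rho
display deff                     // 1.95
display ceil(n_srs * deff)       // 751
```

Even a modest intracluster correlation of 0.05 nearly doubles the required sample size here, which is why the correlation of your key outcomes deserves attention at the design stage.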


8. Clustered random sampling in Stata

Like stratification, clustering may or may not require sampling weights, depending on whether clusters are the same size and whether you sample clusters with probability proportional to size. Regardless of whether you need sampling weights, you always need to account for clustering when you estimate standard errors.
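The simulated sketch below shows the standard-error point in the simplest case: equal-sized clusters sampled with equal probability, so no weights are needed. Everything in it is hypothetical, including the 50 villages, the 20 households per village, the outcome y, and the variable names.

```stata
clear
set seed 12345                   // illustrative seed

* Hypothetical frame of 50 villages, each with a shared village-level shock
set obs 50
generate village_id = _n
generate double v = rnormal()

* Sample 10 villages at random
generate double u = runiform()
sort u
keep if _n <= 10

* Interview 20 households in each sampled village
expand 20
generate double y = 10 + v + rnormal()

* Account for clustering when estimating the standard error
svyset village_id
svy: mean y

* Naive standard error that ignores clustering, for comparison
mean y
```

The point estimates match, but the naive standard error treats the 200 households as independent and so tends to understate the uncertainty. For a one-off estimate, mean y, vce(cluster village_id) is an alternative to declaring the design with svyset.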


Banner photo: John Snow's cholera map, which helped to trace the source of the cholera outbreak in London in 1854. Accessed from https://commons.wikimedia.org/wiki/File:Snow-cholera-map-1.jpg.