Sampling - Margin of Error

Suppose a variable X follows a normal distribution with mean u and standard deviation d.

The chance that X falls between u - d and u + d is about 68.3%.

The chance that X falls between u - 2d and u + 2d is about 95.4%.

The chance that X falls between u - 3d and u + 3d is about 99.7%.

Correspondingly, the area under the probability density curve between u - d and u + d is about 68.3%, the area between u - 2d and u + 2d is about 95.4%, and the area between u - 3d and u + 3d is about 99.7%.
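These areas can be checked numerically. The sketch below uses the standard identity that a normal variable lies within k standard deviations of its mean with probability erf(k / sqrt(2)). The function name coverage is only an illustrative choice.

```python
import math

def coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard
    deviations of its mean (the area between u - k*d and u + k*d)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.1%}")
```

The result does not depend on u or d, because the area only depends on how many standard deviations wide the interval is.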

This chance, or the corresponding area, is called the confidence level. For example, we are about 95% confident that X falls between u - 2d and u + 2d.

When we estimate the percentage of something by sampling, the more samples we have, the smaller the standard deviation of the estimate is, so the interval from u - 2d to u + 2d is narrower and we get a better (narrower) prediction range at the same level of confidence.

The margin of error is a measure of how close the estimate is likely to be to the mean (true) value. It is usually a multiple of the standard deviation, chosen according to the required confidence level.

Assuming a 95% confidence level, the margin of error would be 2 x standard deviation, which means we are 95% confident that the estimate is between u - 2d and u + 2d.

Assuming a 99% confidence level, the margin of error would be roughly 3 x standard deviation (3 standard deviations actually cover about 99.7%).

Again, the smaller the standard deviation is, the smaller the margin of error will be. So we need more samples to reduce the margin of error.

The standard deviation of the estimate is approximately: sqrt(p * (1 - p) / N)

where p is the approximate percentage and N is the number of samples taken, i.e. the sample size. If p is unknown, set it to 50%, because a percentage of 50% needs the largest sample size to achieve a given margin of error.
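To see why 50% is the safe choice, here is a minimal sketch that computes the standard deviation from the formula above and solves it for N. The function names standard_error and sample_size are illustrative, and the default multiplier of 2 is the rough 95% value used in this article.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Approximate standard deviation of an estimated percentage p
    based on n samples: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def sample_size(p: float, margin: float, z: float = 2.0) -> int:
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= margin.
    z = 2.0 is the rough 95% multiplier (an assumption, see lead-in)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Samples needed for a 3% margin of error at different values of p.
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"p = {p:.0%}: N = {sample_size(p, 0.03)}")
```

Because p * (1 - p) peaks at p = 50%, that row needs the most samples, so a sample sized for 50% is large enough for any other percentage.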

The margin of error = 2 * standard deviation at 95% confidence. The confidence interval is from u - 2d to u + 2d. We are 95% confident that the estimate is within the actual value +/- the margin of error.

For example, suppose we want to estimate the percentage of people in Australia who like Coke. We interview 40 people; 10 answer that they love Coke and the other 30 do not.

So we estimate the percentage as 10 / 40 = 25%.

The standard deviation of the estimate is sqrt(0.25 * (1 - 0.25) / 40) = 6.85%. The margin of error at 95% confidence = 2 * standard deviation = 13.7%.

So we are 95% confident that the actual percentage of people who like Coke is between 25% - 13.7% and 25% + 13.7%, that is, between about 11.3% and 38.7%.
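The same arithmetic can be written as a short script. This is only a sketch of the calculation above: the function name proportion_interval is illustrative, and the multiplier z = 2 is the rough 95% value.

```python
import math

def proportion_interval(successes: int, n: int, z: float = 2.0):
    """Return (estimate, margin of error, lower bound, upper bound)
    for a percentage estimated from n samples."""
    p = successes / n                       # estimated percentage
    std_dev = math.sqrt(p * (1 - p) / n)    # standard deviation of the estimate
    margin = z * std_dev                    # margin of error
    return p, margin, p - margin, p + margin

# 10 of the 40 people interviewed like Coke.
p, margin, low, high = proportion_interval(10, 40)
print(f"estimate = {p:.1%}, margin of error = {margin:.1%}")
print(f"interval = {low:.1%} to {high:.1%}")
```

Interviewing four times as many people with the same proportion, for example proportion_interval(40, 160), halves the margin of error, because the standard deviation shrinks with the square root of N.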

More precisely,

A margin of error of one standard deviation gives a 68% confidence level.

A margin of error of 1.96 (not 2) standard deviations gives a 95% confidence level.

A margin of error of 2.58 standard deviations gives a 99% confidence level.
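These multipliers come from the inverse of the normal cumulative distribution. A minimal sketch using Python's standard statistics module (Python 3.8 or later) is shown here. The function name z_multiplier is illustrative.

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """Number of standard deviations whose two-sided interval around
    the mean covers the given confidence level (between 0 and 1)."""
    tail = (1 - confidence) / 2             # area left in each tail
    return NormalDist().inv_cdf(1 - tail)   # standard normal quantile

for level in (0.68, 0.95, 0.99):
    print(f"{level:.0%} confidence: {z_multiplier(level):.2f} standard deviations")
```

Multiplying the standard deviation by z_multiplier(0.95) instead of 2 gives a slightly narrower margin of error than the rough rule.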