S3 Sample Size for desired Confidence Interval

When we carry out a survey -- for example, to find out what percentage of the population prefers brains to beauty -- we need to choose a Sample Size: how many people do we plan to interview? Each interview incurs a cost, but larger sample sizes lead to greater accuracy, that is, shorter confidence intervals. The Central Limit Theorem (CLT) provides us with guidance on how large a sample size we need in order to get a desired level of accuracy.

QUESTION: I would like to carry out a survey on the Brains-Versus-Beauty question, and at the end of the survey, I would like to have a confidence interval of ±5%. That is, I would like to have 95% confidence of getting within 5% of the true probability. How large a sample do I need to achieve this level of accuracy?

SOLUTION: We have 95% confidence -- corresponding to 19 chances out of 20 -- that the estimate lies within 2 SE of the true value, where SE = sqrt(p*(1-p))/sqrt(N). To answer the question, we need to set 2 SE = 5%, and then solve for N. One problem is that p is unknown. We can resolve this by taking the worst possible case for p, which happens to be p=50%. The function y=p(1-p) is a quadratic which takes the value y=0 at p=0 and p=1, and rises to its maximum value exactly at the center of the interval [0,1]. This means that the largest possible standard error occurs when p=50%. Setting p to this value gives an upper bound on the SE; for any other value of p, the SE will be smaller. Taking p=50%, we have sqrt(p*(1-p)) = 0.5, so the equation 0.05 = 2 SE becomes 0.05 = 1/sqrt(N), so that sqrt(N) = 1/0.05 = 20. Squaring both sides, we get N=400. With a sample size of 400, we will get a 95% confidence interval of ±2 SE, which extends at most 5% on either side of the estimate -- it may be narrower if p is different from 50%.
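The calculation above can be checked with a short Python sketch. The function names `margin_95` and `worst_case_sample_size` are made up for this illustration, and the multiplier z=2 is the rough "2 SE" rule used in the text rather than the more exact 1.96.

```python
import math

def margin_95(p, n, z=2.0):
    """Half-width of the approximate 95% interval: z * sqrt(p*(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

def worst_case_sample_size(margin, z=2.0):
    """Smallest N for which z * SE <= margin in the worst case p = 0.5."""
    p = 0.5  # p*(1-p) is largest here, so the SE is largest here
    n = (z ** 2) * p * (1 - p) / margin ** 2
    # Small tolerance so floating-point noise does not push N up by one.
    return math.ceil(n - 1e-9)

if __name__ == "__main__":
    n = worst_case_sample_size(0.05)
    print("Sample size for a 5% margin:", n)
    # For any other p, the margin at the same N is smaller.
    for p in (0.5, 0.3, 0.1):
        print(f"p = {p:.1f}  margin = {margin_95(p, n):.4f}")
```

For a 5% margin this reproduces N=400 from the solution, and the loop shows that the margin shrinks as p moves away from 50%.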

QUESTION: What happens when we double the sample size?

ANSWER: The Standard Error is divided by SQRT(2)=1.4142. The important thing to note here is that as the sample size N goes up, the standard error goes down in proportion to 1/sqrt(N). Since sqrt(N) increases rather slowly in N, cutting down the standard error is quite expensive in terms of sample size. Suppose I want to reduce the 5% interval to 2.5%. Reducing the size of the interval by half requires multiplying the sample size by 4: since 1/sqrt(4N) = 1/[2 sqrt(N)], the interval will be half of what it is for N. So this would take a sample size of 1600. Suppose I want a 95% confidence interval of size 1.25%, reducing 2.5% by one half. Then I have to multiply 1600 by 4 to get 6400. So to get to 1.25% accuracy from 5%, I have to increase the sample size from 400 to 6400. It is rather expensive to buy extra accuracy.
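The 1/sqrt(N) scaling can be tabulated with a few lines of Python. This is only an illustrative sketch: the helper name `worst_case_margin` is hypothetical, and it assumes the worst case p=50% and the "2 SE" rule from the solution above.

```python
import math

def worst_case_margin(n, z=2.0):
    """Worst-case (p = 0.5) half-width of the approximate 95% interval."""
    p = 0.5
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    base = worst_case_margin(400)
    for n in (400, 800, 1600, 6400):
        m = worst_case_margin(n)
        # The ratio column shows how much the margin shrank relative to N = 400.
        print(f"N = {n:5d}  margin = {m:.4f}  shrink factor = {base / m:.4f}")
```

Doubling N from 400 to 800 shrinks the margin only by a factor of about 1.4142, while each halving of the margin (5% to 2.5% to 1.25%) costs a fourfold increase in N.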