Computing confidence intervals with the binomial distribution

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote, within three percentage points (if the sample is large enough). Election polls are often calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favor the candidate is between 0.37 and 0.43: (0.40 − 0.03, 0.40 + 0.03).

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.

The procedure to find the confidence interval, the sample size, the error bound, and the confidence level for a proportion is similar to that for the population mean, but the formulas are different.


How do you know you are dealing with a proportion problem? First, the underlying distribution is a binomial distribution. (There is no mention of a mean or average.) If X is a binomial random variable, then X ~ B(n, p), where n is the number of trials and p is the probability of a success. To form a proportion, take X, the random variable for the number of successes, and divide it by n, the number of trials (or the sample size). The random variable P′ (read “P prime”) is that proportion:

$$P' = \frac{X}{n}$$

(Sometimes the random variable is denoted as P̂, read “P hat”.)

When n is large and p is not close to zero or one, we can use the normal distribution to approximate the binomial.

$$X \sim N\left(np, \sqrt{npq}\right)$$

where q = 1 − p.

If we divide the random variable, the mean, and the standard deviation by n, we get a normal distribution of proportions with P′, called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by n.)

$$P' = \frac{X}{n} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$$
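To see the division by n concretely, here is a minimal Python sketch that computes the mean and standard deviation of X and of P′ = X/n. The values n = 500 and p = 0.84 are illustrative assumptions, not data from the text, and the function name `proportion_sampling_params` is hypothetical.

```python
import math

def proportion_sampling_params(n: int, p: float) -> dict:
    """Mean and standard deviation of X ~ B(n, p) and of P' = X/n
    under the normal approximation (illustrative helper)."""
    q = 1 - p
    mean_x = n * p                 # mean of X: np
    sd_x = math.sqrt(n * p * q)    # standard deviation of X: sqrt(npq)
    return {
        "mean_x": mean_x,
        "sd_x": sd_x,
        "mean_p": mean_x / n,      # dividing by n leaves p
        "sd_p": sd_x / n,          # dividing by n gives sqrt(pq/n)
    }

params = proportion_sampling_params(500, 0.84)
print(params["mean_p"], params["sd_p"], math.sqrt(0.84 * 0.16 / 500))
```

The last two printed values agree, which is the identity √(npq)/n = √(pq/n).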

The confidence interval has the form (p′ − EBP, p′ + EBP), where EBP is the error bound for the proportion:

$$EBP = z_{\alpha/2}\sqrt{\frac{p'q'}{n}}, \qquad q' = 1 - p'$$

In the error bound formula, the sample proportions p′ and q′ are estimates of the unknown population proportions p and q; they are used because p and q are not known. Both are calculated from the data: p′ is the estimated proportion of successes, and q′ is the estimated proportion of failures.

The confidence interval can be used only if the number of successes np′ and the number of failures nq′ are both greater than five.
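The interval form (p′ − EBP, p′ + EBP), the error bound z_{α/2}√(p′q′/n), and the greater-than-five condition can be combined into one small function. This is a sketch assuming Python 3.9+ and the standard-library `statistics.NormalDist`; the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(x: int, n: int, cl: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval (p' - EBP, p' + EBP)."""
    p_prime = x / n                # estimated proportion of successes
    q_prime = 1 - p_prime          # estimated proportion of failures
    # The text's rule of thumb: np' and nq' must both exceed five.
    if n * p_prime <= 5 or n * q_prime <= 5:
        raise ValueError("np' and nq' must both be greater than five")
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    ebp = z * math.sqrt(p_prime * q_prime / n)   # error bound for the proportion
    return p_prime - ebp, p_prime + ebp

print(proportion_ci(421, 500))
```

Calling `proportion_ci(421, 500)` should give approximately (0.810, 0.874), the same interval that Example 1 arrives at.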

__________________________________________

EXAMPLE 1

Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes – they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.

  • The first solution is step-by-step (Solution A).

  • The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B).

Solution A:

Let X = the number of people in the sample who have cell phones. X is binomial.

X ~ B(500, 421/500)

To calculate the confidence interval, you must find p′, q′, and EBP.

n = 500

x = the number of successes = 421

p′ = x/n = 421/500 = 0.842


p′ = 0.842 is the sample proportion; this is the point estimate of the population proportion.

q′ = 1 – p′ = 1 – 0.842 = 0.158

Since CL = 0.95, α = 1 − CL = 1 − 0.95 = 0.05, so α/2 = 0.025.

Then

$$z_{\alpha/2} = z_{0.025} = 1.96$$


Use the TI-83, 83+, or 84+ calculator command invNorm(0.975,0,1) to find $z_{0.025}$. Remember that the area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 0.975. This value can also be found using appropriate commands on other calculators, using a computer, or using a standard normal probability table.
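On a computer, Python's standard-library `statistics.NormalDist().inv_cdf` plays the same role as invNorm: it takes the area to the left and returns the z-score. A minimal sketch (the helper name `z_critical` is hypothetical):

```python
from statistics import NormalDist

def z_critical(cl: float) -> float:
    """Return z_{alpha/2}, the z-score with area (1 - cl)/2 to its right."""
    alpha = 1 - cl
    # inv_cdf expects the area to the LEFT, so ask for 1 - alpha/2 (0.975 for 95%)
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 2))   # 1.96
```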


$$EBP = z_{\alpha/2}\sqrt{\frac{p'q'}{n}} = 1.96\sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$$


p′ − EBP = 0.842 − 0.032 = 0.810

p′ + EBP = 0.842 + 0.032 = 0.874

The confidence interval for the true binomial population proportion is (p′ − EBP, p′ + EBP) = (0.810, 0.874).
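The arithmetic in Solution A can be checked with a few lines of Python. This sketch simply repeats the steps above, using z = 1.96.

```python
import math

n = 500
x = 421
p_prime = x / n            # 0.842, the point estimate
q_prime = 1 - p_prime      # 0.158
z = 1.96                   # z_{0.025}
ebp = z * math.sqrt(p_prime * q_prime / n)

print(round(ebp, 3))                                      # 0.032
print(round(p_prime - ebp, 3), round(p_prime + ebp, 3))   # 0.81 0.874
```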

Interpretation

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.

Explanation of 95% Confidence Level

Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
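This interpretation can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of 0.84, draws many samples of 500 yes/no answers, builds the 95% interval for each, and reports the fraction of intervals that contain the true value. The result should come out close to 0.95, though not exactly, because of random variation and because the normal approximation is itself an approximation. The seed, the number of repetitions, and the function names are arbitrary choices.

```python
import math
import random

def wald_interval(x, n, z=1.96):
    """The (p' - EBP, p' + EBP) interval from the text."""
    p_hat = x / n
    ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - ebp, p_hat + ebp

def simulate_coverage(p_true, n, reps=2000, seed=1):
    """Fraction of simulated 95% intervals that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # one simulated survey: n independent yes/no answers
        x = sum(rng.random() < p_true for _ in range(n))
        lo, hi = wald_interval(x, n)
        hits += lo <= p_true <= hi
    return hits / reps

coverage = simulate_coverage(0.84, 500)
print(coverage)
```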

Solution B:

Press STAT and arrow over to TESTS.

Arrow down to A:1-PropZInt. Press ENTER.

Arrow down to x and enter 421.

Arrow down to n and enter 500.

Arrow down to C-Level and enter .95.

Arrow down to Calculate and press ENTER.

The confidence interval is (0.81003, 0.87397).

__________________________________________


“Plus Four” Confidence Interval for p

There is a certain amount of error introduced into the process of calculating a confidence interval for a proportion. Because we do not know the true proportion for the population, we are forced to use point estimates to calculate the appropriate standard deviation of the sampling distribution. Studies have shown that the resulting estimation of the standard deviation can be flawed.

Fortunately, there is a simple adjustment that allows us to produce more accurate confidence intervals. We simply pretend that we have four additional observations. Two of these observations are successes and two are failures. The new sample size, then, is n + 4, and the new count of successes is x + 2.

Computer studies have demonstrated the effectiveness of this method. It should be used when the confidence level desired is at least 90% and the sample size is at least ten.
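One way to check a claim like this yourself is to compute the exact coverage of each interval: for a given true p, add up the binomial probabilities of every count x whose interval contains p. The sketch below does this for a hypothetical case, n = 25 and p = 0.2, with z = 1.96; these values are assumptions chosen for illustration. In this case the standard interval covers the true p noticeably less often than the nominal 95%, while the plus-four interval comes much closer.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def interval(x, n, z=1.96, plus_four=False):
    if plus_four:
        x, n = x + 2, n + 4   # pretend two extra successes and two extra failures
    p_hat = x / n
    ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - ebp, p_hat + ebp

def exact_coverage(p, n, plus_four=False):
    """Probability that the interval contains the true p, over every possible x."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, plus_four=plus_four)
        if lo <= p <= hi:
            total += binom_pmf(x, n, p)
    return total

wald = exact_coverage(0.2, 25)
plus4 = exact_coverage(0.2, 25, plus_four=True)
print(f"standard: {wald:.3f}  plus four: {plus4:.3f}")
```

Coverage depends on both p and n, so trying a few other values is a useful way to see the pattern rather than relying on a single case.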

__________________________________________

EXAMPLE 2

A random sample of 25 statistics students was asked: “Have you smoked a cigarette in the past week?” Six students reported smoking within the past week. Use the “plus-four” method to find a 95% confidence interval for the true proportion of statistics students who smoke.

Solution A:

Six students out of 25 reported smoking within the past week, so x = 6 and n = 25. Because we are using the “plus-four” method, we will use x = 6 + 2 = 8 and n = 25 + 4 = 29.

p' = x/n = 8/29 = 0.276

q' = 1-p' = 1-0.276 = 0.724

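Since CL = 0.95, we know $z_{0.025} = 1.96$.

$$EBP = z_{\alpha/2}\sqrt{\frac{p'q'}{n}} = 1.96\sqrt{\frac{(0.276)(0.724)}{29}} = 0.163$$

p′ − EBP = 0.276 − 0.163 = 0.113

p′ + EBP = 0.276 + 0.163 = 0.439

We are 95% confident that the true proportion of all statistics students who smoke cigarettes is between 0.113 and 0.439.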
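The same calculation in Python applies the plus-four adjustment first and then uses the standard formula unchanged. This is a sketch using z = 1.96 as above.

```python
import math

x, n = 6, 25
x_adj, n_adj = x + 2, n + 4        # plus-four adjustment: 8 and 29
p_prime = x_adj / n_adj            # about 0.276
q_prime = 1 - p_prime              # about 0.724
ebp = 1.96 * math.sqrt(p_prime * q_prime / n_adj)

print(round(p_prime - ebp, 3), round(p_prime + ebp, 3))   # 0.113 0.439
```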

Solution B:

Press STAT and arrow over to TESTS.

Arrow down to A:1-PropZInt. Press ENTER.

Remember that the plus-four method assumes an additional four trials: two successes and two failures. You do not need to change the process for calculating the confidence interval; simply update the values of x and n to reflect these additional trials.

Arrow down to x and enter 8.

Arrow down to n and enter 29.

Arrow down to C-Level and enter 0.95.

Arrow down to Calculate and press ENTER.

The confidence interval is (0.113, 0.439).

__________________________________________


Conclusion

The confidence interval for the larger sample is narrower than the interval from Example 6. At a fixed confidence level, larger samples generally yield more precise confidence intervals than smaller samples. The “plus four” method has a greater impact on the smaller sample. It shifts the point estimate from 0.260 (13/50) to 0.278 (15/54). It has a smaller impact on the EBP, changing it from 0.102 to 0.100. In the larger sample, the point estimate undergoes a smaller shift: from 0.270 (159/588) to 0.272 (161/592). It is easy to see that the plus-four method has the greatest impact on smaller samples.
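The figures quoted in this paragraph can be reproduced with a short sketch. The paragraph does not state the confidence level behind the EBP values; the code assumes 90% (z ≈ 1.645), which is the level that reproduces 0.102 and 0.100. The counts 13/50 and 159/588 come from the paragraph; the helper name `estimate` is hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)   # z_{0.05}, for an ASSUMED 90% confidence level

def estimate(x, n, plus_four=False):
    """Return (point estimate, EBP), each rounded to three decimals."""
    if plus_four:
        x, n = x + 2, n + 4      # two extra successes, two extra failures
    p_prime = x / n
    ebp = z * math.sqrt(p_prime * (1 - p_prime) / n)
    return round(p_prime, 3), round(ebp, 3)

for x, n in [(13, 50), (159, 588)]:
    print(f"{x}/{n}: standard {estimate(x, n)}, plus four {estimate(x, n, plus_four=True)}")
```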

Calculating the Sample Size n

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size.

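The error bound formula for a population proportion is

$$EBP = z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$$

Solving for n gives an equation for the sample size; round the result up to the next whole number:

$$n = \frac{z_{\alpha/2}^{2}\,p'q'}{EBP^{2}}$$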
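A sketch of this calculation in Python. It rounds n up, because rounding down would give an error bound slightly larger than the target. When no earlier estimate of p′ is available, p′ = 0.5 is a common conservative assumption (it makes p′q′ as large as possible), and the code uses it as the default; the function name and the 0.03 target are illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(ebp: float, cl: float = 0.95, p_prime: float = 0.5) -> int:
    """Smallest n with z_{alpha/2} * sqrt(p'q'/n) <= ebp."""
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)   # z_{alpha/2}
    q_prime = 1 - p_prime
    n = z**2 * p_prime * q_prime / ebp**2
    return math.ceil(n)   # round up so the error bound is not exceeded

print(required_sample_size(0.03))   # 1068
```

For example, a margin of error of three percentage points at 95% confidence needs about 1,068 respondents under the p′ = 0.5 assumption.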

Concept Review

Some statistical measures, like many survey questions, measure qualitative rather than quantitative data. In this case, the population parameter being estimated is a proportion. It is possible to create a confidence interval for the true population proportion following procedures similar to those used in creating confidence intervals for population means. The formulas are slightly different, but they follow the same reasoning.

Let p′ represent the sample proportion, x/n, where x represents the number of successes and n represents the sample size. Let q′ = 1 − p′. Then the confidence interval for a population proportion is given by the following formula:

$$(\text{lower bound}, \text{upper bound}) = \left(p' - z_{\alpha/2}\sqrt{\frac{p'q'}{n}},\; p' + z_{\alpha/2}\sqrt{\frac{p'q'}{n}}\right)$$

The “plus four” method for calculating confidence intervals is an attempt to balance the error introduced by using estimates of the population proportion when calculating the standard deviation of the sampling distribution. Simply imagine four additional trials in the study; two are successes and two are failures. Calculate p′ = (x + 2)/(n + 4) and q′ = 1 − p′, and proceed to find the confidence interval. When sample sizes are small, this method has been demonstrated to provide more accurate confidence intervals than the standard formula used for larger samples.

Formula Review

p′ = x / n where x represents the number of successes and n represents the sample size. The variable p′ is the sample proportion and serves as the point estimate for the true population proportion.

q′ = 1 – p′

The number of successes X has a binomial distribution; the proportion P′ = X/n can be approximated with the normal distribution shown here.

$$P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$$
s is a point estimate for σ

References:

  1. https://courses.lumenlearning.com/introstats1/chapter/a-population-proportion/

CC LICENSED CONTENT, SHARED PREVIOUSLY

ALL RIGHTS RESERVED CONTENT

  • Confidence Intervals for Population Proportions. Authored by: StatisticsLectures.com. Located at: https://youtu.be/3ReWri_jh3M. License: All Rights Reserved. License Terms: Standard YouTube License