
3. Normal Confidence Intervals

Learning objectives (and summaries)

Understand why all confidence intervals are centered around a statistic and extended with a margin of error based on confidence level and the spread of the sampling distribution (the standard error).
      • Understand and explain the core components of the confidence interval formula.
        • [parameter] = [statistic] ± [critical value z*]⋅[standard error].
        • In words: estimate the parameter by starting with the statistic (estimate from a sample) and adding a margin of error above and below (determined by the sampling distribution and the level of confidence needed).
      • Recognize that the standard error (SE) is based on the sampling distribution
        • For means, this comes from the Central Limit Theorem.  SE = sx / √n.  Larger sample size reduces the SE.
        • For proportions, this comes from a normal approximation to the binomial distribution, which is the theoretical sampling distribution for a proportion.  Use the formula SE = √(p̂⋅(1-p̂)/n).  Again, larger sample size reduces SE.
      • Only perform statistical inference (including confidence intervals) when the appropriate conditions are met.  Note that all conditions are guidelines, not strict rules.
        • The sample was taken randomly from the population (SRS, stratified, systematic, possibly random cluster).
        • The population is at least 10 times larger than the sample (otherwise you get weird problems with dependent probabilities and the sampling distribution isn't normal shaped).
        • Means only: the sample size is at least 30.  Otherwise, you may not have a normal distribution.
        • Proportions only: there are at least 10 successes and 10 failures in the sample.  With fewer, the distribution ends up fairly skewed.
      • Estimate sample sizes needed to be within a given margin of error.  Recognize that these are estimates.
        • Means only: n ≥ (z*⋅sx/ME)².  Always round up to whole number.
        • Proportions only: n ≥ (z*/ME)²(p̂)(1-p̂).  Always round up to whole number.
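
A minimal Python sketch of the interval formula above, for checking hand calculations. The function names (`z_star`, `mean_interval`, `proportion_interval`) are made up for illustration and are not part of the course materials. The sketch uses the exact critical value from the standard normal curve, so results can differ from table-based answers in the third or fourth decimal place.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* that leaves `confidence` of the area in the
    middle of the standard normal curve (e.g. 0.95 -> about 1.96)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def mean_interval(x_bar, s_x, n, confidence):
    """Return (statistic, margin of error) for a mean, using SE = s_x / sqrt(n)."""
    se = s_x / math.sqrt(n)
    return x_bar, z_star(confidence) * se


def proportion_interval(successes, n, confidence):
    """Return (statistic, margin of error) for a proportion,
    using SE = sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z_star(confidence) * se


# Example values: x-bar = 750, s_x = 87, n = 36, 95% confidence.
center, margin = mean_interval(750, 87, 36, 0.95)
print(f"{center} ± {margin:.2f}")
```

The functions return the statistic and the margin of error separately so the result can be written in ± form. They do not check any of the conditions listed above, so that step still has to be done by hand.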

      Videos if you missed lecture

        Videos for everyone



        Essay Questions
              What is the purpose of all of the assumptions you need to check before calculating a confidence interval from the formula?


              Show how to derive the formula that calculates minimum sample size for quantitative data from the margin of error part of the confidence interval for means.


              Compare and contrast the sampling distributions created by StatKey and created by the normal-curve-based formula.


              Practice problems
                  1. What is the generic formula for all confidence intervals?
                  2. Where does the statistic come from?
                  3. What is the margin of error made up of?
                  4. Where does the standard error come from?
                  5. Where does z* come from?
                  6. For confidence intervals of means, explain why the assumption about sample size is related to the Central Limit Theorem.
                  7. Confidence intervals for proportions have a condition that there are at least 10 successes and 10 failures in the sample.  How is this similar to the condition of a minimum sample size for means?
                  8. The condition we practiced since the first unit is having good, random data.  Why is this SO important?
                  9. Plug & chug practice: find a 95% confidence interval for a mean with sample results x̄ = 750, sx = 87, and n = 36.
                  10. Plug & chug practice: find a 99% confidence interval for a proportion with sample results 80/98.
                  For #11-14, complete the following steps:
                  a. Is this a confidence interval of a mean or proportion?
                  b. What are the assumptions you need to check before performing this interval?  Does it meet the assumptions?
                  c. What is the best single guess (point estimate) of the mean/proportion?
                  d. What is the standard error of the underlying sampling distribution?
                  e. What level of confidence are you calculating an interval at?  What is z*?
                  f. Calculate the confidence interval in ± form.  Use a subscript on μ or p to more clearly show what it refers to.
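
Step b can be sketched in Python as a simple checklist. This is only an illustration: the function name `check_conditions` and its parameters are hypothetical, the population size usually has to be estimated, and the conditions are guidelines rather than strict rules, so a failed check is a prompt to think, not an automatic stop.

```python
def check_conditions(kind, n, population_size, random_sample, successes=None):
    """Return a list of (condition, met) pairs for the guideline checks.

    kind            -- "mean" or "proportion"
    n               -- sample size
    population_size -- estimated population size (often a rough guess)
    random_sample   -- True if the sample was taken randomly
    successes       -- number of successes (proportions only)
    """
    checks = [
        ("random sample", random_sample),
        ("population at least 10 times the sample", population_size >= 10 * n),
    ]
    if kind == "mean":
        checks.append(("sample size at least 30", n >= 30))
    elif kind == "proportion":
        failures = n - successes
        checks.append(("at least 10 successes and 10 failures",
                       successes >= 10 and failures >= 10))
    else:
        raise ValueError("kind must be 'mean' or 'proportion'")
    return checks


# Example: 13 successes in a random sample of 80.
# The population size of 1000 is an assumed value for illustration.
for condition, met in check_conditions("proportion", 80, 1000, True, successes=13):
    print(f"{condition}: {'yes' if met else 'no'}")
```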

                  11) A Labrador Retriever typically has a litter of 5-10 puppies, but the exact amount varies by the individual.  One pet enthusiast tracked the litters of an SRS of 45 mothers.  (Note that in order to obtain the 45, she had to ask 59 owners.)  She found the average number per litter to be 7.8 puppies with a standard deviation of 2.88 puppies.  Find a 95% confidence interval.

                  12) A study of randomly selected trees in a local park found that 13 of 80 trees had been infected with the Canker rot fungi.  Find a 90% interval for the proportion of trees infected across the entire park.

                  13) The city council polled an SRS of registered voters in the region about their support of a school referendum.  Of the 200 residents who were sent a survey, 124 responded.  Of these, 76 supported the referendum.  Find a 95% confidence interval for the proportion of the community that supports it.

                  14) The St. Mary’s Emergency Department sees a variable number of patients each day.  The hospital staff kept records for 61 straight days to track the number of patients.  In their sample they found x̄ = 120 and sx = 19.  Find a 99% confidence interval for the average number of patients.

                  For 15-18, find the minimum sample size necessary to get a margin of error less than or equal to the specified amount.
                  15) The standard deviation of pages in a textbook is about 284 pages.  You want to estimate the mean number of pages in an entire bookstore’s textbook area.  You want a 90% confidence level with a margin of error of only 10 pages.
                  16) A research team wants to find a 95% confidence interval for their proportion with a margin of error of less than 5%.
                  17) Bob wants a 95% confidence interval on the proportion of people who prefer ice cream over cookies in his town.  His preliminary results suggest it is around 70% who prefer ice cream.  In order to meet the newspaper’s standards, he needs to report a margin of error of less than 3%.
                  18) The standard deviation of song lengths is around 0.79 minutes.  You want to estimate the mean length of songs on a given radio station at 99% confidence within 6 seconds (0.1 minutes).  Be careful with units.

                  Practice solutions
                      1. [parameter] = [statistic] ± [critical value z*] * [standard error]
                      2. It is the center of the sampling distribution (the single most likely guess)
                      3. The standard error and z* multiplied together
                      4. It is the spread (specifically the standard deviation) of the sampling distribution.  For means, its formula is part of the Central Limit Theorem.
                      5. It is how many standard deviations you want to go out each direction in your confidence interval.  The actual number corresponds to how many standard deviations you need to go in each direction from the middle of the normal curve to get a certain percentage of area, such as 95%, shaded in the curve.
                      6. You need to assume that you have a sample of at least 30.  If you think about the Central Limit Theorem, nearly all distributions will have a normal-shaped sampling distribution when you take samples of at least 30.  When you have smaller samples, you have to check your starting distribution more carefully.  (For those of you considering AP, there is actually another set of normal-ish distributions designed for well-behaved small samples called the t-distributions, so even if you did have a normal-ish starting population, you still can't necessarily just use the standard normal curve for your confidence intervals.  Crazy!)
                      7. Like the last one, the main point of this condition is that you get a truly normal-curve shaped sampling distribution.  When you have a small number of successes or failures, your sampling distribution gets bumped up to the edge of 0 or 100% and becomes more skewed.
                      8. The confidence interval formula was designed from probability theory, which assumes that you're selecting individuals at random from the population.  If you're not actually doing that, then the formula won't give you useful results.  It would be like a friend telling you that putting ketchup on your hot dog makes it taste better, but you decide to buy ice cream instead of a hot dog.  Then you put ketchup on the ice cream and you're surprised it didn't make it taste better.  Ketchup is designed for hot dogs, confidence interval formulas are designed for randomly selected data.  If this analogy is awful, please email me with a better one and I will replace it.
                      9. μ = 750 ± (1.96 ⋅ 87/√36) = 750 ± 28.42
                      10. p = 80/98 ± (2.576 ⋅ √((80/98)(1-80/98)/98) ) = .816 ± .101
                      11. a) mean
                        b) SRS (yes)
                            n less than 1/10 population (yes, there are thousands of litters of puppies)
                            n≥30 (yes, 45)
                        c) x̄ = 7.8 (the best single guess)
                        d) SE = sx/√n = 2.88/√45 = 0.429
                        e) 95% --> z* = 1.96
                        f) μpuppies = x̄ ± z*⋅SE = 7.8 ± 1.96⋅0.429 = 7.8 ± 0.841
                      12. a) proportion
                        b) SRS (yes, says random)
                            n less than 1/10 population (probably, a park likely has hundreds of trees)
                            10+ successes and fails (yes, 13 and 67)
                        c) p̂ = 13/80 = 0.163 (the best single guess)
                        d) SE = √( (13/80)⋅(1-13/80) / 80 ) = 0.041
                        e) 90% --> z* = 1.645
                        f) pdiseased = p̂ ± z*⋅SE = 0.163 ± 1.645⋅0.041 = 0.163 ± 0.067
                      13. a) proportion
                        b) SRS (some non-response, but seems reasonable still)
                            n less than 1/10 population (probably, assuming at least 1240 residents in town)
                            10+ successes and fails (yes, 76 and 48)
                        c) p̂ = 76/124 = 0.613 (the best single guess)
                        d) SE = √( (76/124)⋅(1-76/124) / 124 ) = 0.044
                        e) 95% --> z* = 1.96
                        f) psupporting = p̂ ± z*⋅SE = 0.613 ± 1.96⋅0.044 = 0.613 ± 0.086
                      14. a) mean
                        b) SRS (well, 61 consecutive days is a bit fishy -- proceed but know that there may be issues with your result)
                            n less than 1/10 population (yes, there are many days possible to choose from depending on how much time you have for the study)
                            n≥30 (yes, 61)
                        c) x̄ = 120 (the best single guess)
                        d) SE = sx/√n = 19/√61 = 2.433
                        e) 99% --> z* = 2.576
                        f) μpatients = x̄ ± z*⋅SE = 120 ± 2.576⋅2.433 = 120 ± 6.267
                      15. n ≥ (z*⋅sx/ME)² = (1.645⋅284/10)² = 2182.572
                        n ≥ 2183 (remember to always round up since you are finding the minimum, or lowest possible, sample size)
                      16. n ≥ (z*/ME)²(p̂)(1-p̂) = (1.96/.05)²(0.5)(1-0.5) = 384.16 (note that I used p̂=0.5, which you should do whenever you don't have a prior estimate for it)
                        n ≥ 385 (again, round up)
                      17. n ≥ (z*/ME)²(p̂)(1-p̂) = (1.96/.03)²(0.7)(1-0.7) = 896.373 (note that I used p̂=0.7 because I had a preliminary estimate to go with in the problem)
                        n ≥ 897 (again, round up)
                      18. n ≥ (z*⋅sx/ME)² = (2.576⋅0.79/0.1)² = 414.139
                        n ≥ 415 (remember to always round up since you are finding the minimum, or lowest possible, sample size)
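
The sample size answers for 15-18 can be double-checked with a short Python sketch. The function names here are made up for illustration. The sketch computes z* exactly from the standard normal curve instead of using rounded table values, so the unrounded results differ slightly in the decimals, but the rounded-up sample sizes come out the same for these four problems.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* for the middle `confidence` area of the standard normal curve."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def sample_size_mean(s_x, margin, confidence):
    """Minimum n for a mean: (z* * s_x / ME)^2, rounded up."""
    return math.ceil((z_star(confidence) * s_x / margin) ** 2)


def sample_size_proportion(margin, confidence, p_hat=0.5):
    """Minimum n for a proportion: (z* / ME)^2 * p_hat * (1 - p_hat), rounded up.
    p_hat defaults to 0.5 for when there is no prior estimate."""
    return math.ceil((z_star(confidence) / margin) ** 2 * p_hat * (1 - p_hat))


print(sample_size_mean(284, 10, 0.90))             # problem 15
print(sample_size_proportion(0.05, 0.95))          # problem 16
print(sample_size_proportion(0.03, 0.95, 0.7))     # problem 17
print(sample_size_mean(0.79, 0.1, 0.99))           # problem 18
```

`math.ceil` handles the "always round up" rule. Make sure the standard deviation and the margin of error are in the same units before calling `sample_size_mean`, as in problem 18.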
                      Practice quiz (we will go over answers in class as a group)

                      1) What is the purpose behind the condition "sample is 1/10 or less of the population size"?
                      2) Explain the components of the margin of error and why each is useful.
                      3) I want to know how many people I need to sample to get a margin of error below 3% on a political poll (at 95% confidence) before the election.  It is a close race.
                      4) I want to calculate the minimum sample size for a survey on gas mileage.  At 90% confidence I need to be within 1.0 mpg.  The typical standard deviation between data points is about 4.8 mpg.  How many data points do I need?

                      For #5-6, complete the following steps:
                      a. Is this a confidence interval of a mean or proportion?
                      b. What are the assumptions you need to check before performing this interval?  Does it meet the assumptions?
                      c. What is the best single guess (point estimate) of the mean/proportion?
                      d. What is the standard error of the underlying sampling distribution?
                      e. What level of confidence are you calculating an interval at?  What is z*?
                      f. Calculate the confidence interval in ± form.  Use a subscript on μ or p to more clearly show what it refers to.

                      5) A school survey asks an SRS of 50 students if they would like teachers to offer more hybrid classes.  42 students say yes.  Find a 95% confidence interval.

                      6) A high school tennis coach decided to study lengths of rallies in games with her top players.  Using a systematic sample of every 30th serve from games played over the past two weeks, she collected 126 rallies.  The average length was 3.2 hits (not including the serve) with a standard deviation of 1.9.  The distribution was not symmetrical.  Find a 90% confidence interval.


                      Notes

