### 3. Confidence Intervals

Task: Infographic survey results
Use the skills from this section to write a confidence interval sentence, in +/- form, for each question you asked:
• [2 pts] Use the proportion from your data to calculate the 95% confidence interval for every question
• [2 pts] Convert each interval into +/- form and round to the nearest tenth of a percent
• [2 pts] Copy, then modify each of the sentences from the survey task into confidence interval sentences using the 95% confidence level and the margin of error above
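
The conversion and sentence steps in the task above can be sketched in Python. This is only an illustration: the endpoints 0.412 and 0.548, the survey topic, and the helper names `to_plus_minus` and `ci_sentence` are made up for the example and are not part of the assignment.

```python
def to_plus_minus(lower, upper):
    """Turn interval endpoints (proportions) into (estimate, margin) in percent.

    Both values are rounded to the nearest tenth of a percent.
    """
    middle = (lower + upper) / 2   # best estimate: center of the interval
    margin = (upper - lower) / 2   # margin of error: half the interval width
    return round(middle * 100, 1), round(margin * 100, 1)


def ci_sentence(lower, upper, level, description):
    """Build a confidence interval sentence in +/- form."""
    estimate, margin = to_plus_minus(lower, upper)
    return (f"I'm {level}% confident that the proportion of {description} "
            f"is {estimate:.1f}% ± {margin:.1f}%.")


# Hypothetical 95% interval from a made-up survey question.
print(ci_sentence(0.412, 0.548, 95, "students who walk to school"))
```

Swap in your own endpoints and wording for each question you asked.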
Mastery Quiz Prep

#### Dr. Nic's Introduction to inference

Calculating and explaining confidence intervals:

Since we are only covering proportions (and not means), you can stop watching the last video at 7 minutes.

#### CIs in sentences

1. A study asked Minecraft players between the ages of 12 and 18 whether they thought the game would be appropriate to use in school as an educational tool. The makers of the game took an SRS of 52 registered players. 40 of the 52 said yes, while the rest said no.
• a. Are the sampling conditions met so that we can actually perform a meaningful confidence interval?
• b. What is the population that we're studying?
• c. What is the variable that we're studying?
• d. Enter the data into StatKey and perform the appropriate type of confidence interval at 99% confidence using at least 1000 bootstrap samples.  What is your interval?
• e. Explain all that you just found in a sentence.
• f. Rewrite your interval in +/- form.
• g. Explain your findings in a sentence using the +/- form of the interval.
• h. If you took 100 samples and created a 99% confidence interval from each, then there will always be exactly one interval that does not contain the true proportion.  True or false, and why?
2. Marketplace wanted to measure the proportion of customers that bought chips from a new display they put up in the store.  To do so, they randomly selected time intervals over a week-long period where somebody watched the display and counted the number of customers that walked past and the number who grabbed something from the display.  When the data was compiled, 158 of 783 customers grabbed something.
• a. Are the sampling conditions met so that we can actually perform a meaningful confidence interval?
• b. What is the population that we're studying?
• c. What is the variable that we're studying?
• d. Enter the data into StatKey and perform the appropriate type of confidence interval at 90% confidence using at least 1000 bootstrap samples.  What is your interval?
• e. Explain all that you just found in a sentence.
• f. Rewrite your interval in +/- form.
• g. Explain your findings in a sentence using the +/- form of the interval.

3. A survey asked city soccer players between the ages of 14 and 18 whether they would rather play soccer in the spring than in the fall. The Recreation Dept took an SRS of 54 players. 31 of the 54 players said no, while the others said yes.

• a. Are the sampling conditions met so that we can actually perform a meaningful confidence interval?
• b. What is the population that we're studying?
• c. What is the variable that we're studying?
• d. Enter the data into StatKey and perform the appropriate type of confidence interval at 99% confidence using at least 1000 bootstrap samples.  What is your interval?
• e. Explain all that you just found in a sentence.
• f. Rewrite your interval in +/- form.
• g. Explain your findings in a sentence using the +/- form of the interval.

4. Subway wanted to measure the proportion of customers that bought cookies. To do so, they randomly selected time intervals over a week-long period to track purchases. 103 of the 533 customers bought cookies.

• a. Are the sampling conditions met so that we can actually perform a meaningful confidence interval?
• b. What is the population that we're studying?
• c. What is the variable that we're studying?
• d. Enter the data into StatKey and perform the appropriate type of confidence interval at 90% confidence using at least 1000 bootstrap samples.  What is your interval?
• e. Rewrite your interval in +/- form.
• f. Explain your findings in a sentence using the +/- form of the interval.

Free Response Prep
Explain both intuitively and mathematically why confidence intervals get narrower as the size of your sample increases.

See the video below.  Consider how much you trust your gut with the opinion of one other person vs. many others.  Consider the patterns that appear in StatKey as you adjust sample size.
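
For the mathematical side of this question, one possible sketch uses the normal-curve approximation mentioned in the vocabulary list rather than bootstrapping. It assumes the standard formula for the margin of error of a proportion, z * sqrt(p(1 - p)/n), with z = 1.96 for 95% confidence. The function name `margin_of_error` and the sample sizes are chosen only for illustration.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    p_hat: sample proportion, n: sample size, z: critical value
    (1.96 corresponds to 95% confidence).
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Same sample proportion, larger and larger samples.
for n in (25, 100, 400, 1600):
    print(f"n = {n:4d}: margin of error = {margin_of_error(0.5, n):.4f}")
```

Because n sits under a square root in the denominator, quadrupling the sample size cuts the margin of error in half, so the interval narrows as the sample grows.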

Assume you cannot increase the sample size.  Describe the trade-off of confidence level and interval width.  How confident is "confident enough" for scientific publications?  For yourself?

What happens to the interval width when you increase the confidence level on StatKey?  Why?  What confidence level do publications use?  Is that good enough for you?
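
The trade-off can also be seen numerically. The sketch below is an assumption-laden illustration: it uses the normal approximation instead of StatKey's bootstrap, borrows the 158 of 783 counts from practice problem 2, and the helper names `z_star` and `interval` are invented for this example.

```python
import math
from statistics import NormalDist


def z_star(level):
    """Critical value for a two-sided interval, e.g. level=0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def interval(p_hat, n, level):
    """Normal-approximation confidence interval for a proportion."""
    margin = z_star(level) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


p_hat, n = 158 / 783, 783  # counts from practice problem 2
for level in (0.90, 0.95, 0.99):
    low, high = interval(p_hat, n, level)
    print(f"{level:.0%}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

With the data held fixed, a higher confidence level needs a larger critical value, so the interval has to stretch wider to earn the extra confidence.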

Why is it important to report the margin of error with your statistics?

Consider the estimates "46%" and "46% +/- 7%".  Why is the second one more useful than the first?

Practice solutions
1. Minecraft for education
• a. Yes -- SRS
• b. Minecraft players 12-18
• c. If the player thinks the game would be appropriate in school as an educational tool.  This is a 2-option categorical variable, so we can summarize with a proportion.
• d. 0.596 to 0.904 (59.6% to 90.4%) -- if you're getting something different, don't forget to convert to a 99% confidence interval.
• e. I'm 99% confident that the proportion of Minecraft players ages 12-18 that think the game would be appropriate in school as an educational tool is between 59.6% and 90.4%.
• f. 0.750 ± 0.154 or 75.0% ± 15.4% (work below)
• Middle: (.596 + .904)/2 = 0.750
• Margin of error: (.904 - .596)/2 = 0.154
• g. I'm 99% confident that the proportion of Minecraft players ages 12-18 that think the game would be appropriate in school as an educational tool is 75.0% ± 15.4%.
• h. False -- we expect 1, but there could easily be 0, or 2, or some other number.  The problem with probability is that it only tells us what to expect -- there are no guarantees.
2. Marketplace chips
• a. Yes -- randomly selected time slots
• b. Marketplace customers that walk by the chips
• c. If the customer takes a bag of chips or not.
• d. 0.179 to 0.226 (17.9% to 22.6%)
• e. I'm 90% confident that the proportion of Marketplace customers who walk by the chips and actually take the chips is between 17.9% and 22.6%.
• f. 0.202 ± 0.024 (20.2% ± 2.4%)
• Middle: (0.179 +0.226)/2 = 0.202
• Margin of error: (0.226 - 0.179)/2 = 0.024
• g. I'm 90% confident that the proportion of Marketplace customers who walk by the chips that actually take the chips is 20.2% ± 2.4%.
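3. Soccer players
• a. Yes -- SRS
• b. Soccer players ages 14-18 in the city
• c. Whether the player would rather play in the spring or the fall.  This is a 2-option categorical variable, so we can summarize with a proportion.
• d. About 0.401 to 0.747 (40.1% to 74.7%) for the proportion who would rather play in the fall (31 of 54).  These values come from the normal approximation; your StatKey bootstrap interval should be close but will vary slightly.
• e. I'm 99% confident that the proportion of city soccer players ages 14-18 who would rather play in the fall is between 40.1% and 74.7%.
• f. 0.574 ± 0.173 (57.4% ± 17.3%)
• Middle: (0.401 + 0.747)/2 = 0.574
• Margin of error: (0.747 - 0.401)/2 = 0.173
• g. I'm 99% confident that the proportion of city soccer players ages 14-18 who would rather play in the fall is 57.4% ± 17.3%.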

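4. Subway cookies
• a. Yes -- randomly selected time intervals
• b. Subway customers
• c. Whether the customer buys cookies or not.
• d. About 0.165 to 0.221 (16.5% to 22.1%).  These values come from the normal approximation; your StatKey bootstrap interval should be close but will vary slightly.
• e. 0.193 ± 0.028 (19.3% ± 2.8%)
• Middle: (0.165 + 0.221)/2 = 0.193
• Margin of error: (0.221 - 0.165)/2 = 0.028
• f. I'm 90% confident that the proportion of Subway customers who buy cookies is 19.3% ± 2.8%.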
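
The reasoning in solution 1h (we expect about one miss in 100 intervals, but nothing is guaranteed) can be checked with a small simulation. This sketch makes several assumptions that are not part of the original problem: it pretends the true proportion is 0.75, builds each 99% interval with the normal approximation rather than StatKey's bootstrap, and uses a fixed random seed so the run is repeatable. The function name `count_misses` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def count_misses(true_p, n, level, num_intervals, rng):
    """Draw num_intervals samples of size n and count the intervals that miss true_p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    misses = 0
    for _ in range(num_intervals):
        # Simulate one sample: each person says "yes" with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if not (p_hat - margin <= true_p <= p_hat + margin):
            misses += 1
    return misses


rng = random.Random(1)  # fixed seed, chosen arbitrarily
# Repeat the "100 intervals" experiment 50 times and record the misses each time.
results = [count_misses(0.75, 52, 0.99, 100, rng) for _ in range(50)]
print("Misses per batch of 100 intervals:", results)
```

The number of misses changes from batch to batch, which is why "always exactly one" is false.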

For more material from the New Zealand lady (Dr. Nic), check out this page.

Vocabulary
sampling distribution- The set of possible results from taking many random samples of the same size.  This is usually graphed.  Since it would take an enormous amount of time to create a real sampling distribution with real data, in practical use, you would either approximate it with a normal curve and some calculations or simulate one by bootstrapping.

bootstrapping- A method of re-sampling from your sample data as if it were the entire population.  It is used to generate sampling distributions in order to calculate a confidence interval or perform a hypothesis test.

margin of error- The distance from the best estimate (middle of the CI) to each end of the confidence interval.  It is how much error you could reasonably expect at that level of confidence.
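
To make the bootstrapping and margin of error definitions above concrete, here is a minimal sketch of a percentile-style bootstrap interval for a proportion. It is an illustration only and may differ in detail from what StatKey does internally; the function name `bootstrap_ci`, the default of 5000 resamples, and the fixed seed are assumptions made for this example. The counts (40 of 52) come from practice problem 1.

```python
import random


def bootstrap_ci(successes, n, level=0.95, num_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for a proportion."""
    rng = random.Random(seed)
    # Treat the sample as if it were the whole population: 1 = yes, 0 = no.
    sample = [1] * successes + [0] * (n - successes)
    # Resample with replacement many times and record each resample's proportion.
    proportions = sorted(
        sum(rng.choices(sample, k=n)) / n for _ in range(num_resamples)
    )
    # Chop off an equal share of resamples from each tail.
    cut = int((1 - level) / 2 * num_resamples)
    return proportions[cut], proportions[num_resamples - 1 - cut]


low, high = bootstrap_ci(40, 52, level=0.99)
margin = (high - low) / 2  # distance from the middle to each end
print(f"99% interval: {low:.3f} to {high:.3f}, margin of error about {margin:.3f}")
```

Because the resamples are random, a different seed (or a fresh StatKey run) gives slightly different endpoints.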

Notes