Final Prep

Learning objectives (and summaries)

Integrate all the core concepts we learned about one-variable statistics -- collecting samples, summarizing data, and inferring population parameters from the statistics.

Assessment

    • The quarter final test will be the assessment for the review.

Instruction

Review videos from the previous core modules.

Practice

Question A:

What fraction of Byron students own cars?

What you are studying [modules 1 & 4]:

    1. What is the population of interest in this study?

    2. What is the variable being studied? Is it quantitative or categorical?

    3. Name and describe techniques that could produce a random sample.

    4. Name and describe the ways to introduce bias into the sample data.

    5. Which symbol will represent the sample average/proportion that results from the study?

    6. Which symbol will represent the population average/proportion that is being estimated in the study?

    7. What types of graphs can be used to visualize the sample data?

Your friend says he thinks half of the students at Byron own cars, but you think it is less than half. So you take an SRS (simple random sample). In your sample, 22 of 58 students own cars.

If you have quantitative data [modules 2 & 3]:

    1. Create a box plot to visualize the sample data. What fraction of the data lies above the third quartile? Above each of the other quartiles? Above the median? What fraction lies between the first and third quartiles?

    2. Create a stem plot to visualize the sample data. List the primary reasons why you would want to create a stem plot.

    3. When is it appropriate to use a histogram instead of a stem plot?

    4. What are the mean and standard deviation?

    5. Are the mean and median the same? If not, why is one larger than the other?

    6. The standard deviation is affected by every number in a distribution. What values could you remove from this distribution to increase the standard deviation? Decrease it?

    7. What are two measurements that compare an individual to the rest of a distribution?

Confidence intervals around an estimate [module 5]:

    1. StatKey takes in your raw data and spits out a distribution (that often looks uni-modal and symmetrical). What is going on behind the scenes with this set of data?

    2. Find a 90%, 95%, or 99% confidence interval of the data.

    3. Convert these intervals to +/- form.

    4. How often is a 95% confidence interval correct?

    5. Describe your confidence interval in a sentence.

    6. Explain the trade-off of losing confidence to get more precision (smaller width) in a confidence interval.

    7. Explain what effect a large sample has on the precision (width) of a confidence interval.

Hypothesis testing [module 6]: Your friend says he thinks half of the people own cars, but you think it is less.

    1. What are the appropriate null and alternative hypotheses for this situation?

    2. What is needed for this data to be statistically significant at α = 0.05?

    3. Unlike a confidence interval in StatKey, a hypothesis test centers the data around the null hypothesis and not the sample mean. Why?

    4. What is the p-value for the appropriate test?

    5. What does this p-value actually tell you (in a sentence)?

    6. Do you reject? Is your data statistically significant? If you do not reject, was it close enough to warrant further study?

    7. If your conclusion was incorrect in this problem, what type of error did you make?

Practice solutions

Question A:

  1. All Byron students

  2. Whether or not each student owns a car (the question you ask each person is “Do you own a car?”). It is categorical (yes or no).

  3. Could use an SRS (all have equal chance), Stratified (a few from each grade, for example), or Systematic (every 12th person through the door, for example)

  4. Undercoverage, badly worded questions, convenience sampling, voluntary response, etc.

  5. p-hat, written p̂ (because it is a sample proportion)

  6. p (because it is a population proportion)

  7. A pie graph or a bar graph (because it is categorical data)

  8. Can’t do (not quantitative)

  9. Can’t do

  10. Can’t do

  11. Can’t do

  12. Can’t do

  13. Can’t do

  14. Can’t do

  15. StatKey is resampling your data. It repeatedly draws a new sample of the same size (58) from your original sample, with replacement, and records the proportion of car owners in each one. The distribution it shows is the collection of those resampled proportions.

  16. Do on StatKey: Example answer - 95% confident it’s between 0.26 and 0.50

  17. Do on StatKey: Example answer - 95% confident it’s 0.38 +/- 0.12

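  18. About 95% of the time. In the long run, roughly 95% of the intervals built this way capture the true population value.

  19. I am 95% confident that the proportion of Byron students who own a car is between 0.26 and 0.50 (roughly 15 to 29 students out of every 58).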

  20. The lower the confidence, the narrower (more precise) the interval. The higher the confidence, the wider (less precise) the interval.

  21. The larger the sample, the narrower (more precise) the confidence interval will be.

  22. Null hypothesis: p = 0.5. Alternative hypothesis: p < 0.5.

  23. The p-value must be less than .05

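  24. Because the test assumes the null hypothesis is true and asks how unusual your sample would be in that case. You are trying to find evidence against the null hypothesis.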

  25. Do on StatKey: Example answer: .042

  26. This p-value tells me that there is a 0.042 probability of getting a sample proportion as low as mine (22/58) or lower, assuming the null hypothesis (p = 0.5) is true.

  27. Yes, I reject the null hypothesis because 0.042 is less than 0.05, so my data are statistically significant.

  28. If I rejected the null but the null was actually true, it would have been a Type I error.
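
Answers 16, 17, and 25 say "Do on StatKey." As a rough cross-check, the sketch below produces the same kind of numbers in Python for the 22/58 sample. It rests on assumptions that are not part of this module: the function names bootstrap_interval and lower_tail_p_value are made up for illustration, the interval is a simple percentile bootstrap with a fixed seed, and the p-value is an exact binomial lower tail rather than the randomization test StatKey runs. The results should land near the example answers above, not match them exactly.

```python
import random
from math import comb


def bootstrap_interval(successes, n, level=0.95, resamples=5000, seed=0):
    """Percentile bootstrap interval for a sample proportion (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    data = [1] * successes + [0] * (n - successes)  # 1 = owns a car, 0 = does not
    # Draw n values with replacement, many times, and keep each resample's proportion.
    props = sorted(sum(rng.choices(data, k=n)) / n for _ in range(resamples))
    tail = (1 - level) / 2
    lower = props[int(tail * resamples)]           # cuts off the lowest tail
    upper = props[int((1 - tail) * resamples) - 1]  # cuts off the highest tail
    return lower, upper


def lower_tail_p_value(successes, n, p0=0.5):
    """Exact binomial P(X <= successes), assuming the null proportion p0 is true."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes + 1))


# Sample from Question A: 22 car owners out of 58 students.
low, high = bootstrap_interval(22, 58)
p_value = lower_tail_p_value(22, 58)
print(f"95% bootstrap interval: {low:.2f} to {high:.2f}")
print(f"one-sided p-value (null p = 0.5): {p_value:.3f}")
```

To convert the interval to +/- form, use (low + high) / 2 as the center and (high - low) / 2 as the margin.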

Notes