Review

Learning objectives (and summaries)

Integrate all the core concepts we learned about one-variable statistics -- collecting samples, summarizing data, and inferring population parameters from the statistics.

Assessment
    • The quarter final test will be the assessment for the review.
    Instruction
        Review videos from the previous core modules.  See the questions below for a summary of the essential elements and how they tie together:

        What you are studying [modules 1 & 3]:

        1. What is the population of interest in this study?

        2. What is the variable being studied?  Is it quantitative or categorical?

        3. Name and describe techniques that could produce a random sample.

        4. Name and describe the ways to introduce bias into the sample data.

        5. Which symbol will represent the sample average/proportion that results from the study?

        6. Which symbol will represent the population average/proportion that is being estimated in the study?

        7. What types of graphs can be used to visualize the sample data?

        If you have quantitative data [module 2]:

        1. Create a box plot to visualize the sample data.  What fraction of the data is above the third quartile of the box plot?  Above each of the other quartiles?  Above the median?  What fraction is between the quartiles?

        2. Create a stem plot to visualize the sample data.  List the primary reasons why you would want to create a stem plot.

        3. When is it appropriate to use a histogram instead of a stem plot?

        4. What are the mean and standard deviation?

        5. Are the mean and median the same?  If not, why is one larger than the other?

        6. The standard deviation is affected by every number in a distribution.  What values could you remove from this distribution to increase the standard deviation?  Decrease it?


        Confidence intervals around an estimate [module 4]:

        1. StatKey takes in your raw data and spits out a distribution (that often looks uni-modal and symmetrical).  What is going on behind the scenes with this set of data?

        2. Find a 90%, 95%, or 99% confidence interval of the data.

        3. Convert these intervals to +/- form.

        4. How often is a 95% confidence interval correct?

        5. Describe your confidence interval in a sentence.

        6. Explain the trade-off of losing confidence to get more precision (smaller width) in a confidence interval.

        7. Explain what effect a large sample has on the precision (width) of a confidence interval.
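
        The sample-size question above can be explored with a small resampling experiment. The following is a minimal Python sketch, not StatKey's own code: the function name bootstrap_width, the seed, the number of resamples, and the two sample sizes are all hypothetical choices made for illustration.

```python
import random

def bootstrap_width(successes, n, level=0.95, resamples=2000, seed=1):
    """Width of a percentile bootstrap interval for a sample proportion."""
    rng = random.Random(seed)
    # Treat the sample as a mock population: 1 = yes, 0 = no.
    sample = [1] * successes + [0] * (n - successes)
    # Each resample draws n values with replacement and records its proportion.
    props = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(resamples))
    # Keep the middle `level` share of the resampled proportions.
    tail = (1 - level) / 2
    lo = props[int(tail * resamples)]
    hi = props[int((1 - tail) * resamples) - 1]
    return hi - lo

# Hypothetical samples with the same 40% proportion but different sizes.
small_width = bootstrap_width(20, 50)
large_width = bootstrap_width(200, 500)
print(f"n=50:  width {small_width:.3f}")
print(f"n=500: width {large_width:.3f}")
```

        Running it with other sample sizes shows how the width of the interval changes while the confidence level stays fixed.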


        Practice

            Question A:

            What fraction of Byron students own cars?

            1. What is the population of interest in this study?

            2. What is the variable being studied?  Is it quantitative or categorical?

            3. Name and describe techniques that could produce a random sample.

            4. Name and describe the ways to introduce bias into the sample data.

            5. Which symbol will represent the sample average/proportion that results from the study?

            6. Which symbol will represent the population average/proportion that is being estimated in the study?

            7. What types of graphs can be used to visualize the sample data?

            You take an SRS.  You find that 22/58 students own cars.

            1. StatKey takes in your raw data and spits out a distribution (that often looks uni-modal and symmetrical).  What is going on behind the scenes with this set of data?

            2. Find a 90%, 95%, or 99% confidence interval of the data.

            3. Convert these intervals to +/- form.

            4. How often is a 95% confidence interval correct?

            5. Describe your confidence interval in a sentence.

            6. Explain the trade-off of losing confidence to get more precision (smaller width) in a confidence interval.

            7. Explain what effect a large sample has on the precision (width) of a confidence interval.


            Question B:

            What is the average number of children in an Olmsted County home?

            1. What is the population of interest in this study?

            2. What is the variable being studied?  Is it quantitative or categorical?

            3. -

            4. Name and describe the ways to introduce bias into the sample data.

            5. Which symbol will represent the sample average/proportion that results from the study?

            6. Which symbol will represent the population average/proportion that is being estimated in the study?

            7. What types of graphs can be used to visualize the sample data?

            You stratify by city and select a random sample of 24 homes. Here are the results: 0 3 2 0 1 1 0 2 3 2 2 0 3 3 3 3 0 6 0 2 2 2 3 2

            1. Create a box plot to visualize the sample data.  What fraction of the data is above the third quartile of the box plot?  Above each of the other quartiles?  Above the median?  What fraction is between the quartiles?

            2. Create a stem plot to visualize the sample data.  List the primary reasons why you would want to create a stem plot.

            3. When is it appropriate to use a histogram instead of a stem plot?

            4. What are the mean and standard deviation?

            5. Are the mean and median the same?  If not, why is one larger than the other?

            6. The standard deviation is affected by every number in a distribution.  What values could you remove from this distribution to increase the standard deviation?  Decrease it?

            1. -

            2. Find a 90%, 95%, or 99% confidence interval of the data.

            3. Convert these intervals to +/- form.

            4. -

            5. Describe your confidence interval in a sentence.


            Practice solutions
                Question A:
                1. Byron students
                2. Each person's answer to the question “Do you own a car or not?”  It is categorical (yes or no).
                3. Could use an SRS (all have equal chance), stratified (a few from each grade, for example), or systematic (every 12th person in the door, for example)
                4. Undercoverage, bad questions, convenience sampling, voluntary sample, etc.
                5. p-hat (because it is a sample proportion)
                6. p (because it is a population proportion)
                7. Pie graph or a bar graph (because it is categorical data)
                8. -
                9. -
                10. -
                11. -
                12. -
                13. -
                14. StatKey is taking in your sample information and using it to create a mock population. From here, it draws many random resamples from this data (with replacement, each the same size as your sample) to see what kind of results seem reasonable / possible.  The resulting curve shows all of the random possibilities, and the confidence interval chops off the most extreme parts of the curve to generate a reasonable range that should capture the true proportion most of the time.
                15. Do on StatKey, 5000+ resamples: 
                  1. 90%: 27.6% to 48.3%
                  2. 95%: 25.9% to 50.0%
                  3. 99%: 22.4% to 55.2%
                16. Calculate the midpoint of each interval and the gap from the middle to either end:
                  1. 90%: 38.0% ± 10.4%
                  2. 95%: 38.0% ± 12.1%
                  3. 99%: 38.8% ± 16.4%
                17. About 95% of the time IF you have good, unbiased data.  Note that sample size does NOT matter for this, just your data collection technique.
                18. I am 95% confident that the proportion of Byron students who own a car is between 26% and 50%.
                19. The lower the confidence, the higher the precision (narrower interval). The higher the confidence, the lower the precision (wider interval).
                20. The larger the sample, the more precise (narrower) the confidence interval will be without sacrificing confidence.
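
                The resampling idea in answer 14 and the plus/minus conversion in answer 16 can be sketched in Python. This is a minimal sketch, not StatKey's actual code: the function names, the seed, and the exact percentile cutoffs are assumptions chosen for illustration. Because resampling is random, the endpoints it prints will differ slightly from the StatKey values listed above.

```python
import random

def bootstrap_proportion_interval(successes, n, level, resamples=5000, seed=0):
    """Percentile bootstrap interval for a sample proportion."""
    rng = random.Random(seed)
    # Treat the sample as a mock population: 1 = owns a car, 0 = does not.
    sample = [1] * successes + [0] * (n - successes)
    # Each resample draws n values WITH replacement and records its proportion.
    props = sorted(sum(rng.choices(sample, k=n)) / n for _ in range(resamples))
    # Chop off the extreme tails, keeping the middle `level` share.
    tail = (1 - level) / 2
    lo = props[int(tail * resamples)]
    hi = props[int((1 - tail) * resamples) - 1]
    return lo, hi

def to_plus_minus(lo, hi):
    """Rewrite an interval (lo, hi) as (center, margin)."""
    return (lo + hi) / 2, (hi - lo) / 2

# 22 of the 58 sampled students own cars.
intervals = {level: bootstrap_proportion_interval(22, 58, level)
             for level in (0.90, 0.95, 0.99)}
for level, (lo, hi) in intervals.items():
    center, margin = to_plus_minus(lo, hi)
    print(f"{level:.0%}: {lo:.1%} to {hi:.1%}  ->  {center:.1%} ± {margin:.1%}")
```

                Reusing the same seed for all three levels means each interval is cut from the same set of resamples, so the higher-confidence intervals are never narrower than the lower-confidence ones.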

                Question B:

                1. Olmsted County homes
                2. Number of children -- this is a quantitative variable
                3. -
                4. Ask questions differently to different people, sample in a bad way (like convenience or voluntary), allow for high nonresponse by not following up with people that don't answer the first time.
                5. x-bar
                6. μ (mu)
                7. box plot, stem plot, histogram
                8. Every quartile has 25% of the data -- that is why it is called a QUARTile. If you use StatKey to make the graph, you have:
                  1. min=0
                  2. Q1=0.5
                  3. med=2
                  4. Q3=3
                  5. max=6
                9. This data is awful for a stem plot. Stem plots are much better for multi-digit data.
                10. When you have lots of data (like 30+ numbers) or data that doesn't cleanly make a stem plot, such as this problem, a histogram may be a good graph.
                11. Mean =1.875, standard deviation=1.454 [from StatKey]
                12. No they are not. This distribution behaves strangely in that the mean is pulled down below the median, yet the tail of the curve (the skew) is slightly to the right. The mean is smaller because there are lots of zeros.
                13. To make the standard deviation (average distance from the mean) go up, you would get rid of numbers close to the mean, such as the 2's. To decrease the standard deviation, you would get rid of numbers far from the mean, such as the 6.
                14. -
                15. Intervals from StatKey (Interval of a mean):
                  1. 90%: 1.417 to 2.375
                  2. 95%: 1.292 to 2.458
                  3. 99%: 1.125 to 2.667
                16. Plus/minus:
                  1. 90%: 1.896 ± 0.479
                  2. 95%: 1.875 ± 0.583
                  3. 99%: 1.896 ± 0.771
                17. -
                18. I am 95% confident that the average number of children in an Olmsted County home is between 1.292 and 2.458 kids.
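
                The summary statistics in answers 8, 11, and 13 can be checked with a short Python sketch. It assumes quartiles are computed as the medians of the lower and upper halves of the sorted data, which reproduces the values listed above; other software may use a different quartile rule. The function name five_number_summary is hypothetical.

```python
from statistics import mean, median, stdev

# The 24 sampled homes from Question B.
children = [0, 3, 2, 0, 1, 1, 0, 2, 3, 2, 2, 0,
            3, 3, 3, 3, 0, 6, 0, 2, 2, 2, 3, 2]

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half.

    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary(children)
print("five-number summary:", summary)
print("mean:", mean(children))
print("sample standard deviation:", round(stdev(children), 3))

# Answer 13: drop values near the mean (the 2's) or far from it (the 6).
without_twos = [x for x in children if x != 2]
without_six = [x for x in children if x != 6]
print("sd without the 2's:", round(stdev(without_twos), 3))
print("sd without the 6:", round(stdev(without_six), 3))
```

                Here stdev is the sample standard deviation (dividing by n - 1), which agrees with the 1.454 reported above.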

                Notes