4. Hypothesis Testing

Learning objectives (and summaries)

Set up a hypothesis test of a population parameter, run the test using simulation, draw a logical conclusion, and be aware of potential errors.
    • Understand the purpose of a hypothesis test
      • To test a claim about the population parameter using sample statistics
    • Properly write null and alternative hypotheses.
      • The null hypothesis sets the parameter equal to a value.  For example, µ = 14.
      • The alternative hypothesis sets the direction of expected change: left, right, or two-sided (not equal).  For example, µ > 14.
    • Use random simulation to estimate a p-value for means
      • Use the StatKey simulator with quantitative sample data to create a simulated sampling distribution centered on the null value.  The p-value is the proportion of simulated means that are at least as low/high/extreme as the observed sample mean.
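
The simulation idea in this objective can be sketched in Python. This is a minimal sketch, not StatKey's actual implementation. It assumes the null distribution is built by shifting the sample so that its mean equals the null value and then resampling with replacement, and it assumes a two-sided p-value counts simulated means at least as far from the null value as the observed mean. The function name `simulate_p_value`, its parameters, and the seed are hypothetical choices for illustration; the data are the hits from practice problem 1 below.

```python
import random


def simulate_p_value(sample, null_mean, tail, resamples=5000, seed=0):
    """Estimate a p-value for a single mean by simulation.

    tail must be "left", "right", or "two".
    """
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right', or 'two'")

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    sample_mean = sum(sample) / n

    # Shift the data so its mean equals the null value (assumed method).
    shifted = [x - sample_mean + null_mean for x in sample]
    observed_gap = abs(sample_mean - null_mean)
    tol = 1e-9  # small tolerance so floating-point ties count as "as extreme"

    extreme = 0
    for _ in range(resamples):
        sim_mean = sum(rng.choices(shifted, k=n)) / n
        if tail == "right":
            is_extreme = sim_mean >= sample_mean - tol
        elif tail == "left":
            is_extreme = sim_mean <= sample_mean + tol
        else:
            is_extreme = abs(sim_mean - null_mean) >= observed_gap - tol
        if is_extreme:
            extreme += 1
    return extreme / resamples


# Hits per game from practice problem 1; the null value is 6.5 hits/game.
hits = [5, 6, 8, 7, 11, 13, 0, 7, 8, 5, 7, 10, 8, 10, 9, 7, 2, 9]
p_right = simulate_p_value(hits, 6.5, "right")
p_left = simulate_p_value(hits, 6.5, "left")
p_two = simulate_p_value(hits, 6.5, "two")
print(f"right: {p_right:.3f}  left: {p_left:.3f}  two-sided: {p_two:.3f}")
```

The three calls differ only in which simulated means count as "extreme", which is why choosing the tail before looking at the p-value matters. The exact numbers change with the seed and the number of resamples.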
    Videos for everyone


     [Copy example data to StatKey from here to practice along with the video.]

    Vocabulary
        null hypothesis (H0): the claim the researcher is challenging; you assume it to be true (but hope that it is wrong)
        alternative hypothesis (HA): the claim you conclude in favor of if the researcher's data is too unlikely under the null hypothesis (you hope to end up with the alternative)
        alpha (α): the cut-off for the p-value; you reject the null hypothesis when the p-value falls below it.  Typically 0.05 (5%)
        p-value: the probability of obtaining the statistic (mean or proportion) you did, or something [higher/lower/more extreme], assuming that the null hypothesis is true
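
In symbols, the three versions of the p-value for a test of a single mean might be written as follows. The notation is an assumption added here, not taken from the notes: the null value is written as mu_0, the observed sample mean as x-bar_obs, and X-bar stands for a sample mean drawn when the null hypothesis is true. The two-tailed line uses one common convention (distance from the null value); other conventions exist.

```latex
\begin{align*}
\text{right-tailed:} \quad & p = P\left(\bar{X} \ge \bar{x}_{\text{obs}} \mid H_0\right) \\
\text{left-tailed:}  \quad & p = P\left(\bar{X} \le \bar{x}_{\text{obs}} \mid H_0\right) \\
\text{two-tailed:}   \quad & p = P\left(\lvert \bar{X} - \mu_0 \rvert \ge \lvert \bar{x}_{\text{obs}} - \mu_0 \rvert \mid H_0\right)
\end{align*}
```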

        Practice problems
            For each of the scenarios below, answer all of the following questions:
            • a) What is the population under study?  What is the default assumption / claim about this population?
            • b) What is the researcher's thought about the population and the default assumption / claim?
            • c) If you were to go out and challenge the claim, what question would you ask the individuals / what would you observe in each individual?
            • d) What type of data (quantitative or categorical) will this question produce?  How is it summarized (mean or proportion)?
            • e) What is the null hypothesis / H0?  Use symbols and subscripts that describe the variable (such as μ_(at bats per game) or p_(like pizza)).
            • f) What is the alternative hypothesis / HA?  Again, use symbols and subscripts.
            • g) What is a reasonable cut-off p-value to reject the null (what is α)?
            • h) What test (in StatKey) will you perform?  Left, right, or two-tailed test?
            • i) What assumptions do you need to make before performing the test?  Check them.
            • j) Find the p-value.
            • k) Decide whether or not to reject the null hypothesis.
            • l) Does this make your data statistically significant?
            • m) State your decision in context in a short sentence.
            1) Imagine that the posted number for the average number of hits per team per game in state high school baseball is 6.5 hits/game.  You want to show that your team is truly above average, so you take a random sample of the number of hits your team had in 18 games.  Here is the data: 5, 6, 8, 7, 11, 13, 0, 7, 8, 5, 7, 10, 8, 10, 9, 7, 2, 9.  [Copy to StatKey from here.]

            2) A research team believes they have invented a new coffee additive that will improve (shorten) the reaction time of ping pong players.  A typical (non-expert) player has an average reaction time of 0.25 seconds after the opponent hits the ball.  To test your additive, you took a stratified random sample of both genders and all adult age groups.  The resulting times (in seconds) from your tests were: 0.22, 0.31, 0.30, 0.25, 0.23, 0.28, 0.21, 0.17, 0.21, 0.25, 0.22, 0.18, 0.23, 0.24, 0.23, 0.22.  [Copy to StatKey from here.]

            Practice solutions
            1. High school baseball
              a) Your baseball team is under study.  The default assumption, based on the state average, is that the team gets an average of 6.5 hits per game.
              b) You (the researcher) think that the team gets more hits per game than the state average.
              c) For each game (the individuals in this study are the games), I would look up: "How many hits did this team get?"
              d) Quantitative, summarized by a mean
              e) μ_(hits per game) = 6.5
              f) μ_(hits per game) > 6.5
              g) 0.05 (the default unless there is reason for something else)
              h) Test for Single Mean (right tailed test)
              i) You need to check that the data was gathered randomly, which the problem says it was.
              j) p-value = 0.12 (note this is APPROXIMATE, use 5000+ resamples in StatKey)
              k) Fail to reject.
              l) Test was not statistically significant (because you failed to reject)
              m) Though the sample average is above 6.5, there is NOT enough evidence beyond random chance that your baseball team is better than the state average.  [The color coding is here to help you connect what you are stating back to your original population, assumption / null, and goal / alternative.]
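
Steps j) through l) for the baseball problem can be reproduced outside StatKey with a short Python sketch. It is an illustration under assumptions, not StatKey's own code: the null distribution is built by shifting the 18 values so their mean is 6.5 and resampling with replacement, and the seed and the 10,000 resamples are arbitrary choices. The estimate should land in the neighborhood of the approximate p-value in step j), not match it exactly.

```python
import random

hits = [5, 6, 8, 7, 11, 13, 0, 7, 8, 5, 7, 10, 8, 10, 9, 7, 2, 9]
NULL_MEAN = 6.5      # H0: mean hits per game = 6.5
ALPHA = 0.05         # cut-off from step g)
RESAMPLES = 10_000   # arbitrary; the notes suggest 5000 or more

rng = random.Random(1)  # arbitrary fixed seed so reruns give the same estimate
n = len(hits)
sample_mean = sum(hits) / n

# Center the data on the null value, then resample with replacement.
shifted = [h - sample_mean + NULL_MEAN for h in hits]

count = 0
for _ in range(RESAMPLES):
    sim_mean = sum(rng.choices(shifted, k=n)) / n
    # Right-tailed test: count simulated means at least as high as observed.
    if sim_mean >= sample_mean - 1e-9:
        count += 1

p_value = count / RESAMPLES
reject = p_value < ALPHA

print(f"sample mean = {sample_mean:.2f} hits/game")
print(f"estimated p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```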

            2. Reaction times
              a) Adult ping pong players.  The default assumption is that reaction times will not change with the new additive, so the reaction time will stay at 0.25 seconds
              b) The researchers believe their additive will help, so they think that the reaction time will go down.
              c) I would have people drink the coffee with additive, then measure each person's reaction time to a ping pong return.
              d) Quantitative, summarized as a mean
              e) μ_(reaction time) = 0.25
              f) μ_(reaction time) < 0.25
              g) 0.05 (the default unless there is reason for something else)
              h) Test for Single Mean (left tailed test)
              i) You need to check that the data was gathered randomly, and the problem said you did a stratified random sample.  I may be a bit suspicious about where, geographically, the people were sampled, but we'll let that slide today.
              j) p-value = 0.035 (note this is APPROXIMATE, use 5000+ resamples in StatKey)
              k) Reject
              l) Test was statistically significant (because you rejected)
              m) Your evidence suggests that your additive reduces the average reaction time of an adult ping pong player below the typical time of 0.25 seconds.  [The color coding is here to help you connect what you are stating back to your original population, assumption / null, and goal / alternative.]
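
The note in step j) about using 5000+ resamples can be illustrated with a small experiment: repeat the left-tailed simulation several times at different resample counts and compare how much the estimates move. This is a sketch under assumptions (a shift-and-resample null distribution, ties counted as extreme, an arbitrary seed, and a hypothetical helper named `left_tail_p`), so its estimates may differ slightly from StatKey's.

```python
import random

times = [0.22, 0.31, 0.30, 0.25, 0.23, 0.28, 0.21, 0.17,
         0.21, 0.25, 0.22, 0.18, 0.23, 0.24, 0.23, 0.22]
NULL_MEAN = 0.25  # H0: mean reaction time = 0.25 seconds


def left_tail_p(sample, null_mean, resamples, rng):
    """Estimate a left-tailed p-value for a single mean by simulation."""
    n = len(sample)
    sample_mean = sum(sample) / n
    shifted = [x - sample_mean + null_mean for x in sample]
    count = 0
    for _ in range(resamples):
        sim_mean = sum(rng.choices(shifted, k=n)) / n
        # Count simulated means at least as low as the observed mean.
        if sim_mean <= sample_mean + 1e-9:
            count += 1
    return count / resamples


rng = random.Random(2)  # arbitrary fixed seed
spread = {}
for resamples in (100, 1000, 5000):
    # Five independent runs at each resample count.
    spread[resamples] = [left_tail_p(times, NULL_MEAN, resamples, rng)
                         for _ in range(5)]
    low, high = min(spread[resamples]), max(spread[resamples])
    print(f"{resamples:>5} resamples: estimates range {low:.3f} to {high:.3f}")
```

With few resamples the estimates scatter widely, and because this p-value sits close to 0.05, a small run could land on either side of the cut-off. Larger runs tighten the range, which is the reason for the 5000+ recommendation.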

            Practice quiz (we will go over answers in class as a group)
                  1. You go out golfing with some friends. Ian says that he averages 71 strokes per 18-hole round. You think he has more strokes per round (that he is a worse golfer). What are the null and alternative hypotheses?

                  2. Whenever you perform a hypothesis test, what assumption do you need to check first? Why does this matter?

                  3. A basketball coach believes the other team's coach is listing player heights in the game brochure that are taller than the players really are. He is a clever chap and takes his tallest player and puts inch-marks along the side of his face, then asks this player to stand next to each of the opposing players before the game so he can measure them. The average height in the brochure was 6'2" (74"). An SRS of players found the following values: 68", 77", 71", 73", 75", 76", 71", 69", 65", 72", 74".  [Copy to StatKey from here.]  Write the hypothesis statements (with symbols and subscripts), choose alpha, find the p-value, decide whether to reject the null hypothesis and whether the result is statistically significant, and mention what type of error could have been made.

                  Notes