1. Hypothesis Testing

Task: Teacher claim video
      Each team will compile a video of the process of attempting to prove a teacher wrong.  The video will be completed at the end of the unit, but you will start now with the following tasks:
      • As a group, create a categorical question about Byron HS students.  For example, "What proportion of BHS students drive to school?"
      • Ask a teacher to give their answer.  This will be the null hypothesis.  For example, Mr. Pethan says 23% of students drive to school, so then H0: p = .23.  Record the teacher, with clear audio, stating their claim.
      • As a group, decide if you think the true proportion is less than, greater than, or just different than the teacher's statement.  For example, your group thinks it is higher, so then HA: p > .23.
      • Create a random sample of Byron HS students and sample at least 30.  Ask them the same question you asked the teacher.  Record a video clip of a handful (not all) of the students as they answer.
      • Calculate the p-value for your test using StatKey.  Get your work checked by a teacher or peer from another group.
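      To see what a randomization test for a single proportion does behind the scenes, here is a minimal Python sketch.  It is not StatKey's actual code, just an illustration of the same idea: simulate many samples from a world where the teacher's claim (H0) is true, and count how often the simulated result is at least as extreme as yours.  The function name, the seed, and the sample data (12 of 40 students) are all made up for this example.

```python
import random

def randomization_p_value(successes, n, null_p, tail="right", resamples=5000, seed=1):
    """Approximate p-value for a single-proportion test by simulation.

    Each resample pretends the null hypothesis is true, draws n answers,
    and counts the "yes" answers. The p-value is the fraction of resamples
    at least as extreme as the observed count.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    at_or_below = 0
    at_or_above = 0
    for _ in range(resamples):
        # One simulated sample: each person says "yes" with probability null_p.
        sim = sum(rng.random() < null_p for _ in range(n))
        if sim <= successes:
            at_or_below += 1
        if sim >= successes:
            at_or_above += 1
    left = at_or_below / resamples
    right = at_or_above / resamples
    if tail == "left":
        return left
    if tail == "right":
        return right
    # Two-tailed: double the smaller tail (capped at 1).
    return min(1.0, 2 * min(left, right))

# Hypothetical data: 12 of 40 sampled students say they drive to school.
# H0: p = .23, HA: p > .23, so this is a right-tailed test.
p_value = randomization_p_value(successes=12, n=40, null_p=0.23, tail="right")
print(f"sample proportion = {12 / 40:.2f}, approximate p-value = {p_value:.3f}")
```

      Because the p-value comes from random resamples, it will change slightly with a different seed or number of resamples, just like it does in StatKey.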
      Mastery Quiz Prep

          Watch if you missed class

          Printable guided notes: Guided notes

          For each of the following situations (1-2):
          • a) What is the null hypothesis in this situation?
          • b) What is the alternative hypothesis?
          • c) What would you use as evidence to try to prove that the null should be rejected?
          • d) Identify the Type I and Type II errors and state the consequences of each type of error.
          • e) Decide which type of error is most harmful/dangerous/bad.
          1) Some states’ motor vehicle registration folks assume that a car is unsafe until they do an inspection and certify that it is safe.

          2) A school instituted the “well behaved student” policy.  Every student is assumed to be well behaved until a written warning proves otherwise. 

          For each of the scenarios in 3-4 below, answer all of the following questions:
          • a) What is the claim?
          • b) What is the researcher's thought about the claim?
          • c) If you were to go out and challenge the claim, what question would you ask the individuals / what would you observe in each individual?
          • d) What type of data (quantitative or categorical) will this question produce?  How is it summarized (mean or proportion)?
          • e) What is the null hypothesis / H0?  Use symbols and subscripts that describe the variable (such as μ_(at bats per game) or p_(like pizza)).
          • f) What is the alternative hypothesis / HA?  Again, use symbols and subscripts.
          • g) What is a reasonable cut-off p-value to reject the null (what is α)?
          • h) What test (in StatKey) will you perform?  Left, right, or two-tailed test?
          • i) What assumptions do you need to make before performing the test?  Check them.
          • j) Find the p-value.
          • k) Decide whether or not to reject the null hypothesis.
          • l) Does this make your data statistically significant?
          • m) State your decision in context in a short sentence.
          • n) If you failed to reject, was your p-value close enough to α to warrant redoing the study with a larger sample size?
          • o) If you rejected, did the difference you found between the sample mean/proportion and the claimed mean/proportion seem like a meaningful difference?
          • p) Imagine that you found out from a census of all of the data sometime later that you made the wrong decision.  What type of error would you have made (Type I or II)?
          3) You read somewhere that 53% of people like M&M’s.  You want to see if this is true in your school too, so you took an SRS.  Your data found that 34 of 88 people liked the candy.

          4) A regional softball league claimed that half of its pitchers could pitch a ball over 65mph.  A competing league thought this value was too high, so they looked at a simple random sample of 28 pitchers and found only 10 who could throw this fast.

          5. Let's say that you had a p-value that was almost good enough, perhaps 0.06.  Should you "take more samples" or "increase your sample size"?
          6. You are conducting a study of the local environment.  After conducting your research, you found that a local waste plant does not make a huge increase in the amount of carbon monoxide in the air.  However, you still want to publish “statistically significant” results.  Your alpha value is fixed at .05 due to the requirements of the journal you want to submit your paper to.  What will you need to do to get statistically significant results?
          7. List the null and alternative hypotheses in a court of law.  Explain what Type I and Type II errors correspond to.  Why do you think the founding fathers set up the judicial system this way?
          8. In court, when a person is not convicted, they are not declared “innocent” by the jury.  They are just declared “not guilty”.  Explain this subtle difference and how it connects to hypothesis testing.
          9. Medical experiments frequently use hypothesis testing to show that one treatment group did better than another.  How might you set up a test so that the results are most likely to appear significant?  Consider your alpha value, sample size, and direction of the test.
          10. You are developing a new disease test.  In this test, the null hypothesis is that you do not have the disease, with the alternative that you do have the disease.  You need to balance Type I and Type II errors (decreasing the probability of one type of error increases the probability of the other type).  In this scenario, which type of error should you allow to increase and which should you minimize?  Why?

          Free Response Prep
              How do you choose a null hypothesis?  Where do they come from?

              Briefly describe 3 ways to increase your chance of rejecting the null hypothesis.

              Compare and contrast a confidence interval and a hypothesis test.

              Practice solutions
                  1. a) H0: the car is unsafe to drive (I assumed unsafe first because the problem said that the state "assumes" it is unsafe until the inspection).
                    b) HA: the car is safe to drive
                    c) Possible evidence could come from inspections that look for all of the common things that make a car unsafe.
                    d) Type I error: incorrectly rejecting -- car passing inspection when it is not actually safe.  This means more unsafe vehicles on the road.
                    Type II error: incorrectly failing to reject -- a safe car not passing inspection.  This means that people with safe cars will be prevented from driving their car.
                    e) Probably Type I since unsafe cars could kill people, but there would be a lot of angry people if the Type II error rate was too high and people couldn't drive their safe cars.
                  2. a) H0: the student is well behaved (I assumed behaved first because the problem said that the school "assumes" all are well behaved until they get write-ups).
                    b) HA: the student is not well behaved
                    c) Evidence could be teachers observing bad behavior, actions caught on camera, or reports from other students about bad behavior.
                    d) Type I error: incorrectly rejecting -- well behaved student gets written-up.  This seems unfair to that student.
                        Type II error: incorrectly failing to reject -- a poorly behaved student does not get written up.  This means that some naughty kids didn't get caught.
                    e) Type I is worse -- it is better for the well-behaved student to be fairly rewarded than to make sure to catch every naughty student.
                  3. M&M's:
                    a) 53% of the school likes M&Ms
                    b) You think it might be different at your school
                    c) You would ask people at your school "do you like M&Ms?"
                    d) Categorical, summarized by a proportion
                    e) H0: p_(likes M&Ms) = .53
                    f) HA: p_(likes M&Ms) ≠ .53
                    g) 0.05 (the default unless there is reason for something else)
                    h) Test for Single Proportion (two tailed test)
                    i) You need to check that the data was gathered randomly, and the problem said you used an SRS.
                    j) p-value = .008 (note this is APPROXIMATE, use at least 3000 resamples in StatKey)
                    k) Reject
                    l) Test was statistically significant (because you rejected)
                    m) Your evidence suggests that the proportion of students at your school who like M&Ms is NOT 53%.
                    n) n/a
                    o) The difference between the null hypothesis of 53% and the sample proportion 34/88=39% seems like a very large and meaningful difference.
                    p) Type I error (because you rejected)
                  4. Softball pitchers
                    a) Half of the pitchers in the regional softball league throw over 65mph
                    b) Less than half of the pitchers throw this fast
                    c) Observe each pitcher with a radar gun to decide "do you throw over 65mph or not?"
                    d) Categorical (they do or they don't throw over 65mph), summarized by a proportion
                    e) H0: p_(throws over 65mph) = 0.5
                    f) HA: p_(throws over 65mph) < 0.5
                    g) 0.05 (the default unless there is reason for something else)
                    h) Test for Single Proportion (left tailed test)
                    i) You need to check that the data was gathered randomly, and the problem said you used an SRS.
                    j) p-value = 0.09
                    k) Fail to reject
                    l) Not statistically significant
                    m) We did not find enough evidence to prove that less than half of the softball pitchers throw over 65mph.
                    n) Yes -- 0.09 is getting very close, and if you were really out to prove something, a larger sample size might help you lower your p-value under 0.05.
                    o) n/a
                    p) Type II error (because you failed to reject)
                  5. The phrase "take more samples" makes no sense, but many in class seem to use it anyway.  STOP IT!  If your sample size is too small, say you "need to increase your sample size".
                  6. Since alpha and your effect size are already fixed, you will need a larger sample size.
                  7. Null: the defendant is innocent; Alt: the defendant is guilty.  Type I error is rejecting the null hypothesis by mistake (convicting an innocent person).  Type II error is failing to reject the null hypothesis by mistake (letting a guilty person free).  The system was set up this way so that government leaders could not arbitrarily imprison people that they did not like and force a defendant to provide evidence that they are innocent (among many other related reasons).  Instead, the evidence must be brought forth by those who want to accuse someone else.
                  8. When letting someone free, the jury is saying that they do not have enough evidence to convict someone.  They may not be convinced that the person is innocent, but they are also not convinced enough that the person is guilty.  The same is true in any hypothesis test -- it is not a question of whether or not you *think* the null hypothesis (H0) is false, it is a question of whether you have enough evidence to show that the null is so unlikely that the alternative must be true.
                  9. A high alpha value allows you to call results with a higher p-value “significant”.  However, most publications don’t let you set your alpha value above .05.  A one-tailed test in the direction you expect also gives a smaller p-value than a two-tailed test.  A large sample size is a good way to pick up a difference and be statistically significant, even if the resulting change is not all that important.
                  10. If your test says that a healthy person is sick, that person will have some unnecessary worry and would undergo more testing.  If your test says that a sick person is healthy, that person might die.  Thus, you want to minimize the chance that someone who truly has the disease will be left undiagnosed.  In this case, the null hypothesis is “healthy”, so you are okay rejecting this too often (higher chance of Type I error) so that you can minimize the times that you fail to reject when you should have rejected (low Type II error).
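                  The p-value in solution 3 comes from a StatKey simulation, so it is approximate.  As a cross-check on solutions 3 and 4, and to illustrate the sample size point in solutions 4n and 6, here is a minimal Python sketch that computes exact binomial p-values instead of simulating.  This is not how StatKey does it, and the function names are made up for this example.  The two-tailed value doubles the smaller tail, which is one common convention (an assumption here).  The last loop uses hypothetical data: it keeps the softball sample proportion at 10/28 but pretends the sample was two and four times larger.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials when P(success) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def left_tail(successes, n, null_p):
    """P(count <= successes) assuming the null hypothesis is true."""
    return sum(binom_pmf(k, n, null_p) for k in range(successes + 1))

def right_tail(successes, n, null_p):
    """P(count >= successes) assuming the null hypothesis is true."""
    return sum(binom_pmf(k, n, null_p) for k in range(successes, n + 1))

def two_tailed(successes, n, null_p):
    """Double the smaller tail, capped at 1 (one common convention)."""
    smaller = min(left_tail(successes, n, null_p), right_tail(successes, n, null_p))
    return min(1.0, 2 * smaller)

# Solution 3: 34 of 88 like M&Ms, H0: p = .53, two-tailed.
mm_p_value = two_tailed(34, 88, 0.53)
print(f"M&Ms p-value (two-tailed): {mm_p_value:.4f}")

# Solution 4: 10 of 28 pitchers, H0: p = .5, left-tailed.
softball_p_value = left_tail(10, 28, 0.5)
print(f"Softball p-value (left-tailed): {softball_p_value:.4f}")

# Hypothetical: same sample proportion (10/28), but larger sample sizes.
for scale in (1, 2, 4):
    n = 28 * scale
    successes = 10 * scale
    p = left_tail(successes, n, 0.5)
    print(f"n = {n:3d}, successes = {successes:2d}, p-value = {p:.4f}")
```

                  The softball p-value should round to the 0.09 given in solution 4, and the M&Ms p-value should land close to (but not exactly on) the simulated .008.  In the loop, the sample proportion never changes, yet the p-value shrinks as n grows, which is why a larger sample size can turn "almost significant" into "significant".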
                  Practice quiz (we will go over answers in class as a group)
                        1. A highly biased group wants to debunk the idea that the earth’s CO2 concentration has changed in a statistically significant way over the past 40 years. To do this, they run a series of tests and show results that do not reach statistical significance. How might they design their tests to not reach statistical significance? Why is "failing to reach statistical significance" not a reasonable way to disprove something?

                        2. You conduct a study at the standard significance level. Your p-value is .023. A larger follow-up study showed that the null hypothesis was true. Did you commit an error? If so, what type?

                        3. Whenever you perform a hypothesis test, what is the most important assumption you need to check? Why does this matter?
                        4. As the researcher, do you want the null or alternative hypothesis to be correct? Or does it change in different situations?

                        5. A local politician you don't like said that 60% of city residents support an increase in taxes to improve city parks.  You don't particularly care about this issue, but you want to use it as an opportunity to show that he quotes inaccurate statistics.  You designed an SRS of residents and reached 80 people.  40 supported the raise in taxes for the parks.  Use the standard process we practiced to decide if you have enough evidence to prove the politician wrong.  As a follow-up question, how could you have rephrased the original problem to make an even better case?

                            null hypothesis (H0): the claim the researcher is challenging, you assume it to be true (but hope that it is wrong)
                            alternative hypothesis (HA): the answer that must be true if the researcher's data is too unlikely under the null hypothesis (you hope to end up with the alternative)
                            alpha: the cut-off for a p-value to decide whether or not to reject
                            p-value: the probability of obtaining the statistic (mean or proportion) you did, or something [higher/lower/more extreme], assuming that the null hypothesis is true
                            type I error: when you reject the null hypothesis but shouldn't have (because the null was true)
                            type II error: when you fail to reject the null hypothesis and should have rejected (because the null was in fact false)
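                            The definitions of alpha and Type I error are connected: when the null hypothesis really is true, alpha caps how often you will reject it by mistake.  Here is a minimal Python sketch of that idea.  It simulates many studies in a world where the null is true and counts how often a left-tailed test rejects anyway.  The scenario (n = 28, null proportion 0.5), the function names, and the seed are assumptions chosen for illustration.

```python
import random
from math import comb

def left_tail_p_value(successes, n, null_p):
    """Exact P(count <= successes) assuming the null hypothesis is true."""
    return sum(
        comb(n, k) * null_p**k * (1 - null_p) ** (n - k)
        for k in range(successes + 1)
    )

def type_one_error_rate(n, null_p, alpha=0.05, studies=4000, seed=2):
    """Fraction of simulated studies that reject a TRUE null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Work out the p-value for every possible count once, up front.
    p_values = [left_tail_p_value(k, n, null_p) for k in range(n + 1)]
    rejections = 0
    for _ in range(studies):
        # The data really do come from the null, so any rejection is a Type I error.
        successes = sum(rng.random() < null_p for _ in range(n))
        if p_values[successes] < alpha:
            rejections += 1
    return rejections / studies

rate = type_one_error_rate(n=28, null_p=0.5, alpha=0.05)
print(f"Type I error rate in simulation: {rate:.3f} (alpha = 0.05)")
```

                            The simulated rate should come out at or a little below alpha.  It is usually not exactly alpha because counts are whole numbers, so the p-value can only take certain values.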