Multiple comparisons
These videos need to be updated, but the material is important, so these old versions are here in the meantime.
The total length of the videos in this section is approximately 45 minutes.
You can also view all the videos in this section at the YouTube playlist linked here.
Motivation: Compounding error
Question 1: Suppose that you run 20 hypothesis tests, for 20 unrelated data sets, each with a p-value cutoff of 0.05. If all of the null hypotheses are actually true, what's your best guess at the number of null hypotheses you'll reject?
Show answer
20 * .05 = 1
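You can check this expectation with a quick simulation (a Python sketch, not part of the videos): when a null hypothesis is true, its p-value is uniformly distributed on (0, 1), so each test has a 5% chance of a false rejection.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests, alpha, n_sims = 20, 0.05, 10_000

# Each row is one "study" of 20 tests with every null true, so every
# p-value is uniform on (0, 1).
p_values = rng.uniform(size=(n_sims, n_tests))
rejections = (p_values < alpha).sum(axis=1)

print(rejections.mean())         # close to 20 * 0.05 = 1
print((rejections >= 1).mean())  # about 0.64: at least one false rejection
```

Note that even though the expected count is only 1, there is roughly a 64% chance of at least one false rejection in each batch of 20 tests.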
Publication bias
Question 2: Which of the following could lead to multiple comparison problems? Check all that apply.
You have multiple research questions about the same data set
You run multiple tests on the same variables until you find a significant result
You run a regression of an outcome on lots of predictors and select the significant predictors
You collect data in cohorts, refining your research question as you study it, and you submit your results for publication when you find something significant
Many papers are submitted to a journal, and the most interesting discoveries are published
Show answer
All of the above. The multiple comparison procedures introduced in this section won't make all of these problems disappear. It's important to be aware of the compounding of error rates any time you look at study results.
Example
Question 3: How many hypothesis tests are we running for this particular analysis of this example data?
Show answer
6
Family-wise error and procedures
The first few seconds of this video are audio only.
Question 4: You run 1000 hypothesis tests while searching for patterns in a genetics data set. Which goal seems more ambitious?
The probability that even one of the 1000 null hypotheses is falsely rejected is 0.05
Of the null hypotheses that you do reject, the proportion that should not have been rejected is 0.05
Show answer
The first option is harder to achieve. The first option describes family-wise error; the second describes false discovery rate. You might disagree about which is more ambitious. The point is that family-wise error is about the chance of avoiding any errors at all, while the false discovery rate acknowledges that there will be errors and attempts to keep them to a small proportion.
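To see why the first goal is so demanding, here is the arithmetic (a sketch that assumes the 1000 tests are independent, which is an extra assumption):

```python
# With 1000 independent tests at a 0.05 cutoff and every null true,
# a false rejection somewhere is all but guaranteed.
fwer = 1 - (1 - 0.05) ** 1000
print(fwer)  # ~1.0 (the miss probability 0.95**1000 is about 5e-23)
```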
More family-wise error
Question 5: Which goal was proposed more recently?
Family-wise error
False discovery rate (video coming up)
Show answer
False discovery rate. Benjamini and Hochberg introduced it in 1995; family-wise error control is decades older.
Bonferroni 1
This video went a little bit viral (though that fascinating optional discussion of covariance and the intro to logs have more views). We are famous!
Bonferroni 2
Question 6: You have 10 tests. Using Bonferroni, what p-value cutoff should you use for each of them if you want the probability of at least one falsely rejected null to be no bigger than 0.05?
Show answer
0.05 / 10 = 0.005
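As a quick sketch of the arithmetic (the independence check at the end is an extra assumption, not part of Bonferroni itself, which needs no independence):

```python
m, alpha = 10, 0.05
cutoff = alpha / m
print(cutoff)  # 0.005

# If the tests happen to be independent and all nulls are true, the
# family-wise error rate with this cutoff is 1 - (1 - 0.005)**10,
# which lands just under the target of 0.05.
print(1 - (1 - cutoff) ** m)  # ~0.0489
```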
Family-wise methods
Question 7: In general, what happens to each confidence interval when you apply a family-wise multiple comparison procedure?
Interval gets wider
Interval stays same width
Interval gets narrower
Show answer
Interval gets wider. By widening the interval, the procedures make it more likely that each interval will cover its true value. This is equivalent to reducing the p-value cutoff for a hypothesis test so that we are less likely to reject the null.
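To make the widening concrete, here is a sketch of a Bonferroni correction applied to normal-theory intervals, using m = 6 intervals as in the earlier example (the normal setup is just for illustration):

```python
from scipy.stats import norm

alpha, m = 0.05, 6
z_single = norm.ppf(1 - alpha / 2)        # ~1.96: one uncorrected 95% interval
z_family = norm.ppf(1 - alpha / (2 * m))  # ~2.64: each of 6 corrected intervals
print(z_family / z_single)                # each interval is about 35% wider
```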
Comparison
Question 8: You do a survey of Wellesley students from 5 majors. You would like to compare students' answers for each pair of majors. Which method is more appropriate?
Bonferroni
Fisher's LSD
Tukey
Scheffe
Show answer
Tukey. Tukey's method is designed specifically for looking at all possible pairs of several groups. Bonferroni is more general and will probably make the intervals wider than necessary. Scheffe is for looking at a very large set of questions and will make the intervals enormous. Fisher's LSD is equivalent to doing nothing, which isn't a bad option if your audience generally understands that multiple comparison problems exist.
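If you want to run Tukey's method in practice, statsmodels has an implementation. A sketch (the survey numbers below are made up purely for illustration):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
majors = np.repeat(["Bio", "CS", "Econ", "Math", "Psych"], 30)
answers = rng.normal(loc=3.0, scale=1.0, size=150)  # made-up survey scores

# All 10 pairwise comparisons among the 5 majors, with the
# family-wise error rate held at 0.05.
print(pairwise_tukeyhsd(answers, majors, alpha=0.05))
```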
Planned v. Unplanned
Question 9: What's the simplest way to avoid multiple comparison problems in your research?
Show answer
Before looking at the data, specify a small number of tests that you plan to run, and stick to the plan. You'll still be running multiple tests, but no one can accuse you of looking around until you found something significant.
Benjamini-Hochberg
Question 10: Suppose you conduct 5 tests and obtain p-values of .015, .018, .035, .041, and .055. According to Benjamini-Hochberg with a false discovery rate of .05, how many significant results do you have?
Show answer
The answer is 2. We compare the p-values to .01, .02, .03, .04, and .05, respectively. The algorithm says that we should choose the largest q such that the qth p-value is less than its cutoff, and reject the nulls for the hypothesis tests from 1 to q. So, q=2, because .018 < .02. We reject the first two null hypotheses.
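Here is the same computation as a Python sketch, both by hand and with statsmodels' multipletests (its "fdr_bh" method is the Benjamini-Hochberg procedure):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.015, 0.018, 0.035, 0.041, 0.055])
alpha, m = 0.05, 5

# By hand: compare the i-th smallest p-value to (i / m) * alpha.
cutoffs = np.arange(1, m + 1) / m * alpha
print(cutoffs)             # [0.01 0.02 0.03 0.04 0.05]
q = np.nonzero(pvals <= cutoffs)[0].max() + 1
print(q)                   # 2: reject the first two nulls

# The statsmodels version agrees:
reject, *_ = multipletests(pvals, alpha=alpha, method="fdr_bh")
print(reject)              # [ True  True False False False]
```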
You are done!