Multiple comparisons
These videos need to be updated, but the material is important, so these old versions are here in the meantime.
The total length of the videos in this section is approximately 45 minutes.
You can also view all the videos in this section at the YouTube playlist linked here.
Motivation: Compounding error
Question 1: Suppose that you run 20 hypothesis tests, for 20 unrelated data sets, each with a p-value cutoff of 0.05. If all of the null hypotheses are actually true, what's your best guess at the number of null hypotheses you'll reject?
Show answer
20 * .05 = 1
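You can check this expectation with a quick simulation (a Python sketch, not part of the videos): when a null hypothesis is true, its p-value is uniformly distributed on (0, 1), so each test has a 5% chance of a false rejection.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests, alpha, n_sims = 20, 0.05, 10_000

# Each row is one "study" of 20 tests with every null true, so every
# p-value is uniform on (0, 1).
p_values = rng.uniform(size=(n_sims, n_tests))
rejections = (p_values < alpha).sum(axis=1)

print(rejections.mean())         # close to 20 * 0.05 = 1
print((rejections >= 1).mean())  # about 0.64: at least one false rejection
```

Note that even though the expected count is only 1, there is roughly a 64% chance of at least one false rejection in each batch of 20 tests.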
Publication bias
Question 2: Which of the following could lead to multiple comparison problems? Check all that apply.
You have multiple research questions about the same data set
You run multiple tests on the same variables until you find a significant result
You run a regression of an outcome on lots of predictors and select the significant predictors
You collect data in cohorts, refining your research question as you study it, and you submit your results for publication when you find something significant
Many papers are submitted to a journal, and the most interesting discoveries are published
Show answer
All of the above. The multiple comparison procedures introduced in this section won't make all of these problems disappear. It's important to be aware of the compounding of error rates any time you look at study results.
Example
Question 3: How many hypothesis tests are we running for this particular analysis of this example data?
Show answer
6
Family-wise error and procedures
The first few seconds of this video are audio only.
Question 4: You run 1000 hypothesis tests while searching for patterns in a genetics data set. Which goal seems more ambitious?
The probability that even one of the 1000 null hypotheses is falsely rejected is 0.05
Of the null hypotheses that you do reject, the proportion that should not have been rejected is 0.05
Show answer
The first option is harder to achieve. The first option describes family-wise error; the second describes false discovery rate. You might disagree about which is more ambitious. The point is that family-wise error is about the chance of avoiding any errors at all, while the false discovery rate acknowledges that there will be errors and attempts to keep them to a small proportion.
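To see why the first goal is so demanding, here is the arithmetic (a sketch that assumes the 1000 tests are independent, which is an extra assumption):

```python
# With 1000 independent tests at a 0.05 cutoff and every null true,
# a false rejection somewhere is all but guaranteed.
fwer = 1 - (1 - 0.05) ** 1000
print(fwer)  # ~1.0 (the miss probability 0.95**1000 is about 5e-23)
```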
More family-wise error
Question 5: Which goal was proposed more recently?
Family-wise error
False discovery rate (video coming up)
Show answer
False discovery rate. Benjamini and Hochberg introduced it in 1995; family-wise error control is decades older.
Bonferroni 1
This video went a little bit viral (though that fascinating optional discussion of covariance and the intro to logs have more views). We are famous!
Bonferroni 2
Question 6: You have 10 tests. Using Bonferroni, what p-value cutoff should you use for each of them if you want the probability of at least one falsely rejected null to be no bigger than 0.05?
Show answer
0.05 / 10 = 0.005
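As a quick sketch of the arithmetic (the independence check at the end is an extra assumption, not part of Bonferroni itself, which needs no independence):

```python
m, alpha = 10, 0.05
cutoff = alpha / m
print(cutoff)  # 0.005

# If the tests happen to be independent and all nulls are true, the
# family-wise error rate with this cutoff is 1 - (1 - 0.005)**10,
# which lands just under the target of 0.05.
print(1 - (1 - cutoff) ** m)  # ~0.0489
```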
Family-wise methods
Question 7: In general, what happens to each confidence interval when you apply a family-wise multiple comparison procedure?
Interval gets wider
Interval stays same width
Interval gets narrower
Show answer
Interval gets wider. By widening the interval, the procedures make it more likely that each interval will cover its true value. This is equivalent to reducing the p-value cutoff for a hypothesis test so that we are less likely to reject the null.
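To make the widening concrete, here is a sketch of a Bonferroni correction applied to normal-theory intervals, using m = 6 intervals as in the earlier example (the normal setup is just for illustration):

```python
from scipy.stats import norm

alpha, m = 0.05, 6
z_single = norm.ppf(1 - alpha / 2)        # ~1.96: one uncorrected 95% interval
z_family = norm.ppf(1 - alpha / (2 * m))  # ~2.64: each of 6 corrected intervals
print(z_family / z_single)                # each interval is about 35% wider
```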
Comparison
Question 8: You do a survey of Wellesley students from 5 majors. You would like to compare students' answers for each pair of majors. Which method is more appropriate?
Bonferroni
Fisher's LSD
Tukey
Scheffe
Show answer
Tukey. Tukey's method is designed specifically for looking at all possible pairs of several groups. Bonferroni is more general and will probably make the intervals wider than necessary. Scheffe is for looking at a very large set of questions and will make the intervals enormous. Fisher's LSD is equivalent to doing nothing, which isn't a bad option if your audience generally understands that multiple comparison problems exist.
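If you want to run Tukey's method in practice, statsmodels has an implementation. A sketch (the survey numbers below are made up purely for illustration):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
majors = np.repeat(["Bio", "CS", "Econ", "Math", "Psych"], 30)
answers = rng.normal(loc=3.0, scale=1.0, size=150)  # made-up survey scores

# All 10 pairwise comparisons among the 5 majors, with the
# family-wise error rate held at 0.05.
print(pairwise_tukeyhsd(answers, majors, alpha=0.05))
```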
Planned v. Unplanned
Question 9: What's the simplest way to avoid multiple comparison problems in your research?
Show answer
Before looking at the data, specify a small number of tests that you plan to run, and stick to the plan. You'll still be running multiple tests, but no one can accuse you of looking around until you found something significant.
Benjamini-Hochberg
Question 10: Suppose you conduct 5 tests and obtain p-values of .015, .018, .035, .041, and .055. According to Benjamini-Hochberg with a false discovery rate of .05, how many significant results do you have?
Show answer
The answer is 2. We compare the p-values to .01, .02, .03, .04, and .05, respectively. The algorithm says that we should choose the largest q such that the qth p-value is less than its cutoff, and reject the nulls for the hypothesis tests from 1 to q. So, q=2, because .018 < .02. We reject the first two null hypotheses.
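Here is the same computation as a Python sketch, both by hand and with statsmodels' multipletests (its "fdr_bh" method is the Benjamini-Hochberg procedure):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.015, 0.018, 0.035, 0.041, 0.055])
alpha, m = 0.05, 5

# By hand: compare the i-th smallest p-value to (i / m) * alpha.
cutoffs = np.arange(1, m + 1) / m * alpha
print(cutoffs)             # [0.01 0.02 0.03 0.04 0.05]
q = np.nonzero(pvals <= cutoffs)[0].max() + 1
print(q)                   # 2: reject the first two nulls

# The statsmodels version agrees:
reject, *_ = multipletests(pvals, alpha=alpha, method="fdr_bh")
print(reject)              # [ True  True False False False]
```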
You are done!