Inconsistent Multiple Testing Corrections
During multiple testing, researchers often adjust their alpha level to control the familywise error rate for a statistical inference about a joint union alternative hypothesis (e.g., “H1,1 or H1,2”). However, in some cases, they do not make this inference. Instead, they make separate inferences about each of the individual hypotheses that comprise the joint hypothesis (e.g., H1,1 and H1,2). For example, a researcher might use a Bonferroni correction to adjust their alpha level from the conventional level of 0.050 to 0.025 when testing H1,1 and H1,2, find a significant result for H1,1 (p < 0.025) but not for H1,2 (p > 0.025), and so claim support for H1,1 but not for H1,2. However, these separate individual inferences do not require an alpha adjustment. Only a statistical inference about the union alternative hypothesis “H1,1 or H1,2” requires an alpha adjustment, because that inference is based on “at least one” significant result among the two tests and so refers to the familywise error rate. Hence, an inconsistent correction occurs when a researcher corrects their alpha level during multiple testing but does not make an inference about a union alternative hypothesis. In this new article, I discuss this inconsistent correction problem.
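To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python (my own illustration, not from the article itself), assuming two independent tests of two true null hypotheses. The Type I error rate for an inference about an individual hypothesis is simply the alpha level used for that one test, whereas the familywise rate that the Bonferroni correction controls is the probability of at least one false positive across the family:

```python
# A minimal sketch, assuming two independent tests of true null hypotheses.
alpha_individual = 0.050       # conventional per-test alpha level
alpha_bonferroni = 0.050 / 2   # 0.025, Bonferroni-adjusted for two tests
m = 2                          # number of tests in the family

# Error rate for an inference about a single individual hypothesis:
# it is just the alpha level used for that test, regardless of how many
# other tests are conducted.
print(f"Per-test Type I error rate at unadjusted alpha: {alpha_individual:.3f}")

# Familywise error rate for the union inference "H1,1 or H1,2":
# the probability of at least one false positive among the m tests.
fwer_unadjusted = 1 - (1 - alpha_individual) ** m   # ~0.098
fwer_bonferroni = 1 - (1 - alpha_bonferroni) ** m   # ~0.049

print(f"FWER without adjustment: {fwer_unadjusted:.3f}")
print(f"FWER with Bonferroni adjustment: {fwer_bonferroni:.3f}")
```

Only the familywise rate exceeds 0.050 without an adjustment, which is why the adjustment is relevant only to the union inference.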
To be clear, I am not opposed to an alpha adjustment for multiple testing under the appropriate circumstances. Hence, my article is not an “anti-adjustment article” (Frane, 2019, p. 3). It is a pro-consistency article! My key point is that researchers should be logically consistent in their use of multiple testing corrections. If researchers use multiple testing corrections, then they should make corresponding statistical inferences about family-based joint hypotheses. They should not correct their alpha level and then only proceed to make statistical inferences about individual hypotheses because such inferences do not require an alpha adjustment (Armstrong, 2014, p. 505; Cook & Farewell, 1996, pp. 96–97; Fisher, 1971, p. 206; García-Pérez, 2023, p. 15; Greenland, 2021, p. 5; Hewes, 2003, p. 450; Hurlbert & Lombardi, 2012, p. 30; Matsunaga, 2007, p. 255; Molloy et al., 2022, p. 2; Parker & Weir, 2020, p. 564; Parker & Weir, 2022, p. 2; Rothman, 1990, p. 45; Rubin, 2017, pp. 271–272; Rubin, 2020a, p. 380; Rubin, 2021a, 2021b, pp. 10978–10983; Rubin, 2024; Savitz & Olshan, 1995, p. 906; Senn, 2007, pp. 150–151; Sinclair et al., 2013, p. 19; Tukey, 1953, p. 82; Turkheimer et al., 2004, p. 727; Veazie, 2006, p. 809; Wilson, 1962, p. 299; for the relevant quotes and links to these articles, please see Appendix B here).
Multiple testing increases the probability that at least one of your significant results is a false positive, but it doesn’t increase the probability that each one of your significant results is a false positive. So, if you make an inference about a joint null hypothesis that can be rejected following at least one significant result, then an alpha adjustment is necessary, and if you don’t, it isn’t!
It’s not necessary to increase protection against Type I errors during single tests of multiple individual hypotheses. Source: https://www.shutterstock.com/image-photo/child-rings-swimming-parent-overprotection-concept-1446404345
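This point can also be checked by simulation. Below is a minimal sketch (again my own, not from the article), assuming both null hypotheses are true so that the p values are uniformly distributed: the false positive rate for each individual test stays at alpha, while the “at least one” (familywise) rate is higher.

```python
# A minimal simulation sketch, assuming two independent tests of true nulls.
import numpy as np

rng = np.random.default_rng(0)
alpha, m, n_sims = 0.05, 2, 200_000

# Under true null hypotheses, p values are uniformly distributed on [0, 1].
p_values = rng.uniform(size=(n_sims, m))
significant = p_values < alpha

per_test_rate = significant.mean(axis=0)          # ~[0.05, 0.05]
familywise_rate = significant.any(axis=1).mean()  # ~0.0975

print("False positive rate for each individual test:", per_test_rate)
print("Rate of at least one false positive (familywise):", familywise_rate)
```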
Based on a review by García-Pérez (2023), I argue that inconsistent corrections are likely to be very common. I also point out that inconsistent corrections lead to a loss of statistical power. If a researcher adjusts their alpha level below its nominal level to account for multiple testing but only makes statistical inferences about individual hypotheses and not about a joint hypothesis, then they will have lowered the power of their individual tests for no good reason. Consequently, their Type I error rate will be unnecessarily low, and their Type II error rate will be unnecessarily high (see also García-Pérez, 2023, p. 11). I illustrate these issues using three recent psychology studies.
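As a rough illustration of this power cost (my own sketch, not an analysis from the article), consider a two-sided z test whose test statistic is normally distributed with unit variance and mean delta = 2.8 under the alternative, which gives roughly 80% power at an alpha level of 0.050. Halving the alpha level to 0.025 to account for a second test drops the power to about 71%, with a corresponding rise in the Type II error rate:

```python
# A minimal power sketch, assuming a two-sided z test with a standardized
# effect delta (test statistic ~ N(delta, 1) under the alternative).
from scipy.stats import norm

def power_two_sided_z(delta, alpha):
    """Power of a two-sided z test for a standardized effect delta."""
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

delta = 2.8  # chosen so that power is ~0.80 at alpha = .050
print(f"Power at alpha = .050: {power_two_sided_z(delta, 0.050):.3f}")  # ~0.80
print(f"Power at alpha = .025: {power_two_sided_z(delta, 0.025):.3f}")  # ~0.71
```

If no inference about the joint hypothesis is made, this loss of power buys nothing.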
I conclude that inconsistent corrections represent a symptom of statisticism: an overgeneralization of abstract statistical principles at the expense of context-specific nuance and caveats. In response, I argue that we should adopt an inference-based perspective that advocates an alpha adjustment in the case of inferences about intersection null hypotheses but not in the case of inferences about individual null hypotheses.
Further Information
The Article
Rubin, M. (2024). Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses. Methods in Psychology, 10, Article 100140. https://doi.org/10.1016/j.metip.2024.100140
Related Work
García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, Article 100120. https://doi.org/10.1016/j.metip.2023.100120
Rubin, M. (2024). Type I error rates are not usually inflated. MetaArXiv. https://doi.org/10.31222/osf.io/3kv2b
Rubin, M. (2021). There’s no need to lower the significance threshold when conducting single tests of multiple individual hypotheses. Academia Letters, Article 610. https://doi.org/10.20935/AL610
Rubin, M. (2021). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 199, 10969–11000. https://doi.org/10.1007/s11229-021-03276-4