One-Sided Significance Tests

In this paper (Rubin, 2022), I make two related points: (1) researchers should halve two-sided p values if they use them to make directional claims, and (2) researchers should not halve their alpha level if they're using two one-sided tests to test two directional null hypotheses.

(1) Researchers should halve two-sided p values when making directional claims

Researchers sometimes conduct two-sided significance tests and then use the resulting two-sided p values to make directional claims. I argue that this approach is inappropriate because two-sided p values refer to non-directional hypotheses, rather than directional hypotheses.

So, for example, if you conduct a two-sided t test and obtain a significant two-sided p value, then your significant result refers to a non-directional null hypothesis (e.g., "men have the same self-esteem as women"), and you should make a corresponding non-directional claim (e.g., "men and women have significantly different self-esteem"). If you wish to make a directional claim (e.g., "men have significantly higher self-esteem than women"), then you should halve your two-sided p value to obtain a one-sided p value.

This first point is important because, if you use a two-sided p value to make a decision about a directional null hypothesis, then (a) your evidence will be weaker than it should be (i.e., your p value will be too large), and (b) your Type II error rate will be higher than necessary. For the same view, please see Georgi Georgiev’s onesided.org website here.
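To make the halving rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than code from the article: it assumes a z test whose statistic follows a standard normal distribution under the null, and the function names and the value z = 1.80 are hypothetical. Note that halving gives the one-sided p value only when the observed difference lies in the direction of the claim; for a symmetric test statistic, a difference in the opposite direction gives 1 minus half the two-sided p value.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # assumed null distribution of the z statistic


def two_sided_p(z: float) -> float:
    """Two-sided p value: probability of a statistic at least as extreme in either tail."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def one_sided_p_greater(z: float) -> float:
    """One-sided p value for the directional claim 'the difference is greater than zero'."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


# Hypothetical result: men score higher than women, z = 1.80.
z = 1.80
p_two = two_sided_p(z)          # roughly 0.072: not significant at alpha = .05
p_one = one_sided_p_greater(z)  # roughly 0.036: significant at alpha = .05

print(f"two-sided p = {p_two:.3f}")
print(f"one-sided p = {p_one:.3f} (half of the two-sided p value)")

# If the observed difference had been in the opposite direction (z = -1.80),
# the one-sided p value for the same claim would be 1 - p_two / 2, not p_two / 2.
p_one_opposite = one_sided_p_greater(-z)
print(f"one-sided p, opposite direction = {p_one_opposite:.3f}")
```

In this hypothetical case, using the two-sided p value to support the directional claim would fail to reject the directional null hypothesis at an alpha level of .050, whereas the one-sided p value would reject it, which is the loss of power described above.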

Sometimes, two-sided tests are called "two-tailed" tests! (I'll get my coat!) Source: https://www.britannica.com/animal/ring-tailed-lemur

(2) Researchers should not halve their alpha level when using two one-sided tests

I also argue that, if you use two one-sided tests to test two directional null hypotheses, then it's not necessary to adjust your alpha level to compensate for multiple testing, because your decision about rejecting each directional hypothesis is based on a single test result, rather than multiple test results.

For example, imagine that you use a one-sided test to test the directional null hypothesis that “men have the same or lower self-esteem than women.” In this case, there's no need to lower your alpha level (e.g., from .050 to .025), because your Type I error rate only refers to a single test of a single null hypothesis; it doesn't refer to either (a) the other directional null hypothesis (i.e., “men have the same or higher self-esteem than women”) or (b) the non-directional null hypothesis (i.e., “men have the same self-esteem as women”). Consequently, no alpha adjustment is required. For similar views, please see Georgi Georgiev's piece here and my paper on multiple testing here.
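The following simulation is a rough sketch of this point, not an analysis from the article. It assumes a z statistic (men minus women) that is normally distributed with unit variance, and the function name, seed, and effect sizes are hypothetical. It estimates how often each one-sided test, run at an unadjusted alpha level of .050, rejects its own directional null hypothesis.

```python
import random
from statistics import NormalDist

ALPHA = 0.05       # unadjusted alpha level for each one-sided test
N_SIMS = 100_000   # number of simulated studies
Z_CRIT = NormalDist().inv_cdf(1.0 - ALPHA)  # about 1.645


def simulate_rejection_rates(true_effect: float, seed: int = 1) -> tuple[float, float]:
    """Return the proportion of simulated studies in which each one-sided test rejects.

    The first value is for the null 'men have the same or lower self-esteem',
    the second for the null 'men have the same or higher self-esteem'.
    """
    rng = random.Random(seed)
    reject_same_or_lower = 0
    reject_same_or_higher = 0
    for _ in range(N_SIMS):
        z = rng.gauss(true_effect, 1.0)  # simulated z statistic (men minus women)
        if z > Z_CRIT:
            reject_same_or_lower += 1
        if z < -Z_CRIT:
            reject_same_or_higher += 1
    return reject_same_or_lower / N_SIMS, reject_same_or_higher / N_SIMS


# No true difference: both directional null hypotheses are true (at their boundary).
rate_lower_null, rate_higher_null = simulate_rejection_rates(true_effect=0.0)
print(f"'same or lower' null rejected in {rate_lower_null:.3f} of studies")
print(f"'same or higher' null rejected in {rate_higher_null:.3f} of studies")

# Men truly lower: the 'same or lower' null is true, so rejecting it is a Type I error.
rate_when_lower, _ = simulate_rejection_rates(true_effect=-0.5)
print(f"'same or lower' null rejected in {rate_when_lower:.3f} of studies when men are lower")
```

Under these assumptions, each directional null hypothesis is wrongly rejected in about 5% of studies when there is no true difference, and less often when the true difference lies inside that null hypothesis. So the Type I error rate for each directional claim stays at or below the unadjusted alpha level.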

Further Information

Article

Rubin, M. (2022). That’s not a two-sided test! It’s two one-sided tests! Significance, 19(2), 50-53. Publisher’s version | Self-archived version

Corrections and Letter

A helpful reader pointed out that the original version of my article (Rubin, 2020) included some misleading points. A very helpful editor and editorial board allowed me to make the necessary corrections and clarifications in a substantial rewrite of my original article. The above reference links to the revised version (Rubin, 2022). You can see the reader’s concerns and my response here:

Hong, W. (2022). Two-sided tests. [Letter.] Significance, 19(2), 47. https://doi.org/10.1111/1740-9713.01620

Rubin, M. (2022). Two-sided tests: The author replies. [Letter.] Significance, 19(2), 47. https://doi.org/10.1111/1740-9713.01620

You can also see a summary of the changes that were made to the revised article in the Publisher's Note here.