Robustness and Power
The total length of the videos in this section is approximately 38 minutes. You will also spend time answering short questions while completing this section.
You can also view all the videos in this section at the YouTube playlist linked here and here (there is some reordering).
Robustness
Question 1: If the null hypothesis of a test is true, what's the probability that the p-value will be less than 0.05? What's your best guess?
Show answer
0.05, as explained in videos that follow.
Distribution of p-values if null is true
Question 2: If the null hypothesis of a test is true, what is the probability of observing a p-value less than 0.15?
Show answer
0.15. When the null hypothesis is true, p-values follow a uniform distribution on the interval from 0 to 1, and for that distribution the probability of obtaining a number less than (or equal to) x is equal to x.
p-values have a uniform distribution when the null hypothesis is true
Question 3: If we simulate data sets where the null hypothesis is true, apply a hypothesis test to each data set, and record the p-values, what would the distribution of p-values look like?
Normal
Uniform
Show answer
Uniform. This gives us another way to think about the earlier questions. Consider a two-sample t-test: if you repeatedly sample from the two populations, where the populations have equal means so that the null of the t-test is true, run a t-test each time, and save the p-values, a histogram of the p-values will look flat and uniform. Therefore, the probability of obtaining a p-value in any particular range is equal to the probability that a uniform random number between 0 and 1 will be in that range, which is equal to the length of the range.
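The simulation described here can be sketched in Python (this is an illustration, not the course's own code file; `scipy.stats.ttest_ind` is used for the two-sample t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = []
for _ in range(2000):
    # Both samples come from the same population, so the null is true.
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    pvals.append(stats.ttest_ind(a, b).pvalue)
pvals = np.array(pvals)

# Under the null, P(p-value < x) is approximately x for any x in [0, 1].
frac_below_05 = np.mean(pvals < 0.05)
frac_below_15 = np.mean(pvals < 0.15)
print(frac_below_05, frac_below_15)  # each close to its cutoff
```

A histogram of `pvals` would look flat, matching the uniform distribution described above.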
Question 4: What is the left-sided p-value if the randomization distribution is (10,11,12,14,16,19) and the observed value of the test statistic is 11?
Show answer
1/3. Two of the six values in the randomization distribution (10 and 11) are less than or equal to the observed value of 11, so the left-sided p-value is 2/6 = 1/3.
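In code, the left-sided p-value is just the fraction of the randomization distribution at or below the observed statistic (a minimal Python sketch):

```python
randomization_dist = [10, 11, 12, 14, 16, 19]
observed = 11

# Count how many values are at or below the observed statistic.
p_left = sum(t <= observed for t in randomization_dist) / len(randomization_dist)
print(p_left)  # 2/6, i.e. 1/3
```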
Error types and independence
Question 5: Are t-tests robust to the independence assumption?
Show answer
No. If the independence assumption is not met, we need to use a different method that explicitly takes into account the type of non-independence, such as clustering or correlations across space or time.
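A quick simulation illustrates the point (a sketch, not the course's code file): data generated with positive autocorrelation violate the independence assumption, and a one-sample t-test then rejects a true null far more than 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, rho, sims = 100, 0.7, 1000
rejections = 0
for _ in range(sims):
    # AR(1)-style data: each value depends on the previous one,
    # so the observations are not independent. The true mean is 0,
    # so the null hypothesis of the one-sample t-test is true.
    e = rng.normal(0, 1, n)
    x = np.empty(n)
    x[0] = e[0]
    for i in range(1, n):
        x[i] = rho * x[i - 1] + e[i]
    if stats.ttest_1samp(x, 0).pvalue < 0.05:
        rejections += 1
reject_rate = rejections / sims
print(reject_rate)  # well above the nominal 0.05
```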
Don't miss this text box!
Below you can download the code file that runs the simulations shown in the rest of the lecture. You do NOT need to run the code while watching these videos, and in fact I don't recommend that you download the code until you are done with this module. However, the "starting example" at the top of this code file is what you will need to use as a template if you are asked to write code related to this module.
Normality 1
Question 6: If the test is working correctly, what proportion of simulated data sets where the null hypothesis is true should lead to p-values less than 0.05?
Show answer
0.05. Have you noticed that we are repeating this point a lot? If you have a solid understanding of what a histogram of p-values will look like in a simulation where the null hypothesis is true and the assumptions are met, you will be able to understand the later content, where the assumptions of the tests are broken and/or the null hypotheses are false.
Normality 2
Question 7: What should we do to check whether the normality assumption is true?
Visualize the data, such as with a histogram or box plot
Conduct a hypothesis test to check for normality
Show answer
Visualize the data. Especially for an assumption that doesn't turn out to be that important, making a decision about normality based on an arbitrary p-value cutoff is hard to justify.
Equal population variances
Question 8: If two distributions appear to have different variances, which two-sample t-test is appropriate?
Show answer
The unpooled ("Welch") t-test, which does not assume that the two population variances are equal.
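In `scipy`, for example, the Welch version is requested by setting `equal_var=False` (a sketch with made-up data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0, 1, 40)  # smaller spread
b = rng.normal(0, 4, 40)  # larger spread

# equal_var=False gives the unpooled (Welch) t-test, which does not
# assume the two population variances are equal.
welch = stats.ttest_ind(a, b, equal_var=False)
print(welch.pvalue)
```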
Outliers
Question 9: If you see extreme outliers, should you run a t-test?
Show answer
No. Note that t-tests are sensitive to outliers, because the t-statistic is a function of means. If you can't find a reason to omit the outliers, the rank sum test is a resistant alternative to the t-test.
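For instance, both tests are available in `scipy`. The rank sum statistic depends only on ranks, so it is unaffected by how extreme the outlier is, while the t-statistic is not (the data below are hypothetical):

```python
from scipy import stats

a = [3.1, 2.8, 3.4, 2.9, 3.2, 41.0]  # note the extreme outlier
b = [3.0, 3.3, 2.7, 3.1, 2.9, 3.2]

# The t-statistic is a function of the means, so the outlier drags it around.
t_p = stats.ttest_ind(a, b).pvalue
# The rank sum statistic uses only ranks: replacing 41.0 with 4.0
# (or 4,000) would change nothing as long as it stays the largest value.
w_p = stats.ranksums(a, b).pvalue
print(t_p, w_p)
```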
Small samples
The previous slide should have specified that the means of those two population distributions were the same.
Question 10: Which situations might lead you to consider using a non-parametric test instead of a t-test? Select all that apply.
small sample sizes
very slightly non-normal distributions
very non-normal distributions
interested in whether distributions are the same, not just whether means are the same
the equal variances assumption doesn't appear to be true
Show answers
Options 1, 3, and 4. The t-test is quite robust to deviations from normality, so very slightly non-normal distributions are fine. If the variances are not equal, you can use the unpooled t-test.
Non-parametric tests
Question 11: Why does the rank sum test lead to p-values less than 5% more than 5% of the time when the populations have equal means but different variances?
The assumptions aren't true
The null hypothesis is false
Show answer
The null hypothesis is false. The null hypothesis of a t-test is that the means are equal, but the null hypothesis of a non-parametric test like the rank sum is that the distributions are the same, not just the means.
Question 12: If we repeated the previous simulation but also shifted one of the population distributions so that they had different means AND different variances, would you expect the rank sum test to reject the null more or less often than 16% of the time?
Show answer
More. The power of a test depends on what is actually true. A test is more likely to correctly reject the null hypothesis if the truth is further from the null, and two distributions that differ in both mean and variance are more different than two distributions that differ only in terms of variance.
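The equal-means, unequal-variances situation can be sketched in Python (the parameters here are made up and may differ from the lecture's simulation; `stats.ranksums` is scipy's rank sum test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sims, rejections = 1000, 0
for _ in range(sims):
    # Equal means but very different variances: the t-test's null
    # (equal means) is true, but the rank sum test's null
    # (identical distributions) is false.
    a = rng.normal(0, 1, 25)
    b = rng.normal(0, 5, 25)
    if stats.ranksums(a, b).pvalue < 0.05:
        rejections += 1
reject_rate = rejections / sims
print(reject_rate)  # above 0.05, since the rank sum's null is false
```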
Power
Question 13: Why do you need to specify an alternative hypothesis in order to calculate power?
Show answer
The power is the probability that your test will correctly notice when the null is not true. You are much more likely to notice if the truth is very far from the null than if the truth is close to the null. For example, in a two-sample t-test, the null hypothesis is that the difference in population means is zero. Assume for a moment that the two populations both have variance 1, and consider two samples of size 100. If the true difference is not zero but 400 billion, you will certainly get a tiny p-value and correctly reject the null, right? But if the true difference is .0000001, you will almost certainly not end up with a data set that allows you to rule out the possibility that the true difference is zero.
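The dependence of power on the alternative can be seen in a small simulation (a sketch; the effect sizes below are illustrative, not the lecture's exact numbers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def estimated_power(delta, n=100, sims=1000):
    """Fraction of simulations in which a two-sample t-test rejects at
    the 0.05 level when the true mean difference is delta
    (both populations have variance 1)."""
    rejections = 0
    for _ in range(sims):
        a = rng.normal(0, 1, n)
        b = rng.normal(delta, 1, n)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            rejections += 1
    return rejections / sims

power_near_null = estimated_power(0.1)      # truth close to the null
power_far_from_null = estimated_power(1.0)  # truth far from the null
print(power_near_null, power_far_from_null)
```

The power is low when the true difference is small and approaches 1 as the truth moves further from the null.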
You did it!
During this tutorial you learned:
More about p-values
That when the null hypothesis is true, the distribution of p-values is a uniform distribution
A little bit about power and type I error
About robustness of hypothesis tests to their assumptions
The definitions of type I error, type II error, and power
The assumptions of a t-test and how to check whether the assumptions are true
What happens to the distribution of p-values when the null hypothesis is true and each of the t-test assumptions is not met
How the t-test performs in the presence of outliers
What happens to the distribution of p-values when the null hypothesis is not true and how this relates to power
When to choose a non-parametric test or a parametric test such as the t-test
Terms and concepts:
Null hypothesis, uniform distribution, robustness, type I error, type II error, power, resistance, independence, normality, equal variances, outlier, alternative hypothesis