Intro to
Non-Parametric Tests
The total length of the videos in this section is approximately 47 minutes, but you will also spend time answering short questions while completing this section.
You can also watch these videos at the playlist linked here.
Defining terms
Question 1: Is the estimand a statistic?
Show answer
Not unless you observe the entire target population. An estimand is a quantity that you calculate based on the target population, and a statistic is a quantity that you can calculate based on the observed sample.
Randomization test assuming simple random sample
Question 2: Why do we have to assume that the null hypothesis is true in order to create a reference distribution for the test statistic?
Show answer
We need this assumption in order to calculate what would have happened under other possible randomizations, because we only observe what happens under one possible randomization.
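To make this concrete, here is a minimal sketch of a randomization test in Python. The outcome numbers are made up for illustration (they are not data from the course); the logic is what matters: assume the null, re-shuffle the group labels many times, and see where the observed statistic falls.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes for a treated group and a control group.
treated = np.array([5.1, 6.3, 4.8, 7.0, 5.9])
control = np.array([4.2, 5.0, 4.6, 5.5, 4.9])

observed = treated.mean() - control.mean()

# Under the null hypothesis, the group labels don't matter, so we can
# re-randomize the labels to see what "would have happened" under other
# possible randomizations. This builds the reference distribution.
pooled = np.concatenate([treated, control])
n_treated = len(treated)
reference = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    reference.append(shuffled[:n_treated].mean() - shuffled[n_treated:].mean())
reference = np.array(reference)

# Right-sided p-value: fraction of re-randomizations at least as extreme
# as the observed difference in means.
p_value = np.mean(reference >= observed)
```

Without the null assumption, we would have no basis for treating the shuffled data sets as things that "could have happened," and the reference distribution would be meaningless.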
Bernoulli randomization and p-values
Question 3: Bernoulli randomization means that you flip a coin for each unit, rather than drawing a prespecified number of units out of a hat.
For which of the following situations might Bernoulli randomization be appropriate?
Doctors are studying a rare disease
I compare two ways to convey material in this course
Show answer
The rare disease context, but not the evaluation of this course. Bernoulli randomization is useful when units may be arriving one by one so that you cannot randomize them all at once (complete randomization). In order to collect data on several patients with the same rare disease, we would need to wait until enough patients arrived at participating medical clinics, one by one. If units are not arriving one by one, it's usually better to use complete randomization because you can prespecify how many units will end up in each treatment group.
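In code, the contrast between the two schemes is simple. The sketch below uses made-up settings (20 units, a fair coin) just to show the mechanics: under Bernoulli randomization each arriving unit gets its own coin flip, so the group sizes are random rather than prespecified.

```python
import random

random.seed(1)

def bernoulli_assign(p=0.5):
    # Flip a coin for this one unit as it arrives; no need to know
    # how many units will eventually enroll.
    return "treatment" if random.random() < p else "control"

# 20 patients arriving one by one at a clinic (hypothetical numbers).
assignments = [bernoulli_assign() for _ in range(20)]

# Unlike complete randomization, the number of treated units is random:
n_treated = assignments.count("treatment")
```

With complete randomization we would instead shuffle a prespecified list (say, 10 "treatment" and 10 "control" labels) once all units are in hand.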
Question 4: The p-value is:
the probability that the null hypothesis is true
the probability that we would see data at least this extreme, assuming that the null hypothesis is true
both of the above - they are the same
Show answer
The probability that we would see data at least this extreme, assuming that the null hypothesis is true.
Those two quantities are definitely not the same. The p-value is an awkward quantity. "Bayesian statistics" instead gives us the probability that a particular hypothesis is true - this can be more intuitive. But, more statistical background is typically needed to use Bayesian methods. The main thing I want you to know is the answer to the question above: the p-value is the second definition, even though the first definition is easier to understand.
Question 5: What is the basis for using 0.05 as the cutoff for rejecting the null in a hypothesis test, tradition or statistical theory?
Show answer
Tradition. Any cutoff that seems reasonable to you is as well justified as 0.05. With the tea example, the p-value for 4 correct was 1/70, and the p-value for 3 correct was 17/70. Using the arbitrary 0.05 cutoff, we would reject the null if the p-value was less than 3.5/70 = 0.05.
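We can check the tea-example arithmetic by enumeration. In the classic setup, 4 of 8 cups have milk poured first and the taster selects the 4 she believes are milk-first; under the null of pure guessing, every choice of 4 cups out of 8 is equally likely.

```python
from math import comb

# Total equally likely choices of 4 cups out of 8 under the null.
total = comb(8, 4)  # 70

# Ways to get exactly k milk-first cups right: choose k of the 4 correct
# cups and 4 - k of the 4 incorrect cups.
ways = {k: comb(4, k) * comb(4, 4 - k) for k in range(5)}

p_all_four = ways[4] / total                    # 1/70
p_three_or_more = (ways[3] + ways[4]) / total   # 17/70
```

This reproduces the p-values quoted above: 1/70 for 4 correct, and 17/70 for at least 3 correct.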
Specify which p-value you are reporting
Question 6: If the one-sided p-value is 0.04, what is the two-sided p-value?
Show answer
By definition, the two-sided p-value is twice the one-sided p-value. In this case, 0.04*2 = 0.08. Note that if we were planning to make a decision based on a strict 0.05 cutoff, it matters whether we are using the one-sided vs. two-sided p-value, and we should specify ahead of time whether 0.05 is the cutoff for a one-sided or two-sided p-value. However, it's better not to make strict decisions based on p-value cutoffs. Instead, we should report the p-value along with all of the other information we have: visualizations of the data, the means in each group we are comparing, and other summaries.
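The sketch below shows all four flavors of p-value computed from a reference distribution. The reference distribution here is simulated (standard normal draws) and the observed statistic is a made-up number; only the relationships among the p-values are the point.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in reference distribution (hypothetical; in practice this comes
# from re-randomizing the data, as in a randomization test).
reference = rng.normal(size=10_000)
observed = 1.75

# One-sided p-values: how often the reference is at least as extreme
# in one particular direction.
right_sided = np.mean(reference >= observed)
left_sided = np.mean(reference <= observed)

# Doubling convention described in the text: the two-sided p-value is
# twice the (smaller) one-sided p-value.
two_sided = 2 * min(right_sided, left_sided)
```

Reporting which of these you computed is exactly the point of this section's title.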
Question 7: Are you surprised when something happens that should happen 4.3% of the time?
Show answer
It's up to you!
Non-parametric tests: next steps
Question 8: Can we carry out the steps of a randomization test if the study was not actually randomized?
Show answer
Yes. The steps in the test work perfectly well even if the two groups were not created randomly. However, in that case we can't justify the test by saying that each of these other randomizations could have occurred if the groups did not cause the outcome. Instead, we justify the test by saying that the group labels could have been allocated in any of these ways if the outcomes are not related to the groups.
Application to HLAB
Below is the reference distribution for the actual HLAB data set, which included 207 people who were offered HLAB assistance or not via Bernoulli randomization.
The right-sided p-value is 0.26. We cannot rule out the possibility that the 4 percentage point difference in win rates was due to chance. In fact, a 4 percentage point difference is typical of what we'd expect to see in the data if offering HLAB assistance actually has no impact on win outcomes.
(You might notice that the title refers to "weighted" win rates. This is because not all of the coins we flipped had 50-50 probabilities, as HLAB was more available to offer assistance at certain times. The varying probabilities were taken into account in both the test statistic and the reference distribution.)
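One common way to account for varying assignment probabilities is inverse-probability weighting. The sketch below illustrates that idea with made-up numbers; it is not the actual HLAB analysis, and the specific weighting shown here is an assumption about one reasonable approach, not a description of what HLAB did.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: each unit's coin-flip probability varied over time,
# as in the HLAB design; outcomes are win indicators (made-up numbers).
probs = np.array([0.5, 0.5, 0.7, 0.3, 0.5, 0.7, 0.3, 0.5])
wins = np.array([1, 0, 1, 1, 0, 0, 1, 0])
offered = np.array([1, 0, 1, 0, 1, 1, 0, 0])

def weighted_diff(assign):
    # Weight each unit by the inverse probability of the assignment it
    # received, so units with unusual probabilities don't bias the
    # comparison of win rates.
    w = np.where(assign == 1, 1 / probs, 1 / (1 - probs))
    treat = np.average(wins[assign == 1], weights=w[assign == 1])
    ctrl = np.average(wins[assign == 0], weights=w[assign == 0])
    return treat - ctrl

observed = weighted_diff(offered)

# Build the reference distribution by re-flipping each unit's coin with
# its own probability, matching the Bernoulli design.
reference = []
for _ in range(5_000):
    redraw = (rng.random(len(probs)) < probs).astype(int)
    if 0 < redraw.sum() < len(redraw):  # need both groups non-empty
        reference.append(weighted_diff(redraw))
p_right = float(np.mean(np.array(reference) >= observed))
```

The key design point is that the varying probabilities appear in both places: in the weighted test statistic and in how the reference distribution is generated.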
You did it!
During this tutorial you learned:
The definitions of a hypothesis test, a null hypothesis, a statistic, and a test-statistic
About distributions and how to create a non-parametric reference distribution
How to perform a hypothesis test with a randomization test that assumes simple random sampling or Bernoulli sampling (HLAB example)
The definition of a p-value, including two-sided, one-sided, left-sided, right-sided p-values
What to conclude when a p-value is small or large, and about the conventional cutoff value
The importance of specifying the type of p-value when reporting your test results
About the benefits of non-parametric tests
Terms and concepts:
hypothesis test, null hypothesis, statistic, test-statistic, distribution, reference distribution, p-value, two-sided p-value, one-sided p-value, left-sided p-value, right-sided p-value, parametric test, non-parametric test, randomization test, permutation test