Intro to Non-Parametric Tests

The total length of the videos in this section is approximately 47 minutes, but you will also spend time answering short questions while completing this section.

You can also watch these videos at the playlist linked here.

Defining terms

IntroToNPTests.1.Definitions.mp4

Question 1: Is the estimand a statistic?

Show answer

Not unless you observe the entire target population. An estimand is a quantity that you calculate based on the target population, and a statistic is a quantity that you can calculate based on the observed sample.
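To make the distinction concrete, here is a small sketch (with made-up data, not from the course) contrasting an estimand computed on a full target population with a statistic computed on an observed sample:

```python
import random

random.seed(2)

# Hypothetical target population of 1,000 values (in practice, we never
# observe all of these).
population = [random.gauss(10, 2) for _ in range(1000)]

# The estimand is defined on the whole target population...
estimand = sum(population) / len(population)

# ...while a statistic can be calculated from the observed sample alone.
sample = random.sample(population, 50)
statistic = sum(sample) / len(sample)

print(round(estimand, 2), round(statistic, 2))
```

The statistic is typically close to, but not equal to, the estimand; only if the "sample" were the entire population would the two coincide.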

Randomization test assuming simple random sample

IntroToNPTests.2.HLABBernoulli.mp4

Question 2: Why do we have to assume that the null hypothesis is true in order to create a reference distribution for the test statistic?

Show answer

We need this assumption in order to calculate what would have happened under other possible randomizations, because we only observe what happens under one possible randomization.
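As a rough sketch of this idea (with hypothetical data, not the HLAB data), assuming the null means every unit's outcome would have been the same under any other randomization, so we can reshuffle the group labels to build a reference distribution:

```python
import random

random.seed(0)

# Hypothetical outcomes (1 = success) under the one randomization we observed.
treat = [1, 1, 0, 1, 1, 0, 1, 0]
control = [0, 1, 0, 0, 1, 0, 0, 0]

observed = sum(treat) / len(treat) - sum(control) / len(control)

# Under the null, the outcomes are unchanged by treatment, so we can
# re-randomize the group labels to see what test statistic each of the
# other possible randomizations would have produced.
pooled = treat + control
reference = []
for _ in range(10_000):
    random.shuffle(pooled)
    diff = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    reference.append(diff)

# Right-sided p-value: fraction of re-randomizations with a statistic
# at least as large as the one we observed.
p_right = sum(d >= observed for d in reference) / len(reference)
```

Without the null assumption, we would not know what each unit's outcome would have been in the other group, and the reshuffling step would have no justification.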

Bernoulli randomization and p-values

IntroNPTests.3.HLAB with Bernoulli and p-values.mp4

Question 3: Bernoulli randomization means that you flip a coin for each unit, rather than drawing a prespecified number of units out of a hat.

For which of the following situations might Bernoulli randomization be appropriate?

Show answer

The rare disease context, but not the evaluation of this course. Bernoulli randomization is useful when units may be arriving one by one so that you cannot randomize them all at once (complete randomization). In order to collect data on several patients with the same rare disease, we would need to wait until enough patients arrived at participating medical clinics, one by one. If units are not arriving one by one, it's usually better to use complete randomization because you can prespecify how many units will end up in each treatment group.
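A quick sketch of the difference between the two schemes (group sizes and sample size here are arbitrary choices for illustration):

```python
import random

random.seed(1)

# Bernoulli randomization: flip an independent coin as each unit arrives,
# so the number of treated units is random.
bernoulli_treated = [random.random() < 0.5 for _ in range(20)]

# Complete randomization: prespecify the group sizes (here 10 of 20
# treated) and shuffle all units at once, so the split is fixed.
complete_treated = [True] * 10 + [False] * 10
random.shuffle(complete_treated)

print(sum(bernoulli_treated), sum(complete_treated))
```

Under Bernoulli randomization the treated count varies from run to run; under complete randomization it is always exactly the prespecified number.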

Question 4: The p-value is:

Show answer

The probability that we would see data at least this extreme, assuming that the null hypothesis is true.

Those two quantities are definitely not the same. The p-value is an awkward quantity. "Bayesian statistics" instead gives us the probability that a particular hypothesis is true - this can be more intuitive. But, more statistical background is typically needed to use Bayesian methods. The main thing I want you to know is the answer to the question above: the p-value is the second definition, even though the first definition is easier to understand.

Question 5: What is the basis for using 0.05 as the cutoff for rejecting the null in a hypothesis test, tradition or statistical theory?

Show answer

Tradition. Any cutoff that seems reasonable to you is as well justified as 0.05. With the tea example, the p-value for 4 correct was 1/70, and the p-value for 3 correct was 17/70. Using the arbitrary 0.05 cutoff, we would reject the null if the p-value was less than 3.5/70 = 0.05.
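The tea-example p-values quoted above can be checked by counting. With 8 cups, 4 of them milk-first, there are C(8,4) = 70 equally likely guesses under the null, and the number of guesses with exactly k correct milk-first cups is C(4, k) * C(4, 4 - k):

```python
from math import comb

total = comb(8, 4)  # 70 equally likely ways to guess which 4 cups are milk-first

# Exactly k correct among the 4 milk-first cups: comb(4, k) * comb(4, 4 - k)
exactly_4 = comb(4, 4) * comb(4, 0)  # 1 way
exactly_3 = comb(4, 3) * comb(4, 1)  # 16 ways

p_all_four = exactly_4 / total                     # 1/70
p_three_or_more = (exactly_3 + exactly_4) / total  # 17/70
```

This reproduces the 1/70 and 17/70 figures, and makes clear why no p-value strictly between them is attainable in this design.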

Specify which p-value you are reporting

IntroToNPTests.4.Pval.mp4

Question 6: If the one-sided p-value is 0.04, what is the two-sided p-value?

Show answer

By definition, the two-sided p-value is twice the one-sided p-value. In this case, 0.04*2 = 0.08. Note that if we were planning to make a decision based on a strict 0.05 cutoff, it matters whether we are using the one-sided vs. two-sided p-value, and we should specify ahead of time whether 0.05 is the cutoff for a one-sided or two-sided p-value. However, it's better not to make strict decisions based on p-value cutoffs. Instead, we should report the p-value along with all of the other information we have: visualizations of the data, the means in each group we are comparing, and other summaries.
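A small sketch of how the three p-values relate, using a made-up reference distribution (values chosen only for illustration):

```python
# Hypothetical reference distribution of a test statistic under the null,
# with an observed statistic of 2.
reference = [-3, -2, -1, -1, 0, 0, 0, 1, 1, 2, 2, 3]
observed = 2

right_sided = sum(s >= observed for s in reference) / len(reference)
left_sided = sum(s <= observed for s in reference) / len(reference)

# Following the convention in this section: the two-sided p-value doubles
# the (smaller) one-sided p-value, capped at 1.
two_sided = min(1.0, 2 * min(left_sided, right_sided))
```

Here the right-sided p-value is 3/12 = 0.25 and the two-sided p-value is 0.5; a right-sided p-value of 0.04 would likewise double to a two-sided p-value of 0.08.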

Question 7: Are you surprised when something happens that should happen 4.3% of the time?

Show answer

It's up to you!

Non-parametric tests: next steps

IntroNPTests.5.NextSteps.mp4

Question 8: Can we carry out the steps of a randomization test if the study was not actually randomized?

Show answer

Yes. The steps in the test work perfectly well even if the two groups were not created randomly. However, in that case we can't justify the test by saying that each of these other randomizations could have occurred if the groups did not cause the outcome. Instead, we justify the test by saying that the group labels could have been allocated in any of these ways if the outcomes are not related to the groups.

Application to HLAB

Below is the reference distribution for the actual HLAB data set, which included 207 people who were offered HLAB assistance or not via Bernoulli randomization.

The right-sided p-value is 0.26. We cannot rule out the possibility that the 4 percentage point difference in win rates was due to chance. In fact, a 4 percentage point difference is typical of what we'd expect to see in the data if offering HLAB assistance actually has no impact on win outcomes.

(You might notice that the title refers to "weighted" win rates. This is because not all of the coins we flipped had 50-50 probabilities, as HLAB was more available to offer assistance at certain times. The varying probabilities were taken into account in both the test statistic and the reference distribution.)

You did it!

During this tutorial you learned:


Terms and concepts:

hypothesis test, null hypothesis, statistic, test statistic, distribution, reference distribution, p-value, two-sided p-value, one-sided p-value, left-sided p-value, right-sided p-value, parametric test, non-parametric test, randomization test, permutation test