The Steps of NHST

In this section, our focus is on null hypothesis significance testing, which is part of inference. Previously we introduced and practiced stating null and alternative hypotheses from a research question. Forming the hypotheses is the first step in a hypothesis test. Here are the general steps in the process of hypothesis testing.

Step 1: Determine the hypotheses.

The hypotheses come from the research question and are stated in terms of population parameters. The null and alternative hypotheses are stated so that they contradict each other: they are mutually exclusive and exhaustive.

Step 2: State your decision criterion (α).

Because the hypothesis test is based on probability, we need to state the level of Type I error we are willing to accept. This is usually set to 5% by tradition. This stated decision criterion is what we compare our p-value (computed in step 4) to in order to decide whether to reject or fail to reject the null hypothesis (step 5).

Step 3: Collect the data.

Ideally, we ethically select a random sample from the population. The data comes from this sample.

Step 4: Assess the evidence by computing statistic(s).

Assume that the null hypothesis is true. Could the data come from the population described by the null hypothesis? Use simulation or a mathematical model to examine the results from random samples selected from the population described by the null hypothesis. Figure out if results similar to the data are likely or unlikely. Note that the wording “likely or unlikely” implies that this step requires some kind of probability calculation. This will be your computed p-value for your observed data from step 3.

Step 5: State a conclusion.

We use what we find in the previous step to make a decision. This step requires us to think in the following way. Remember that we assume that the null hypothesis is true. Then one of two outcomes can occur:

  • One possibility is that results similar to the actual sample are extremely unlikely. This means that the data do not fit in with results from random samples selected from the population described by the null hypothesis. In this case, it is unlikely that the data came from this population, so we view this as strong evidence against the null hypothesis. Technically, if our computed p-value from step 4 is less than our stated alpha value from step 2, then we reject the null hypothesis in favor of the alternative hypothesis.

  • The other possibility is that results similar to the actual sample are fairly likely (not unusual). This means that the data fit in with typical results from random samples selected from the population described by the null hypothesis. Technically, if our computed p-value from step 4 is greater than our stated alpha value in step 2, then we fail to reject the null hypothesis. In this case, we do not have evidence against the null hypothesis, so we cannot reject it in favor of the alternative hypothesis.


__________________________________________

EXAMPLE 1

According to an article by Andrew Berg (“Report: Teens Texting More, Using More Data,” Wireless Week, October 15, 2010), Nielsen Company analyzed cell phone usage for different age groups using cell phone bills and surveys. Nielsen found significant growth in data usage, particularly among teens, stating that “94 percent of teen subscribers self-identify as advanced data users, turning to their cellphones for messaging, Internet, multimedia, gaming, and other activities like downloads.” The study found that the mean cell phone data usage was 62 MB among teens ages 13 to 17. A researcher is curious whether cell phone data usage has increased for this age group since the original study was conducted. She plans to conduct a hypothesis test.

Step 1: Determine the hypotheses.

The null hypothesis is often a statement of “no change,” so the null hypothesis will state that there is no change in the mean cell phone data usage for this age group since the original study. In this case, the alternative hypothesis is that the mean has increased from 62 MB.

  • H0: The mean data usage for teens with smart phones is still 62 MB; that is, it has not increased (H0: μ ≤ 62).

  • Ha: The mean data usage for teens with smart phones is greater than 62 MB (Ha: μ > 62).

Step 2: State the decision criterion.

The next step is to state your decision criterion. We will use an alpha of .05 (due to tradition). This means that results at least as extreme as the ones we observe must have a probability below 5% (pretty unlikely) if the null is true in order for us to reject the null hypothesis. Notice this also means we are willing to accept a 5% Type I error rate (i.e., we are willing to wrongly reject the null hypothesis 5% of the time when it is actually true).

Step 3: Collect the data.

The next step is to obtain a sample and collect data that will allow the researcher to test the hypotheses. The sample must be representative of the population and, ideally, should be a random sample. In this case, the researcher must randomly sample teens who use smart phones.

For the purposes of this example, imagine that the researcher randomly samples 50 teens who use smart phones. She finds that the mean data usage for these teens was 75 MB with a standard deviation of 45 MB. Since it is greater than 62 MB, this sample mean provides some evidence in favor of the alternative hypothesis. But the researcher anticipates that samples will vary when the null hypothesis is true. So how much of a difference will make her doubt the null hypothesis? Does she have evidence strong enough to reject the null hypothesis?

Step 4: Assess the evidence.

To assess the evidence, the researcher needs to know how much variability to expect in random samples when the null hypothesis is true. She begins with the assumption that H0 is true – in this case, that the mean data usage for teens is still 62 MB. She then determines how unusual the results of the sample are: If the mean for all teens with smart phones actually is 62 MB, what is the chance that a random sample of 50 teens will have a sample mean of 75 MB or higher? Obviously, this probability depends on how much variability there is in random samples of this size from this population.

The probability of observing a sample mean at least this high if the population mean is 62 MB is approximately 0.023 (later topics explain how to calculate this probability). The probability is quite small. It tells the researcher that if the population mean is actually 62 MB, a sample mean of 75 MB or higher will occur only about 2.3% of the time. This probability is called the P-value.
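The text does not say which method produced 0.023, so the following is only a sketch under an assumption: that a one-sample t-test with n − 1 = 49 degrees of freedom is applied to the summary statistics from step 3. The helper names (t_pdf, t_upper_tail) are made up for this illustration, and the tail area is found by simple numerical integration so that only the Python standard library is needed. A normal (z) approximation is printed alongside for comparison.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the density is symmetric, so half the area lies above 0

# Summary statistics from the example
mu0, xbar, s, n = 62, 75, 45, 50

se = s / math.sqrt(n)        # estimated standard error of the sample mean
t_stat = (xbar - mu0) / se   # how many standard errors the sample mean is above 62

p_t = t_upper_tail(t_stat, n - 1)    # assumed method: one-sample t-test
p_z = 1 - NormalDist().cdf(t_stat)   # normal approximation, for comparison

print(f"test statistic = {t_stat:.3f}")
print(f"one-sided p (t, df={n - 1}) = {p_t:.4f}")
print(f"one-sided p (normal approx) = {p_z:.4f}")
```

Both values fall a little above 0.02, and the t-based value is the one closest to the 0.023 quoted in the text. The t version is slightly larger because it allows for the extra uncertainty from estimating the standard deviation from the sample.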

Note: The P-value is a conditional probability. The condition is the assumption that the null hypothesis is true. Thus, the P-value tells us the probability of obtaining results at least as extreme as the observed data if the null hypothesis is true. It is not the probability that the null hypothesis is true.

Step 5: Conclusion.

The small P-value indicates that it is unlikely for a sample mean to be 75 MB or higher if the population has a mean of 62 MB. It is therefore unlikely that the data from these 50 teens came from a population with a mean of 62 MB. The evidence is strong enough to make the researcher doubt the null hypothesis, so she rejects the null hypothesis in favor of the alternative hypothesis. Technically, since the computed p-value of .023 (step 4) was less than the stated decision criterion (step 2) of .05, the researcher will reject the null hypothesis. The researcher concludes that the mean data usage for teens with smart phones has increased since the original study. It is now greater than 62 MB. (P = 0.023)

__________________________________________

Comment

Notice that the P-value is included in the preceding conclusion, which is a common practice. It allows the reader to see the strength of the evidence used to draw the conclusion.

How Small Does the P-Value Have to Be to Reject the Null Hypothesis?

A small P-value indicates that it is unlikely that the actual sample data came from the population described by the null hypothesis. More specifically, a small P-value says that there is only a small chance that we will randomly select a sample with results at least as extreme as the data if H0 is true. The smaller the P-value, the stronger the evidence against H0.

But how small does the P-value have to be in order to reject H0?

In practice, we often compare the P-value to 0.05. We reject the null hypothesis in favor of the alternative if the P-value is less than (or equal to) 0.05.

Note: This means that when H0 is true, sampling variability alone will produce a P-value of 0.05 or less about 5% of the time. In other words, in the long run, 1 in 20 random samples will have results that suggest we should reject H0 even when H0 is true. This variability is just due to chance, but it is unusual enough that we are willing to say that results this rare suggest that H0 is not true.
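The "1 in 20" claim can be checked with a small simulation. The sketch below rests on assumptions that are not part of the example: the population is taken to be normal with mean 62 MB (so H0 is true), the standard deviation of 45 MB is treated as known, and a one-sided z-test is used. The names (upper_tail_p, TRIALS) are illustrative only.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

# Assumed setup: H0 is true, population is normal, SIGMA is treated as known
MU0, SIGMA, N, ALPHA = 62, 45, 50, 0.05
TRIALS = 4000

def upper_tail_p(sample):
    """One-sided z-test p-value for H0: mu = MU0 against Ha: mu > MU0."""
    z = (fmean(sample) - MU0) / (SIGMA / N ** 0.5)
    return 1 - NormalDist().cdf(z)

rejections = 0
for _ in range(TRIALS):
    # Draw a random sample from the population described by H0
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    if upper_tail_p(sample) <= ALPHA:
        rejections += 1  # a Type I error: H0 is true but we reject it

rate = rejections / TRIALS
print(f"Rejected a true H0 in {rate:.1%} of {TRIALS} samples")
```

The printed rate should land close to 5%, which is the Type I error rate we agreed to accept when we chose α = 0.05.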

Statistical Significance: Another Way to Describe Unlikely Results

When the P-value is less than (or equal to) 0.05, we also say that the difference between the actual sample statistic and the assumed parameter value is statistically significant. In the previous example, the P-value is less than 0.05, so we say the difference between the sample mean (75 MB) and the assumed mean from the null hypothesis (62 MB) is statistically significant. A statistically significant finding is one where we have rejected the null hypothesis. However, this does NOT imply that failing to reject the null hypothesis gives us an unimportant finding. Findings that are not statistically significant can be very important! For example, it is important to know when a drug or therapy does not work (i.e., its effect is not statistically significant).

Other Observations about Stating Conclusions in a Hypothesis Test

In the example, the sample mean was greater than 62 MB. This fact alone does not suggest that the data support the alternative hypothesis. We have to determine that the sample mean is not only larger than 62 MB but larger than we would expect to see in random sampling if the population mean is 62 MB. We therefore need to determine the P-value. If the sample mean had been less than or equal to 62 MB, it would not support the alternative hypothesis, and the conclusion would be clear without further calculation.

We have to be very careful in how we state the conclusion. There are only two possibilities.

  • We have enough evidence to reject the null hypothesis and support the alternative hypothesis.

  • We do not have enough evidence to reject the null hypothesis, so there is not enough evidence to support the alternative hypothesis.

If the P-value in the previous example was greater than 0.05, then we would not have enough evidence to reject H0 and accept Ha. In this case our conclusion would be that “there is not enough evidence to show that the mean amount of data used by teens with smart phones has increased.” Notice that this conclusion answers the original research question. It focuses on the alternative hypothesis. It does not say “the null hypothesis is true.” We never accept the null hypothesis or state that it is true. When there is not enough evidence to reject H0, the conclusion will say, in essence, that “there is not enough evidence to support Ha.” But of course we will state the conclusion in the specific context of the situation we are investigating.

We compared the P-value to 0.05 in the previous example. The number 0.05 is called the significance level for the test, because a P-value less than or equal to 0.05 is statistically significant (unlikely to have occurred solely by chance). The symbol we use for the significance level is α (the lowercase Greek letter alpha). We sometimes refer to the significance level as the α-level. We call this value the significance level because if the P-value is less than or equal to the significance level, we say the results of the test showed a significant difference.

If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.

If the P-value > α, we fail to reject the null hypothesis.
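The two rules above amount to a single comparison. As a minimal sketch (the function name decide and its return strings are invented for this illustration), the decision step could be written as:

```python
def decide(p_value, alpha=0.05):
    """Return the NHST decision for a P-value at significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be a probability between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Reject when the P-value is at or below alpha; otherwise fail to reject
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.023))              # Example 1 at the traditional alpha of 0.05
print(decide(0.023, alpha=0.01))  # the same P-value against a stricter alpha
```

With the P-value from Example 1, the first call returns "reject H0" and the second returns "fail to reject H0". The same evidence leads to different decisions under different significance levels, which is one reason α should be fixed before the data are collected.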

In practice, it is common to see 0.05 for the significance level. Occasionally, researchers use other significance levels. In particular, if rejecting H0 will be controversial or expensive, we may require stronger evidence. In this case, a smaller significance level, such as 0.01, is used. As with the hypotheses, we should choose the significance level before collecting data. It is treated as an agreed-upon benchmark prior to conducting the hypothesis test. In this way, we can avoid arguments about the strength of the data.

To conclude, let’s look at some exercises that focus on the P-value and its meaning. Then, we will use these NHST steps with different normal and discrete distributions in the next chapters.


__________________________________________

EXAMPLE 2

For many years, working full-time has meant working 40 hours per week. Nowadays, it seems that corporate employers expect their employees to work more than this amount. A researcher decides to investigate this hypothesis.

  • H0: The average time full-time corporate employees work per week is 40 hours.

  • Ha: The average time full-time corporate employees work per week is more than 40 hours.

To substantiate his claim, the researcher randomly selects 250 corporate employees and finds that they work an average of 47 hours per week with a standard deviation of 3.2 hours.

In order to assess the evidence, we need to ask: How likely is it that in a sample of 250 we will find that the mean number of hours per week corporate employees work is as high as 47 if the true mean is 40?
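One way to put a number on this question is sketched below. It assumes a one-sided z-test based on the sample standard deviation, which is a reasonable approximation for a sample as large as 250 but is not a method stated in the example.

```python
import math
from statistics import NormalDist

# Summary statistics from Example 2
mu0, xbar, s, n = 40, 47, 3.2, 250

se = s / math.sqrt(n)   # estimated standard error of the sample mean
z = (xbar - mu0) / se   # distance of the sample mean from 40, in standard errors

# Upper-tail probability P(Z >= z), computed as P(Z <= -z) by symmetry
p_value = NormalDist().cdf(-z)

print(f"standard error = {se:.3f} hours")
print(f"z = {z:.1f}")
print(f"p-value is below 1e-10: {p_value < 1e-10}")
```

The sample mean sits more than 30 standard errors above 40 hours, so the P-value is effectively zero. A result like this would be extraordinarily unlikely if the true mean were 40 hours.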

__________________________________________

EXAMPLE 3

According to the Centers for Disease Control (CDC), roughly 21.5% of all high school seniors in the United States have used marijuana. (The data were collected in 2002. The figure represents those who smoked during the month prior to the survey, so the actual figure might be higher.) A sociologist suspects that the rate among African American high school seniors is lower. In this case, then,

  • H0: The rate of African American high-school seniors who have used marijuana is 21.5% (same as the overall rate of seniors).

  • Ha: The rate of African American high-school seniors who have used marijuana is lower than 21.5%.

To check his claim, the sociologist chooses a random sample of 375 African American high school seniors, and finds that 16.5% of them have used marijuana.

In order to assess this evidence, we need to ask: How likely is it that in a sample of 375 we will find that as few as 16.5% have used marijuana, if the true rate is really 21.5%?
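A rough answer can be sketched with a one-proportion z-test. This choice of method is an assumption, since the example does not name one. It uses the normal approximation to the sampling distribution of a proportion, with the standard error computed from the null value of 21.5%.

```python
import math
from statistics import NormalDist

# Values from Example 3
p0, p_hat, n = 0.215, 0.165, 375

se = math.sqrt(p0 * (1 - p0) / n)  # standard error of a sample proportion under H0
z = (p_hat - p0) / se              # negative, because the sample rate is below 21.5%

# Ha says the rate is lower, so the P-value is the lower-tail probability
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"one-sided p-value = {p_value:.4f}")
```

Under these assumptions the P-value comes out a little below 0.01, so a sample rate as low as 16.5% would be unusual if the true rate were 21.5%.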

__________________________________________

In this section, our focus was on the steps of null hypothesis significance testing. To conclude with the formal steps:

Step 1: Determine the hypotheses.

The hypotheses come from the research question and are stated in terms of population parameters. The null and alternative hypotheses are stated so that they contradict each other: they are mutually exclusive and exhaustive.

Step 2: State your decision criterion (α).

Because the hypothesis test is based on probability, we need to state the level of Type I error we are willing to accept. This is usually set to 5% by tradition. This stated decision criterion is what we compare our p-value (computed in step 4) to in order to decide whether to reject or fail to reject the null hypothesis (step 5).

Step 3: Collect the data.

Ideally, we ethically select a random sample from the population. The data comes from this sample.

Step 4: Assess the evidence by computing statistic(s).

Assume that the null hypothesis is true. Could the data come from the population described by the null hypothesis? Use simulation or a mathematical model to examine the results from random samples selected from the population described by the null hypothesis. Figure out if results similar to the data are likely or unlikely. Note that the wording “likely or unlikely” implies that this step requires some kind of probability calculation. This will be your computed p-value for your observed data from step 3.

Step 5: State a conclusion.

We use what we find in the previous step to make a decision. This step requires us to think in the following way. Remember that we assume that the null hypothesis is true. Then one of two outcomes can occur:

  • One possibility is that results similar to the actual sample are extremely unlikely. This means that the data do not fit in with results from random samples selected from the population described by the null hypothesis. In this case, it is unlikely that the data came from this population, so we view this as strong evidence against the null hypothesis. Technically, if our computed p-value from step 4 is less than our stated alpha value from step 2, then we reject the null hypothesis in favor of the alternative hypothesis.

  • The other possibility is that results similar to the actual sample are fairly likely (not unusual). This means that the data fit in with typical results from random samples selected from the population described by the null hypothesis. Technically, if our computed p-value from step 4 is greater than our stated alpha value in step 2, then we fail to reject the null hypothesis. In this case, we do not have evidence against the null hypothesis, so we cannot reject it in favor of the alternative hypothesis.