A hypothesis is a proposed statement that may or may not be true. We should be able to test the hypothesis, either by experiment or by reasoning from theory. For example:
A new medicine you think might work.
A possible location of new species.
The hypothesis is generally a starting point. From there, we have to test the hypothesis and decide whether it is probably true or probably false. Note the word 'probably': there is always a chance of making the wrong decision. There are five steps involved in conducting a hypothesis test (using the p-value approach):
State the Null hypothesis H0 and the Alternate hypothesis H1
Choose the level of significance α and the sample size n. The level of significance depends on the relative importance of the risks of committing Type I and Type II errors in the problem.
Determine the appropriate test statistic and the sampling distribution
Collect the sample data, compute the value of the test statistic and compute the p-value.
Compare the p-value to the chosen significance level. If the p-value is greater than or equal to α, do not reject the null hypothesis. If p-value is less than α, reject the null hypothesis.
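The decision rule in the last step can be sketched in a few lines of Python. This is only an illustration: the function name decide and its default α of 0.05 are assumptions made for the example, not part of the procedure itself.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the step-5 rule: reject H0 only when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject H0"
    # p_value >= alpha, including the boundary case p_value == alpha
    return "do not reject H0"


print(decide(0.03))  # below alpha
print(decide(0.20))  # above alpha
```

Note that a p-value exactly equal to α falls on the "do not reject" side, matching the rule as stated in the list.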
We shall go through each of these steps in detail, but before that we need to review the standard normal distribution.
The standard normal distribution is a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard score or a z-score. Every normal random variable X can be transformed into a z score via the following equation:
z = (X - μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.
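As a small illustration of the transformation, here is a sketch in Python. The function name z_score and the numbers used (a value of 130 from a distribution with mean 100 and standard deviation 15) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical values: X = 130, mean = 100, standard deviation = 15
print(z_score(130.0, 100.0, 15.0))  # 2.0
```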
For this distribution, the area under the curve from -∞ to +∞ is equal to 1.0. In addition, the area under the curve over any region is equal to the fraction of measurements that fall in that region. These two facts can be used to determine the fraction of measurements that fall above some value (such as a threshold limit), below some value, or between two values.
The x-axis contains the z values.
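The three kinds of area mentioned above (below a value, above a value, between two values) can be computed instead of read from a table. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the helper names fraction_below, fraction_above and fraction_between are made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_below(z: float) -> float:
    """Area under the curve to the left of z."""
    return std_normal.cdf(z)


def fraction_above(z: float) -> float:
    """Area to the right of z; the total area is 1."""
    return 1.0 - std_normal.cdf(z)


def fraction_between(z_low: float, z_high: float) -> float:
    """Area between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


print(round(fraction_below(1.0), 4))          # 0.8413
print(round(fraction_above(1.0), 4))          # 0.1587
print(round(fraction_between(-1.0, 1.0), 4))  # 0.6827
```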
When the population standard deviation (s.d.) is known (which rarely occurs), we can use the z-test for the mean if the population is normally distributed. Even if the population is not normally distributed, we can still use the z-test if the sample size is large enough for the Central Limit Theorem to apply.
Hence,
Z-stat = (x̅ -μ) / (σ/√n )
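The same formula written as a Python function, as a rough sketch. The name z_statistic and the example numbers (sample mean 52, hypothesized mean 50, σ = 8, n = 16) are hypothetical. The denominator σ/√n is the standard error of the sample mean.

```python
import math


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z-stat = (sample_mean - mu0) / (sigma / sqrt(n))."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)  # s.d. of the sample mean
    return (sample_mean - mu0) / standard_error


# Hypothetical values: standard error = 8 / sqrt(16) = 2, so Z-stat = 2 / 2
print(z_statistic(52.0, 50.0, 8.0, 16))  # 1.0
```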
Now let's solve a problem in which we go through each of the five steps in detail.
You are the manager of a fast food restaurant. The business problem is to determine whether the population mean waiting time to place an order has changed in the past month from its previous value of 5.0 minutes. From past experience, you can assume that the population s.d. is 0.2 minutes and that the population wait time is normally distributed. You select a sample of 25 orders during a one-hour period. The sample mean is 5.06 minutes. Use the p-value approach to determine whether there is evidence that the population mean waiting time to place an order has changed in the past month from its previous population mean value of 5.0 minutes.
Step 1: State the Null hypothesis H0 and the Alternate hypothesis H1
The null hypothesis, which is denoted by H0, is set up to assume that nothing has changed, that is, that the status quo holds. In this case, the population mean has not changed from its previous value of 5.0 minutes.
H0 : μ = 5.0
The alternate hypothesis is the opposite of the null hypothesis. Hence the alternate hypothesis is that the population mean is not 5.0 minutes.
H1 : μ ≠ 5.0
Step 2: Choose the level of significance α and the sample size n
The sample size chosen is n = 25, and the level of significance is α = 0.05.
Step 3: Determine the appropriate test statistic and the sampling distribution
Since σ is assumed known and the population is normally distributed, we use the normal distribution and the Z test statistic.
Step 4: Collect the sample data, compute the value of the test statistic and compute the p-value
The formula to compute Z-stat is given above; we use the same formula here.
Z-stat = (x̅ -μ) / (σ/√n ) = (5.06-5.0) / (0.2 / √25) = 1.50
Hence, the sample mean of 5.06 is 1.5 standard errors above the hypothesized mean of 5.0. Using the z-score table, the probability of getting a result more than 1.5 standard errors above the mean is 0.0668. Since this is a two-sided test, we do not care whether the difference is above or below the mean. So, the probability of getting a sample mean at least 1.5 standard errors away from the hypothesized mean, in either direction, is 2(0.0668) = 0.1336.
The p-value is therefore 0.1336, made up of a tail probability of 0.0668 on each side, corresponding to Z-stat = 1.5 and Z-stat = -1.5.
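The table lookup can be cross-checked numerically. The sketch below uses the numbers from the problem (μ = 5.0, σ = 0.2, n = 25, x̅ = 5.06) and assumes Python 3.8 or later for statistics.NormalDist; the variable names are chosen only for this example.

```python
import math
from statistics import NormalDist

# Values taken from the waiting-time problem
mu0 = 5.0           # hypothesized population mean (minutes)
sigma = 0.2         # known population standard deviation (minutes)
n = 25              # sample size
sample_mean = 5.06  # observed sample mean (minutes)

# Step 4: test statistic and p-value
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
upper_tail = 1.0 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
p_value = 2.0 * upper_tail                   # two-sided test: both tails

print(round(z, 2), round(upper_tail, 4), round(p_value, 4))
# 1.5 0.0668 0.1336
```

Taking abs(z) before the lookup means the same code works whether the sample mean falls above or below the hypothesized mean.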
Step 5: Compare the p-value to the chosen significance level
Because the p-value = 0.1336 > α = 0.05, we do not reject (fail to reject) the null hypothesis. This does not prove that the population mean waiting time is still 5.0 minutes. We conclude only that, based on the random sample we have chosen, we do not have sufficient evidence that the time required to place an order has changed from its previous population mean value of 5.0 minutes.
So, what does the p-value measure, and why do we reject or fail to reject the null hypothesis based on this value?
The p-value measures the probability of obtaining a sample statistic at least as extreme as the one observed, given that the null hypothesis is true.
In other words, with respect to the above problem, the probability in one tail is P(x̅ ≥ 5.06 | H0 true) = 0.0668
Now, since this problem was a two-tailed test, the tail probability applies to both ends of the standard normal distribution. Hence the total p-value calculated was 2(0.0668) = 0.1336. In other words, each tail probability of 0.0668 corresponds to one of the red zones in the above graph.
Now, since 0.1336 > 0.05, a sample mean as far from 5.0 as x̅ = 5.06 is not unusual when H0 is true. Hence, we fail to reject the null hypothesis.
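One way to build intuition for this interpretation is a small simulation: assume H0 is true, draw many samples of 25 wait times, and count how often the sample mean lands at least 0.06 minutes away from 5.0. The sketch below is an illustration only; the function name, the number of trials, and the random seed are arbitrary choices made for the example.

```python
import random
from statistics import fmean


def simulate_two_sided_fraction(mu0, sigma, n, observed_mean,
                                trials=20_000, seed=0):
    """Fraction of simulated sample means at least as far from mu0
    as the observed mean, when H0 (population mean = mu0) is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    threshold = abs(observed_mean - mu0)
    hits = 0
    for _ in range(trials):
        # One sample of n wait times drawn from the H0 population
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(fmean(sample) - mu0) >= threshold:
            hits += 1
    return hits / trials


fraction = simulate_two_sided_fraction(5.0, 0.2, 25, 5.06)
print(fraction)  # expected to be close to the p-value of 0.1336
```

If H0 is true, roughly 13 samples in every 100 would give a sample mean this far from 5.0, which is why the observed result is not treated as strong evidence of a change.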