1. Concepts & Definitions
1.1. A Review on Parametric Statistics
1.2. Parametric tests for Hypothesis Testing
1.3. Parametric vs. Non-Parametric Test
1.4. One-sample z-test and its relation to the two-sample z-test
1.5. One-sample t-test and its relation to the two-sample t-test
1.6. Welch's two-sample t-test: two populations with different variances
1.7. Non-Parametric test for Hypothesis Testing: Mann-Whitney U Test
1.8. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign-Rank Test
1.9. Non-Parametric test for Hypothesis Testing: Wilcoxon Sign Test
1.10. Non-Parametric test for Hypothesis Testing: Chi-Square Goodness-of-Fit
1.11. Non-Parametric test for Hypothesis Testing: Kolmogorov-Smirnov
1.12. Non-Parametric tests for comparing machine learning models
2. Problem & Solution
2.1. Using Wilcoxon Sign Test to compare clustering methods
2.2. Using Wilcoxon Sign-Rank Test to compare clustering methods
2.3. What is A/B testing and how to combine with hypothesis testing?
2.4. Using Chi-Square fit to check if Benford-Law holds or not
2.5. Using Kolmogorov-Smirnov fit to check if Pareto principle holds or not
A summary of hypothesis tests
For more detailed information, please see the content at Track 08 - Section 1.2
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current arithmetic mean is different from 12.44 minutes. A sample with 150 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can you conclude whether the time is currently different?
Before trying to solve this problem, it is worth having a close look at the steps defined by the flowchart for creating a hypothesis test.
Let's recall the steps to solve the numerical example presented previously.
Null hypothesis (Ho): the mean has not changed, i.e., μ = 12.44.
Alternative hypothesis (Ha): the mean has changed, i.e., μ ≠ 12.44.
From the table described in the step for choosing a statistical test, the signs of the hypotheses Ho and Ha indicate that a two-tailed test should be carried out.
Since the sample size is larger than 30, a normal distribution can be employed.
The next table helps us understand the relation between the confidence level, alpha (α), and the critical value z α/2.
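As an illustrative sketch (assuming SciPy is available), the critical values in that table can be reproduced with the percent-point function (inverse CDF) of the standard normal distribution:

```python
from scipy.stats import norm

# Two-tailed critical values z_{alpha/2} for common confidence levels
for cl in (0.90, 0.95, 0.99):        # confidence levels
    alpha = 1 - cl                   # significance level
    z_crit = norm.ppf(1 - alpha / 2) # two-tailed critical value
    print(f"CL = {cl:.0%}  alpha = {alpha:.2f}  z_crit = {z_crit:.2f}")
```

For a 95% confidence level this yields the familiar z α/2 = 1.96 used in the example below.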
To compute the test statistic, it is necessary to convert the observed sample mean (x̄) to the scale of a standard normal distribution (Zobs). This can be done using the following equation:
zobs = (x̄ - μ)/(s/(n^0.5))
This equation will result in the following numbers:
zobs = (13.71 - 12.44)/(2.65/(150^0.5)) = (13.71-12.44)/0.2164 = 5.87
Since Zobs = 5.87 is higher than the upper critical value z α/2 = 1.96, we can reject the null hypothesis.
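As a quick sanity check, this worked example can be reproduced in a few lines of Python (a sketch assuming SciPy is available; the variable names are illustrative):

```python
from scipy.stats import norm

mu0 = 12.44   # 2018 mean service time (null value)
xbar = 13.71  # sample mean
s = 2.65      # sample standard deviation
n = 150       # sample size
alpha = 0.05  # significance level

z_obs = (xbar - mu0) / (s / n ** 0.5)
z_crit = norm.ppf(1 - alpha / 2)  # two-tailed critical value

print(f"z_obs = {z_obs:.2f}, z_crit = {z_crit:.2f}")  # z_obs = 5.87, z_crit = 1.96
print("Reject H0" if abs(z_obs) > z_crit else "Do not reject H0")  # Reject H0
```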
Significance Level and P-Value
For more detailed information, please see the content at Track 08 - Section 1.7
On the probability distribution plot, the significance level defines how far the sample value must be from the null value before we can reject the null. The percentage of the area under the curve that is shaded equals the probability that the sample value will fall in those regions if the null hypothesis is correct. To represent a significance level of 0.05, the next figure shades in red the α = 5% of the distribution furthest from the null value [1].
The first interpretation is that P-values gauge how consistent your sample statistics are with the null hypothesis. Specifically, if the null hypothesis is correct, what is the probability of obtaining an effect at least as large as the one in your sample [1]?
High P-values: Your sample results are consistent with a true null hypothesis.
Low P-values: Your sample results are not consistent with a true null hypothesis.
If your P-value is small enough, you can conclude that your sample is so incompatible with the null hypothesis that you can reject the null for the entire population. As a clearer decision rule:
If P-value ≥ α, then we cannot reject the null hypothesis;
else, reject the null hypothesis.
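The rule above can be sketched as a small helper function (the name `decide` is illustrative):

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the standard decision rule: reject H0 only when p_value < alpha."""
    if p_value >= alpha:
        return "cannot reject the null hypothesis"
    return "reject the null hypothesis"

print(decide(0.03, 0.05))  # reject the null hypothesis
print(decide(0.03, 0.01))  # cannot reject the null hypothesis
```

Note that the same P-value leads to different decisions depending on the chosen significance level, which is why α must be fixed before running the test.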
In a second interpretation, P-values tell you how consistent your sample data are with a true null hypothesis.
Suppose the hypothesis test generates a P-value of 0.03. You’d interpret this P-value as follows: if the null hypothesis is true for the population as a whole, 3% of samples would show an effect at least as large as the one observed in your sample purely because of random sampling error [2].
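This interpretation can be illustrated with a small Monte Carlo sketch (assuming NumPy and SciPy are available; the 0.03 figure and the sample count are illustrative). Drawing many test statistics under a true null hypothesis, roughly 3% of them are at least as extreme as a value whose two-tailed P-value is 0.03:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# |z| value whose two-tailed P-value is exactly 0.03 (about 2.17)
z_at_p03 = norm.ppf(1 - 0.03 / 2)

# Draw 200,000 test statistics under a true null hypothesis (standard normal)
z_null = rng.standard_normal(200_000)

# Fraction of null draws at least as extreme as z_at_p03 -- close to 0.03
frac = np.mean(np.abs(z_null) >= z_at_p03)
print(f"Fraction of null samples with |z| >= {z_at_p03:.2f}: {frac:.3f}")
```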
However, when your data are very inconsistent with the null hypothesis, P-values can’t determine which of the following two possibilities is more probable:
The null hypothesis is true, but your sample is unusual due to random sampling error.
The null hypothesis is false.
This is the main reason why we say that we “failed to reject the null hypothesis” or “cannot reject the null hypothesis” rather than that we accepted it.
First, let's compute Zcrit for a significance level α = 1% (i.e., a 99% confidence level).
from scipy.stats import norm

muz = 0          # mean of the standard normal distribution
sigmaz = 1       # standard deviation of the standard normal distribution
p = 0.99         # confidence level
alfa = 1 - p     # significance level
pr = 1 - alfa/2  # upper-tail probability for a two-tailed test
z = norm.ppf(pr, muz, sigmaz)
print("Given alpha = ", str(round(alfa, 2)), ", Zcrit = ", round(z, 2))
Given alpha = 0.01 , Zcrit = 2.58
Now, use the sample data to compute Zobs.
mi = 12.44   # null-hypothesis mean
H0 = "The value is equal to " + str(mi)
n = 150      # sample size
xb = 12.98   # sample mean (note: this cell uses 12.98, not the 13.71 from the earlier example)
s = 2.65     # sample standard deviation
sx = s / (n ** 0.5)   # standard error of the mean
zobs = (xb - mi) / sx
print("Zobs = ",zobs,"and Zcrit = ",z)
if (zobs > z) or (zobs < -z):  # Zobs falls in the critical region
    print("Reject H0: ", H0)
else:
    print("Do not reject H0: ", H0)
Zobs = 2.495706530382865 and Zcrit = 2.5758293035489004
Do not reject H0: The value is equal to 12.44
To better visualize this situation, let's draw it.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x1 = np.arange(-z, z, 0.001) # range of x in spec
x_all = np.arange(-10, 10, 0.001) # entire range of x, both in and out of spec
# mean = 0, stddev = 1, since Z-transform was calculated
y1 = norm.pdf(x1,0,1)
y_all = norm.pdf(x_all,0,1)
# build the plot
fig, ax = plt.subplots(figsize=(9,6))
plt.style.use('fivethirtyeight')
ax.plot(x_all,y_all)
ax.text(muz, 0.4, 'Ho', fontsize=14)
ax.text(muz-1.2*z, 0.1, 'Ha', fontsize=14)
ax.text(muz+z, 0.1, 'Ha', fontsize=14)
ax.fill_between(x1,y1,0, alpha=0.7, color='g')
ax.fill_between(x_all,y_all,0, alpha=0.1, color='r')
ax.set_xlim([-6,6])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve CL = '+str(round(p*100,0))+' %')
# drawing Zobs
ax.axvline(x = zobs, color = 'r', label = 'Zobs')
ax.text(0.85*zobs, 0.2, 'Zobs', fontsize=14)
print("Zobs: ",zobs)
Now, let's compute the P-value and use the chosen significance level to make a decision.
muz = 0
sigmaz = 1
P = norm.cdf(zobs, muz, sigmaz)  # left-tail probability of Zobs
Pvalue = 2 * (1 - P)             # two-tailed P-value
print("Pvalue = ", Pvalue)
print("alpha = ", alfa)
if (Pvalue >= alfa):
    print("We cannot reject H0")
else:
    print("Reject H0")
Pvalue = 0.012570655321079371
alpha = 0.010000000000000009
We cannot reject H0
Finally, let's draw the P-value on the plot.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x1 = np.arange(-z, z, 0.001) # range of x in spec
x_all = np.arange(-10, 10, 0.001) # entire range of x, both in and out of spec
# mean = 0, stddev = 1, since Z-transform was calculated
y1 = norm.pdf(x1,0,1)
y_all = norm.pdf(x_all,0,1)
# build the plot
fig, ax = plt.subplots(figsize=(9,6))
plt.style.use('fivethirtyeight')
ax.plot(x_all,y_all)
ax.text(muz, 0.4, 'Ho', fontsize=14)
stralfa = str(round(alfa*100,2))
ax.text(muz-1.2*z, 0.1, 'Ha|alpha = ' + stralfa, fontsize=14)
ax.text(muz+z, 0.1, 'Ha|alpha = ' + stralfa, fontsize=14)
ax.fill_between(x1,y1,0, alpha=0.7, color='g')
ax.fill_between(x_all,y_all,0, alpha=0.1, color='r')
ax.set_xlim([-6,6])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve CL = '+str(round(p*100,0))+' %')
# drawing Zobs
ax.axvline(x = zobs, color = 'r', label = 'P-Value')
ax.text(0.85*zobs, 0.2, 'P-value = '+str(round(Pvalue*100,2)), fontsize=14)
print('P-value',Pvalue)
P-value 0.012570655321079371
The complete code above is available at the following link:
https://colab.research.google.com/drive/1P7DRHjbNfrVRrJqe2RL4w4uuvfgW5Osm?usp=sharing