1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro Wilk for normality test
2. Problem & Solution
2.1. Shapiro Wilk to verify CLT Simulator
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current arithmetic mean is lower than or equal to 12.44 minutes. A sample of 150 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can you conclude that the service time is currently higher?
Let's recall the steps used to solve the numerical example presented previously.
Null hypothesis (Ho): The mean is lower than or equal to 12.44, i.e., μ ≤ 12.44.
Alternative hypothesis (Ha): The mean is higher than 12.44, i.e., μ > 12.44.
From the table described in the step for choosing a statistical test, the signs in the hypotheses Ho and Ha indicate that a right-tailed test should be carried out.
Since the sample size is larger than 30, a normal distribution can be used.
The next table helps to understand the relation between confidence level, alpha (α), and the critical value z α.
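The relation between confidence level, α, and critical value can also be computed directly. The sketch below is only an illustration: the three confidence levels and the variable names are assumptions chosen here, not values taken from the course table. It uses `norm.ppf` from `scipy.stats`, the same function used in the code later in this section.

```python
from scipy.stats import norm

# Confidence levels chosen for illustration only (assumption)
confidence_levels = [0.90, 0.95, 0.99]

rows = []
for cl in confidence_levels:
    alpha = 1 - cl
    z_one = norm.ppf(1 - alpha)      # one-tailed critical value z_alpha
    z_two = norm.ppf(1 - alpha / 2)  # two-tailed critical value z_alpha/2
    rows.append((cl, round(alpha, 2), round(float(z_one), 3), round(float(z_two), 3)))

print("CL     alpha  z_alpha  z_alpha/2")
for cl, alpha, z_one, z_two in rows:
    print(f"{cl:<6} {alpha:<6} {z_one:<8} {z_two}")
```

For a confidence level of 95%, the one-tailed value is about 1.645 and the two-tailed value is about 1.96, which are the two critical values used in this example.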
To carry out the statistical test, it is necessary to convert the observed sample mean (x̄) to the scale of a standard normal distribution (Zobs). This can be done using the following equation:
zobs = (x̄ - μ)/(s/(n^0.5))
Substituting the values of the example into this equation gives:
zobs = (13.71 - 12.44)/(2.65/(150^0.5)) = (13.71-12.44)/0.2164 = 5.87
Since Zobs = 5.87 is higher than the upper critical value z α = 1.645, we can reject the null hypothesis and conclude that the mean service time is currently higher than 12.44 minutes.
One important aspect is that the right-tailed test is more sensitive to high values observed in the sample, since its critical value z α = 1.645 is lower than the critical value of the two-tailed test, which was z α/2 = 1.96. The next Figure represents this idea in graphical terms.
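The difference in sensitivity can be checked with a small sketch. The observed statistic used here (1.80) is a hypothetical value picked to fall between the two critical values; it is not part of the exercise.

```python
from scipy.stats import norm

alpha = 0.05
z_right = norm.ppf(1 - alpha)    # right-tailed critical value, about 1.645
z_two = norm.ppf(1 - alpha / 2)  # two-tailed critical value, about 1.96

z_example = 1.80  # hypothetical observed statistic (assumption)

# Right-tailed test: reject only when the statistic exceeds z_alpha
reject_right = bool(z_example > z_right)
# Two-tailed test: reject when the absolute statistic exceeds z_alpha/2
reject_two = bool(abs(z_example) > z_two)

print("Right-tailed test rejects H0:", reject_right)
print("Two-tailed test rejects H0:", reject_two)
```

With the same α, a statistic of 1.80 falls in the critical region of the right-tailed test but not in that of the two-tailed test.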
The previous solution can be summarized by the following Python code. First, let's compute the critical value Zcrit for a specified α.
from scipy.stats import norm
muz = 0
sigmaz = 1
p = 0.95
alfa = 1 - p
pr = 1 - alfa
z = norm.ppf(pr,muz,sigmaz)
print("Given alpha = ",str(round(alfa,2)),", Zcrit = ",round(z,2))
Given alpha = 0.05 , Zcrit = 1.64
The next step is to compute Zobs, compare it with Zcrit, and make a decision.
mi = 12.44  # hypothesized mean under H0
H0 = "The mean is lower than or equal to " + str(mi)
n = 150
xb = 13.71
s = 2.65
sx = s/(n**(0.5))  # standard error of the mean
zobs = (xb - mi)/sx
print("Zobs = ",zobs,"and Zcrit = ",z)
if zobs > z: # right-tailed test: Zobs belongs to the critical region
    print("Reject H0: ",H0)
else:
    print("Do not reject H0: ",H0)
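The same steps could also be wrapped in a reusable function. This is a minimal sketch, not part of the original notebook: the function name `right_tailed_z_test`, its parameter names, and the second sample mean (12.60) are hypothetical and are used only for contrast.

```python
from math import sqrt
from scipy.stats import norm

def right_tailed_z_test(sample_mean, mu0, sample_std, n, alpha=0.05):
    """Return (z_obs, z_crit, reject) for H0: mu <= mu0 against Ha: mu > mu0."""
    std_error = sample_std / sqrt(n)           # standard error of the mean
    z_obs = (sample_mean - mu0) / std_error    # observed statistic
    z_crit = float(norm.ppf(1 - alpha))        # right-tailed critical value
    return z_obs, z_crit, bool(z_obs > z_crit)

# Values from the exercise
z_obs_ex, z_crit_ex, reject_ex = right_tailed_z_test(13.71, 12.44, 2.65, 150)
print("Exercise:", round(z_obs_ex, 2), round(z_crit_ex, 3), reject_ex)

# Hypothetical sample mean close to 12.44 (assumption, for contrast)
z_obs_hyp, _, reject_hyp = right_tailed_z_test(12.60, 12.44, 2.65, 150)
print("Hypothetical:", round(z_obs_hyp, 2), reject_hyp)
```

With the exercise values the function rejects H0, while the hypothetical sample mean of 12.60 gives a statistic below the critical value and H0 is not rejected.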
Finally, the following code helps to visualize the critical region of the hypothesis test, together with Zcrit and Zobs, so that their values can be compared to make a decision.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x1 = np.arange(z, 10, 0.001) # critical region: x from Zcrit to the right
x_all = np.arange(-10, 10, 0.001) # entire range of x
# mean = 0, stddev = 1, since Z-transform was calculated
y1 = norm.pdf(x1,0,1)
y_all = norm.pdf(x_all,0,1)
# build the plot
plt.style.use('fivethirtyeight') # set the style before creating the figure so it is fully applied
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(x_all,y_all)
ax.text(muz, 0.4, 'Ho', fontsize=14)
# right-tailed test: the Ha region exists only on the right side
ax.text(muz+z, 0.1, 'Ha', fontsize=14)
ax.fill_between(x1,y1,0, alpha=0.7, color='r')
ax.fill_between(x_all,y_all,0, alpha=0.1, color='g')
ax.set_xlim([-6,6])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve CL = '+str(round(p*100,0))+' %')
# drawing Zobs
ax.axvline(x = zobs, color = 'r', label = 'Zobs')
ax.text(0.85*zobs, 0.1, 'Zobs', fontsize=14)
The complete code above is available at the following link:
https://colab.research.google.com/drive/16AuQptduGt1wYek25CoU-xtvf5YES6ph?usp=sharing