1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro Wilk for normality test
2. Problem & Solution
2.1. Shapiro Wilk to verify CLT Simulator
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current mean service time differs from 12.44 minutes. A sample of 150 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can we conclude that the service time is now different?
Let's recall the steps used to solve the numerical example presented previously.
Null hypothesis (Ho): The mean has not been affected, i.e., μ = 12.44.
Alternative hypothesis (Ha): The mean has been affected, i.e., μ ≠ 12.44.
From the table described in the step for choosing a statistical test, the signs in the hypotheses Ho and Ha (= versus ≠) indicate that a two-tailed test should be carried out.
Since the sample size (n = 150) is larger than 30, the normal distribution can be used.
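The "larger than 30" rule of thumb can be checked numerically. The following is a minimal sketch, assuming SciPy is available, that compares the two-tailed critical value of Student's t distribution with n - 1 degrees of freedom against the standard normal one. The variable names are illustrative and are not part of the original code.

```python
from scipy.stats import norm, t

alpha = 0.05
n = 150                                   # sample size from the example

z_crit = norm.ppf(1 - alpha / 2)          # standard normal critical value
t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # Student's t with n - 1 degrees of freedom

print("z critical:", round(z_crit, 3))
print("t critical:", round(t_crit, 3))
print("difference:", round(t_crit - z_crit, 3))
```

For a sample of this size the two critical values differ by less than 0.02, so using the normal distribution does not change the decision in this example.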
The next table helps to understand the relation between confidence level, alpha (α), and the critical value z α/2.
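The relation between these three quantities can also be reproduced with a few lines of Python. This is a sketch, not the original table: the confidence levels listed below are common choices assumed for illustration.

```python
from scipy.stats import norm

# Common confidence levels (assumed for illustration)
levels = [0.90, 0.95, 0.99]

rows = []
print("Confidence level | alpha | z(alpha/2)")
for cl in levels:
    alpha = 1 - cl                        # significance level
    z_crit = norm.ppf(1 - alpha / 2)      # two-tailed critical value
    rows.append((cl, round(alpha, 2), round(z_crit, 3)))
    print(f"{cl:>16.0%} | {alpha:>5.2f} | {z_crit:>10.3f}")
```

A higher confidence level means a smaller α and therefore a larger critical value, i.e., a wider acceptance region.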
To carry out the statistical test, it is necessary to convert the observed sample mean (x̄) to the scale of the standard normal distribution (Zobs). This can be done using the following equation:
zobs = (x̄ - μ)/(s/(n^0.5))
Substituting the values from the example gives:
zobs = (13.71 - 12.44)/(2.65/(150^0.5)) = (13.71-12.44)/0.2164 = 5.87
Since Zobs = 5.87 is greater than the upper critical value z α/2 = 1.96, we can reject the null hypothesis.
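The same decision can also be reached through the p-value, the topic of section 1.7. The following is a minimal sketch that uses the values from the example and `norm.sf` (the upper-tail probability) from SciPy. The variable names are illustrative.

```python
from scipy.stats import norm

alpha = 0.05
xb, mu0, s, n = 13.71, 12.44, 2.65, 150   # values from the example

z_obs = (xb - mu0) / (s / n ** 0.5)       # standardized sample mean
p_value = 2 * norm.sf(abs(z_obs))         # two-tailed p-value

print("z_obs =", round(z_obs, 2))
print("p-value =", p_value)
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Comparing the p-value with α is equivalent to comparing |Zobs| with the critical value, so both approaches lead to the same conclusion.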
The previous solution can be summarized by the following Python code. First, let's compute the critical value Zcrit for a specified α.
from scipy.stats import norm
muz = 0
sigmaz = 1
p = 0.95
alfa = 1 - p
pr = 1 - alfa/2
z = norm.ppf(pr,muz,sigmaz)
print("Given alpha = ",str(round(alfa,2)),", Zcrit = ",round(z,2))
Given alpha = 0.05 , Zcrit = 1.96
The next step is to compute Zobs, compare it with Zcrit, and make a decision.
mi = 12.44
H0 = "The value is equal to " + str(mi)
n = 150
xb = 13.71
s = 2.65
sx = s/(n**(0.5))
zobs = (xb - mi)/sx
print("Zobs = ",zobs,"and Zcrit = ",z)
if (zobs > z) or (zobs < -z): # Zobs belongs to the critical region
    print("Reject H0: ",H0)
else:
    print("Do not reject H0: ",H0)
Zobs = 5.869532025159697 and Zcrit = 1.959963984540054
Reject H0: The value is equal to 12.44
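The decision rule can be wrapped in a small reusable function. This is only a sketch: the function name `two_tailed_z_test` and its signature are hypothetical, and it assumes a large sample so that the normal distribution applies.

```python
from scipy.stats import norm

def two_tailed_z_test(xb, mu0, s, n, alpha=0.05):
    """Return (z_obs, z_crit, reject) for a two-tailed large-sample test of the mean."""
    z_crit = norm.ppf(1 - alpha / 2)      # upper critical value
    z_obs = (xb - mu0) / (s / n ** 0.5)   # standardized sample mean
    reject = bool(abs(z_obs) > z_crit)    # critical region: |z_obs| > z_crit
    return z_obs, z_crit, reject

# Values from the example
z_obs, z_crit, reject = two_tailed_z_test(13.71, 12.44, 2.65, 150)
print("Zobs =", round(z_obs, 2), "Zcrit =", round(z_crit, 2), "Reject H0:", reject)

# A sample mean close to the hypothesized value should not lead to rejection
_, _, reject_close = two_tailed_z_test(12.50, 12.44, 2.65, 150)
print("Reject H0 for a sample mean of 12.50:", reject_close)
```

Using `abs(z_obs) > z_crit` covers both tails in a single comparison.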
Finally, the next code helps to visualize the critical regions of the hypothesis test delimited by Zcrit, together with Zobs, so that their values can be compared to make a decision.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x1 = np.arange(-z, z, 0.001) # acceptance region, between -Zcrit and +Zcrit
x_all = np.arange(-10, 10, 0.001) # entire range of x, inside and outside the acceptance region
# mean = 0, stddev = 1, since Z-transform was calculated
y1 = norm.pdf(x1,0,1)
y_all = norm.pdf(x_all,0,1)
# build the plot
plt.style.use('fivethirtyeight') # set the style before creating the figure so it takes effect
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(x_all,y_all)
ax.text(muz, 0.4, 'Ho', fontsize=14)
ax.text(muz-1.2*z, 0.1, 'Ha', fontsize=14)
ax.text(muz+z, 0.1, 'Ha', fontsize=14)
ax.fill_between(x1,y1,0, alpha=0.7, color='g')
ax.fill_between(x_all,y_all,0, alpha=0.1, color='r')
ax.set_xlim([-6,6])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve CL = '+str(round(p*100,0))+' %')
# drawing Zobs
ax.axvline(x = zobs, color = 'r', label = 'Zobs')
ax.text(0.85*zobs, 0.1, 'Zobs', fontsize=14)
plt.show()
The complete code is available at the following link:
https://colab.research.google.com/drive/1uvdpz4bpFLh__KumGAXMpNg-x5wcfdKn?usp=sharing