1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro Wilk for normality test
2. Problem & Solution
2.1. Shapiro Wilk to verify CLT Simulator
The average service time of a company in 2018 was 12.44 minutes. Management wants to know whether the current arithmetic mean is different from 12.44 minutes. A sample of 25 values had an arithmetic mean of 13.71 minutes and a standard deviation of 2.65 minutes. Using α = 5%, can you conclude that the service time is currently different?
Let's recall the steps used to solve the numerical example presented previously.
Null hypothesis (Ho): The mean has not changed, i.e., μ = 12.44.
Alternative hypothesis (Ha): The mean has changed, i.e., μ ≠ 12.44.
According to the table presented in the step for choosing a statistical test, the signs in Ho (=) and Ha (≠) indicate that a two-tailed test should be carried out.
Since the sample size (n = 25) is smaller than 30 and only the sample standard deviation is known, a Student's t distribution is used.
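To illustrate why the choice of distribution matters for a small sample, the following minimal sketch compares the two-tailed critical value of the standard normal distribution with that of the Student's t distribution for this example. The variable names `z_crit` and `t_crit` are introduced here only for illustration.

```python
from scipy.stats import norm, t

alpha = 0.05   # significance level from the example
n = 25         # sample size from the example

# two-tailed critical values: upper quantile at 1 - alpha/2
z_crit = norm.ppf(1 - alpha / 2)       # standard normal
t_crit = t.ppf(1 - alpha / 2, n - 1)   # Student's t with n - 1 degrees of freedom

print("z critical:", round(z_crit, 3))
print("t critical:", round(t_crit, 3))
```

With 24 degrees of freedom the t critical value (about 2.06) is larger than the normal one (about 1.96), so the t test demands slightly stronger evidence before rejecting Ho.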
The following table shows the relation between the confidence level, the significance level alpha (α), and the critical value t α/2 for each number of degrees of freedom (sample size - 1).
The code to create this table is given below.
from scipy.stats import t
import pandas as pd

# significance levels (alpha) for a two-tailed test
alpha_list = [0.2, 0.1, 0.05, 0.02, 0.01, 0.002, 0.001]
rows = []
for dof in range(1, 26):  # degrees of freedom
    v_list = []
    for alpha in alpha_list:
        # two-tailed critical value t alpha/2 for this alpha and dof
        v = round(t.ppf(1 - alpha/2, dof), 3)
        v_list.append(v)
    rows.append(v_list)
cols = [str(x) for x in alpha_list]
# label each row with its degrees of freedom (1 to 25)
df = pd.DataFrame(rows, columns=cols, index=range(1, 26))
df
The complete code above is available at the following link:
https://colab.research.google.com/drive/1P7DRHjbNfrVRrJqe2RL4w4uuvfgW5Osm?usp=sharing
Using α = 5% and n = 25 (df = 25 - 1 = 24) leads to t α/2 = 2.064, or approximately 2.06.
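The critical value depends on the kind of test as well as on α and the degrees of freedom. The sketch below is a hedged illustration of that dependence: the function `critical_values` and its `tail` labels ('two', 'right', 'left') are hypothetical names created for this example and are not part of SciPy.

```python
from scipy.stats import t

def critical_values(tail, alpha, dof):
    """Return (lower, upper) critical values; None means no bound on that side.

    `tail` is a hypothetical label: 'two', 'right' or 'left'.
    """
    if tail == "two":      # Ha: mu != mu0, alpha split between both tails
        tc = t.ppf(1 - alpha / 2, dof)
        return -tc, tc
    if tail == "right":    # Ha: mu > mu0, all of alpha in the upper tail
        return None, t.ppf(1 - alpha, dof)
    if tail == "left":     # Ha: mu < mu0, all of alpha in the lower tail
        return t.ppf(alpha, dof), None
    raise ValueError("tail must be 'two', 'right' or 'left'")

# values for the example: alpha = 5%, df = 24, two-tailed
lower, upper = critical_values("two", alpha=0.05, dof=24)
print(round(lower, 3), round(upper, 3))  # -2.064 2.064
```

For the two-tailed case Ho is rejected when the observed statistic falls below the lower value or above the upper value.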
To carry out the test, it is necessary to convert the observed sample mean (x̄) to the scale of the Student's t distribution, which gives the test statistic tobs. This can be done using the following equation:
tobs = (x̄ - μ)/(s/(n^0.5))
This equation will result in the following numbers:
tobs = (13.71 - 12.44)/(2.65/(25^0.5)) = (13.71-12.44)/0.53 = 2.40
Since tobs = 2.40 is greater than the upper critical value t α/2 = 2.06, we reject the null hypothesis: at the 5% significance level, the sample indicates that the current mean service time differs from 12.44 minutes.
The previous solution can be summarized by the following Python code. First, let's compute the critical value Tcrit for the specified α.
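from scipy.stats import t
n = 25            # sample size
gl = n - 1        # degrees of freedom
alfa = 0.05       # significance level
p = 1 - alfa/2    # cumulative probability at the upper critical value (two-tailed)
ts = t.ppf(p,gl)  # critical value Tcrit
print("Given alpha = ",str(round(alfa,2)),", Tcrit = ",round(ts,2))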
Given alpha = 0.05 , Tcrit = 2.06
The next step is to compute Tobs, compare it with Tcrit, and make a decision.
mi = 12.44   # hypothesized mean under H0
H0 = "The value is equal to " + str(mi)
n = 25       # sample size
xb = 13.71   # sample mean
s = 2.65     # sample standard deviation
sx = s/(n**(0.5))   # standard error of the mean
tobs = (xb - mi)/sx
print("Tobs = ",tobs,"and Tcrit = ",ts)
if (tobs > ts) or (tobs < -ts): # Tobs belongs to the critical region
    print("Reject H0: ",H0)
else:
    print("Do not reject H0: ",H0)
Tobs = 2.396226415094342 and Tcrit = 2.0638985616280205
Reject H0: The value is equal to 12.44
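As a cross-check, the same decision can be reached through the two-tailed p-value. The sketch below wraps the calculation in a helper, `t_test_from_summary`, which is a hypothetical name introduced for this example. It works from summary statistics only, because `scipy.stats.ttest_1samp` expects the raw observations and those are not given in this problem.

```python
from scipy.stats import t

def t_test_from_summary(xbar, mu0, s, n, alpha=0.05):
    """Two-tailed one-sample t test computed from summary statistics."""
    dof = n - 1
    se = s / n ** 0.5                     # standard error of the mean
    t_obs = (xbar - mu0) / se             # observed test statistic
    t_crit = t.ppf(1 - alpha / 2, dof)    # upper critical value
    p_value = 2 * t.sf(abs(t_obs), dof)   # probability in both tails beyond |t_obs|
    reject = abs(t_obs) > t_crit          # equivalent to p_value < alpha
    return t_obs, t_crit, p_value, reject

# numbers taken from the example: x̄ = 13.71, μ = 12.44, s = 2.65, n = 25
t_obs, t_crit, p_value, reject = t_test_from_summary(13.71, 12.44, 2.65, 25)
print("Tobs =", round(t_obs, 3), "Tcrit =", round(t_crit, 3))
print("p-value =", round(p_value, 4), "reject H0:", bool(reject))
```

The p-value falls between 0.02 and 0.05, so it is below α = 0.05 and the conclusion agrees with the comparison of Tobs against Tcrit.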
Finally, the following code helps to visualize the critical regions of the hypothesis test, delimited by Tcrit, together with Tobs, so that their values can be compared to make a decision.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
plt.style.use('fivethirtyeight') # set the style before creating the figure
x1 = np.arange(-ts, ts, 0.001) # non-rejection region, between -Tcrit and Tcrit
x_all = np.arange(-10, 10, 0.001) # entire range of x, including the critical regions
# Student's t density with gl degrees of freedom
y1 = t.pdf(x1,gl)
y_all = t.pdf(x_all,gl)
# build the plot
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(x_all,y_all)
mut = 0 # center of the t distribution
ax.text(mut, 0.4, 'Ho', fontsize=14)
ax.text(mut-1.2*ts, 0.1, 'Ha', fontsize=14)
ax.text(mut+ts, 0.1, 'Ha', fontsize=14)
ax.fill_between(x1,y1,0, alpha=0.7, color='g') # non-rejection region in green
ax.fill_between(x_all,y_all,0, alpha=0.1, color='r') # light red over the whole curve
ax.set_xlim([-6,6])
ax.set_xlabel('# of Standard Errors from the Hypothesized Mean')
ax.set_yticklabels([])
ax.set_title('Student T Curve CL = '+str(round((1-alfa)*100,0))+' %')
# drawing Tobs
ax.axvline(x = tobs, color = 'r', label = 'Tobs')
ax.text(1.1*tobs, 0.1, 'Tobs', fontsize=14)
plt.show()
The complete code above is available at the following link:
https://colab.research.google.com/drive/1Q3V3WvMsjwUngr19SxhAmbalCzay7qMc?usp=sharing