1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro-Wilk test for normality
2. Problem & Solution
2.1. Shapiro-Wilk to verify CLT Simulator
The Shapiro-Wilk test is used to assess whether a random sample of data comes from a normal distribution, which is a common assumption in many statistical tests [1]. For this purpose, the following hypotheses are formulated [2]:
Ho = The sample comes from a normal distribution.
Ha = The sample does not come from a normal distribution.
The test statistic and its corresponding p-value can then be used to check whether or not the data follow a normal distribution, using the following decision rule for a chosen significance level α [1]:
If the p-value ≤ α, we reject the null hypothesis, i.e., we conclude that the distribution of our variable is not normal (Gaussian).
If the p-value > α, we fail to reject the null hypothesis, i.e., we have no evidence against normality and proceed as if the distribution of our variable is normal (Gaussian).
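The decision rule above can be sketched as a small helper. This is only a minimal illustration: the function names decide and shapiro_decision are hypothetical (they are not part of scipy), and the default α = 0.05 is an assumption that should be adapted to the problem at hand.

```python
import numpy as np
from scipy.stats import shapiro


def decide(p_value, alpha=0.05):
    """Hypothetical helper: apply the decision rule to a p-value."""
    if p_value <= alpha:
        return "reject H0"          # evidence against normality
    return "fail to reject H0"      # no evidence against normality


def shapiro_decision(sample, alpha=0.05):
    """Hypothetical helper: run Shapiro-Wilk and return (statistic, p-value, decision)."""
    stat, p = shapiro(sample)
    return stat, p, decide(p, alpha)
```

For instance, decide(0.345) returns "fail to reject H0", while decide(0.01) returns "reject H0". Note that a p-value exactly equal to α falls on the rejection side of the rule.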
The following code illustrates how to apply the Shapiro-Wilk test to a selected data set [2, 3].
#import modules
import numpy as np
from scipy.stats import shapiro
# Seed the random number generator so that the same sample is generated on every run
np.random.seed(0)
#generate sample of 150 values that follow a normal distribution with mean =0 and standard deviation=1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1,sd1,150)
#perform Shapiro-Wilk test
stat,p = shapiro(data)
print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f"%(stat,p))
The Test-Statistic and p-value are as follows:
Test-Statistic = 0.990 , p-value = 0.345
Since the p-value of 0.345 is greater than 0.05, we fail to reject the null hypothesis, i.e., we do not have sufficient evidence to say that the sample does not come from a normal distribution. This is expected, because we generated the sample with the normal function from the numpy library. Now, let’s take a look at a visual representation of the above dataset using the following code.
#import modules
import numpy as np
from scipy.stats import shapiro
import matplotlib.pyplot as plt
# Seed the random number generator so that the same sample is generated on every run
np.random.seed(0)
#generate sample of 150 values that follow a normal distribution with mean =0 and standard deviation=1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1,sd1,150)
#plot the histogram of the sample using 10 bins
plt.hist(data, bins=10)
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
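For contrast, the other branch of the decision rule can be illustrated with a sample that is clearly not normal. The sketch below is an assumption added for illustration and is not part of the original example: it draws 150 values from an exponential distribution (strongly right-skewed) and uses numpy's default_rng generator rather than np.random.seed. The variable names are hypothetical.

```python
import numpy as np
from scipy.stats import shapiro

# Assumed illustration: a right-skewed sample drawn from an exponential distribution
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=150)

# Perform the Shapiro-Wilk test on the skewed sample
stat_skewed, p_skewed = shapiro(skewed)
print("Test-Statistic = %.3f , p-value = %.3g" % (stat_skewed, p_skewed))

# Apply the same decision rule with alpha = 0.05
if p_skewed <= 0.05:
    print("Reject the null hypothesis: the sample does not look normal.")
else:
    print("Fail to reject the null hypothesis.")
```

With a sample this skewed and this large, the p-value should fall far below 0.05, so the null hypothesis of normality is rejected. A histogram of this sample would show a long right tail rather than the roughly symmetric, bell-like shape of the normal sample.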