1.9. Shapiro-Wilk test for normality

What is a Shapiro-Wilk test?

The Shapiro-Wilk test is used to determine whether a random sample of data comes from a normal distribution, which is an assumption of many common statistical tests [1]. The test is based on the following hypotheses [2]:

H0: The sample comes from a normal distribution.

Ha: The sample does not come from a normal distribution.
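As background, the Shapiro-Wilk statistic W is commonly defined as below, where x_(i) is the i-th smallest observation in the sample, x̄ is the sample mean, n is the sample size, and the a_i are constants derived from the expected values and covariance matrix of the order statistics of a standard normal sample of size n. W lies between 0 and 1: values close to 1 indicate that the ordered data match what a normal sample would produce, while small values count as evidence against H0.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^2}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```

In practice, scipy.stats.shapiro computes W and its p-value directly, so the a_i never need to be evaluated by hand.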

The test returns a test statistic and a corresponding p-value, which are used to decide whether the data are consistent with a normal distribution according to the following decision rule, where α is the chosen significance level (commonly 0.05) [1]:

If the p-value ≤ α, we reject the null hypothesis, i.e. we conclude that the distribution of our variable is not normal (Gaussian).
If the p-value > α, we fail to reject the null hypothesis, i.e. the data provide no evidence against normality, so we proceed as if the distribution of our variable is normal (Gaussian).
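The decision rule above can be sketched as a small Python function. This is a minimal sketch under stated assumptions: shapiro_decision is a hypothetical helper name, α = 0.05 is assumed as the default significance level, and the exponential sample (seed 42) and the idealised normal sample built from standard normal quantiles are arbitrary illustrative inputs, not data from the original example.

```python
# Hypothetical helper: wraps scipy.stats.shapiro and applies the "p-value <= alpha" decision rule.
import numpy as np
from scipy.stats import norm, shapiro


def shapiro_decision(data, alpha=0.05):
    """Return (W, p-value, reject_h0) for the Shapiro-Wilk test.

    H0 (the sample comes from a normal distribution) is rejected
    when the p-value is less than or equal to alpha.
    """
    w, p = shapiro(data)
    reject_h0 = bool(p <= alpha)
    return w, p, reject_h0


# A clearly skewed sample: 200 draws from an exponential distribution (assumed example data).
rng = np.random.default_rng(42)
skewed = rng.exponential(scale=1.0, size=200)

# An idealised "normal" sample: standard normal quantiles at plotting positions (i - 0.5) / n.
n = 100
ideal = norm.ppf((np.arange(1, n + 1) - 0.5) / n)

for name, sample in [("skewed", skewed), ("ideal normal", ideal)]:
    w, p, reject = shapiro_decision(sample)
    verdict = "reject H0 (not normal)" if reject else "fail to reject H0"
    print(f"{name}: W = {w:.3f}, p-value = {p:.3g} -> {verdict}")
```

The skewed sample should be rejected and the idealised normal sample should not. Keep in mind that with very large samples even small departures from normality can produce a p-value below α, so the result is best read together with a visual check of the data.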

Python code for the Shapiro-Wilk test

The following code illustrates how to apply the Shapiro-Wilk test to a data set [2, 3].

# Import modules
import numpy as np
from scipy.stats import shapiro

# Seed the random number generator so the same random numbers are generated on every run
np.random.seed(0)

# Generate a sample of 150 values from a normal distribution with mean = 0 and standard deviation = 1
mean1 = 0
sd1 = 1
data = np.random.normal(mean1, sd1, 150)

# Perform the Shapiro-Wilk test
stat, p = shapiro(data)

print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f" % (stat, p))

Output:
The Test-Statistic and p-value are as follows:
Test-Statistic = 0.990 , p-value = 0.345

Since the p-value (0.345) is greater than α = 0.05, we fail to reject the null hypothesis, i.e. we do not have sufficient evidence to say that the sample does not come from a normal distribution. This is consistent with how the data were generated: the sample was drawn from a normal distribution using NumPy's random.normal function. Now, let's take a look at a visual representation of the above data set using the following code.