1. Concepts & Definitions
1.1. Defining statistical test of hypothesis
1.2. Numerical example of test of hypothesis for mean
1.3. Code for test of hypothesis for mean
1.4. Code for right tailed test of hypothesis for mean
1.5. Code for left tailed test of hypothesis for mean
1.6. Code for small sample hypothesis for mean
1.7. P-Value and test of hypothesis
1.8. Statistical power and power analysis
1.9. Shapiro-Wilk for normality test
2. Problem & Solution
2.1. Shapiro-Wilk to verify CLT Simulator
Load the notebook with the commands developed in Track 06, step 2.1 (click on the link):
https://colab.research.google.com/drive/1Xo-2dWDgL-gmDJH3QmB6b4YMlntgQqtu?usp=sharing
Remember the graph obtained in the previous section:
Now, instead of filtering for values under 100000, let's filter for values under 40000:
filter = df1['weight_kg'] < 40000
df1.loc[filter]
The following will appear:
The next code helps to draw the new distribution related to the filtered data frame:
weight = df1.loc[filter]['weight_kg']
weight.hist()
The next code computes the mean and standard deviation for all the filtered data; it will be referred to as a population, since it includes all the available data.
import numpy as np
pop_mean_weight = sum(weight)/len(weight)
#calculate standard deviation of list
pop_std_weight = np.std(list(weight))
print(pop_mean_weight)
print(pop_std_weight)
16105.766274318656
9747.901629742
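As a side note, `np.std` defaults to the population standard deviation (`ddof=0`, dividing by N), which is why it is appropriate here where the filtered data is treated as the whole population. A minimal sketch of the contrast with the sample standard deviation (`ddof=1`), using hypothetical data:

```python
import numpy as np

data = [10.0, 12.0, 23.0, 23.0, 16.0, 23.0, 21.0, 16.0]

# ddof=0 (the default): divide by N -> population standard deviation
pop_std = np.std(data)

# ddof=1: divide by N-1 (Bessel's correction) -> sample standard deviation
sample_std = np.std(data, ddof=1)

print(pop_std)     # population SD
print(sample_std)  # always slightly larger than the population SD
```

If the data were only a sample of a larger population, `ddof=1` would be the usual choice.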
The next code takes the filtered weight data and draws 30 samples, each with 100 randomly chosen values, storing the mean of each sample.
sample_size = 100
number_samples = 30
list_means_samples = []
for s in range(0, number_samples):
    # random_state works like an initial seed, so each iteration
    # draws a different but reproducible random sample
    sample = list(weight.sample(n=sample_size, random_state=s+3))
    mean_sample = sum(sample)/len(sample)
    list_means_samples.append(mean_sample)
list_means_samples
[15947.769067000001, 15870.988500000001, 17208.357780000002, 16519.316757, 15275.26839, 16086.101976000004, 16479.651276999997, 15510.297471000003, 14861.697939999998, 17155.882599999997, 16775.280599999995, 17941.656986, 16126.398980000004, 16058.165267, 15960.026557, 16497.744357, 16601.3141, 14700.071201000002, 15717.66677, 15991.896503999998, 16637.84862, 15807.00809, 15787.055264, 16646.368070000004, 15422.104810000004, 16776.087270000004, 16518.539460000004, 16536.24759, 17053.044540000003, 16993.828380000003]
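Note that the sample means cluster around the population mean of about 16105. The CLT predicts exactly this: the mean of the sample means approximates the population mean, and their spread shrinks to roughly σ/√n. A minimal, self-contained sketch of that check, using a synthetic skewed population in place of the weight data (the names and the exponential distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic skewed "population" standing in for the filtered weights
population = rng.exponential(scale=16000, size=50_000)
pop_mean = population.mean()
pop_std = population.std()

sample_size = 100     # observations per sample, as in the notebook
number_samples = 30   # how many sample means to collect

means = [rng.choice(population, size=sample_size).mean()
         for _ in range(number_samples)]

# CLT: mean of sample means ~ population mean,
# std of sample means ~ pop_std / sqrt(sample_size)
print(np.mean(means), pop_mean)
print(np.std(means), pop_std / np.sqrt(sample_size))
```

Even though the population is strongly skewed, the distribution of the sample means comes out approximately normal, which is what the Shapiro-Wilk test below verifies.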
The next code draws the corresponding distribution for the sampling mean.
from matplotlib import pyplot as plt
plt.hist(list_means_samples, 8)
plt.show()
The next code employs the Shapiro-Wilk test to verify if the sampling distribution follows a normal distribution.
from scipy.stats import shapiro
stat,p = shapiro(list_means_samples)
print("The Test-Statistic and p-value are as follows:\nTest-Statistic = %.3f , p-value = %.3f"%(stat,p))
The Test-Statistic and p-value are as follows:
Test-Statistic = 0.983 , p-value = 0.902
Recall the rule for interpreting the p-value:
High p-values: your sample results are consistent with a true null hypothesis.
Low p-values: your sample results are not consistent with the null hypothesis.
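As a minimal sketch, this rule compares the p-value with a chosen significance level α (the 0.05 here is an assumed, conventional choice, and the helper function is purely illustrative):

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if p_value < alpha:
        return "Reject H0: results are not consistent with the null hypothesis"
    return "Fail to reject H0: results are consistent with the null hypothesis"

print(decide(0.902))  # high p-value, as in the Shapiro-Wilk result above
print(decide(0.001))  # low p-value
```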
Also, remember that the Shapiro-Wilk test is used to assess whether a random sample of data comes from a normal distribution, which is a common assumption in many statistical tests [1]. This means the following hypotheses will be formulated [2]:
Ho = The sample comes from a normal distribution.
Ha = The sample does not come from a normal distribution.
So, since the p-value (0.902) is high, we fail to reject Ho, and it can be concluded that the sampling distribution follows a normal distribution.
The complete code is available in the following link:
https://colab.research.google.com/drive/1x59Luhf0j8H1l4BjcqzdmiclSsz0nZrz?usp=sharing