1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
The code below computes a confidence interval to estimate the mean weight of the products filtered by HS6 code.
The first part builds on the code developed in Track 06, section 2.1, which is available at the following link:
https://colab.research.google.com/drive/1Xo-2dWDgL-gmDJH3QmB6b4YMlntgQqtu?usp=sharing
First, open the previous notebook and save it under a new name. Then insert some new code, starting by retrieving the filtered product-weight data for the chosen HS6 code.
df1.loc[filter]['weight_kg']
0 22000.0000
1 3946.2504
2 23650.0000
3 25581.0000
4 28601.0000
...
995 8212.0000
996 8224.0000
997 9543.0000
998 11564.0000
999 8152.0000
Name: weight_kg, Length: 974, dtype: float64
Now, let's compute the mean of the filtered weights, which serves as the population mean.
filtered_weight = df1.loc[filter]['weight_kg']
mean_pop_weight = filtered_weight.mean()
mean_pop_weight
17066.114821047227
Finally, it is possible to draw a sample of size n from the population stored in the variable filtered_weight using random.sample, and to compute the corresponding sample mean.
import random
n = 100 # sample size
sample_weight = random.sample(list(filtered_weight), n)
mean_sample_weight = sum(sample_weight)/len(sample_weight)
mean_sample_weight
18369.3315
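Note that random.sample draws a different sample on every run, so the sample mean (and the resulting confidence interval) will change each time. If you need reproducible results, you can seed the generator before sampling. The sketch below uses a small hypothetical stand-in list for the population; in the notebook you would pass list(filtered_weight) instead.

```python
import random

# Hypothetical stand-in for filtered_weight (in the notebook, use
# list(filtered_weight) here instead).
population = [22000.0, 3946.25, 23650.0, 25581.0, 28601.0,
              8212.0, 8224.0, 9543.0, 11564.0, 8152.0]

random.seed(42)                     # fix the seed so the draw is repeatable
sample_a = random.sample(population, 5)

random.seed(42)                     # same seed, so the same sample is drawn
sample_b = random.sample(population, 5)

print(sample_a == sample_b)         # True: identical draws
```

Seeding is useful while developing the notebook; omit it when you want a fresh random sample on each run.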
It is also possible to compute the sample standard deviation:
import numpy as np
#calculate sample standard deviation of list
#variance_sample = sum([((x - mean_sample_weight) ** 2) for x in sample_weight]) / (len(sample_weight)-1)
#std_sample = variance_sample ** 0.5
std_sample = np.std(sample_weight, ddof=1)
std_sample
11970.19157009859
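The ddof=1 argument matters here: by default np.std divides by n (the population formula), while ddof=1 divides by n-1, giving the sample standard deviation. This is the same convention pandas uses by default, as the small check below (with hypothetical data) illustrates.

```python
import numpy as np
import pandas as pd

data = [22000.0, 3946.25, 23650.0, 25581.0, 28601.0]  # hypothetical sample

# np.std defaults to the population formula (ddof=0); ddof=1 gives the
# sample standard deviation, which is what pandas' Series.std() uses.
std_numpy = np.std(data, ddof=1)
std_pandas = pd.Series(data).std()

print(np.isclose(std_numpy, std_pandas))  # True: same n-1 convention
```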
Using the developments explained in Track 07, sections 1.5 and 1.6, the following code computes a confidence interval for the mean weight of the filtered data based on the sample.
from scipy.stats import norm
p = 0.90 # Confidence level: population percentage covered by the interval
muz = 0 # mean of the standard normal
sigmaz = 1 # standard deviation of standard normal
alfa = 1-p # significance level: alpha
pr = 1-alfa/2 # cumulative probability below the critical value z
z = norm.ppf(pr, muz, sigmaz) # computing the critical value z alpha/2
mux = mean_sample_weight # using the sample mean as an estimator of the population mean
sigmax = std_sample/(n**0.5) # standard error of the mean
marg = z*sigmax # margin of error
x = mean_sample_weight
mux1 = x - marg # lower critical value
mux2 = x + marg # upper critical value
# Confidence interval
print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']')
CI with 90.0 % = [ 16400.41 , 20338.25 ]
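As a cross-check, scipy can compute the same interval in one call with norm.interval, which returns the lower and upper bounds directly from the confidence level, the sample mean, and the standard error. The sketch below uses the (rounded) sample statistics from the run above as stand-in values.

```python
from scipy.stats import norm

# Sample statistics from the run above (rounded, used here as stand-ins)
n = 100
mean_sample_weight = 18369.33
std_sample = 11970.19

sem = std_sample / n**0.5   # standard error of the mean
# norm.interval returns (lower, upper) for the given confidence level
lo, hi = norm.interval(0.90, loc=mean_sample_weight, scale=sem)
print(round(lo, 2), round(hi, 2))  # prints: 16400.41 20338.25
```

This reproduces the interval computed step by step above, confirming that the manual z-critical-value calculation matches scipy's built-in routine.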
The complete code is available at the following link:
https://colab.research.google.com/drive/1oyXz5fTuaE3akY-l6IQv0gqPHvrdZCVX?usp=sharing