1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
The code below computes a confidence interval to estimate the mean weight of the products filtered by HS6 code.
The first part builds on the code developed in Track 06, section 2.1, which is available at the following link:
https://colab.research.google.com/drive/1Xo-2dWDgL-gmDJH3QmB6b4YMlntgQqtu?usp=sharing
First, open the previous notebook and save it under a new name. Then insert some new code, starting by retrieving the filtered product-weight data for the chosen HS6 code.
df1.loc[filter]['weight_kg']
0 22000.0000
1 3946.2504
2 23650.0000
3 25581.0000
4 28601.0000
...
995 8212.0000
996 8224.0000
997 9543.0000
998 11564.0000
999 8152.0000
Name: weight_kg, Length: 974, dtype: float64
Now, let's compute the mean of the filtered weights, which serves as the population mean.
filtered_weight = df1.loc[filter]['weight_kg']
mean_pop_weight = filtered_weight.mean()
mean_pop_weight
17066.114821047227
Finally, it is possible to draw a sample of size n from the population stored in the variable filtered_weight using random.sample, and to compute the corresponding sample mean.
import random
n = 100 # sample size
sample_weight = random.sample(list(filtered_weight), n)
mean_sample_weight = sum(sample_weight)/len(sample_weight)
mean_sample_weight
18369.3315
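Note that random.sample draws a different sample on every run, so the sample mean (and the resulting confidence interval) will change each time. If you need reproducible results, you can seed the generator before sampling. The sketch below uses a small hypothetical stand-in list for the population; in the notebook you would pass list(filtered_weight) instead.

```python
import random

# Hypothetical stand-in for filtered_weight (in the notebook, use
# list(filtered_weight) here instead).
population = [22000.0, 3946.25, 23650.0, 25581.0, 28601.0,
              8212.0, 8224.0, 9543.0, 11564.0, 8152.0]

random.seed(42)                     # fix the seed so the draw is repeatable
sample_a = random.sample(population, 5)

random.seed(42)                     # same seed, so the same sample is drawn
sample_b = random.sample(population, 5)

print(sample_a == sample_b)         # True: identical draws
```

Seeding is useful while developing the notebook; omit it when you want a fresh random sample on each run.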
It is also possible to compute the sample standard deviation:
import numpy as np
#calculate sample standard deviation of list
#variance_sample = sum([((x - mean_sample_weight) ** 2) for x in sample_weight]) / (len(sample_weight)-1)
#std_sample = variance_sample ** 0.5
std_sample = np.std(sample_weight, ddof=1)
std_sample
11970.19157009859
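The ddof=1 argument matters here: by default np.std divides by n (the population formula), while ddof=1 divides by n-1, giving the sample standard deviation. This is the same convention pandas uses by default, as the small check below (with hypothetical data) illustrates.

```python
import numpy as np
import pandas as pd

data = [22000.0, 3946.25, 23650.0, 25581.0, 28601.0]  # hypothetical sample

# np.std defaults to the population formula (ddof=0); ddof=1 gives the
# sample standard deviation, which is what pandas' Series.std() uses.
std_numpy = np.std(data, ddof=1)
std_pandas = pd.Series(data).std()

print(np.isclose(std_numpy, std_pandas))  # True: same n-1 convention
```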
Using the developments explained in Track 07, sections 1.5 and 1.6, the following code computes a confidence interval for the mean weight of the filtered data based on the sample.
from scipy.stats import norm
p = 0.90 # Confidence level: population percentage covered by the interval
muz = 0 # mean of the standard normal
sigmaz = 1 # standard deviation of standard normal
alfa = 1-p # significance level: alpha
pr = 1-alfa/2 # cumulative probability below the critical value z
z = norm.ppf(pr, muz, sigmaz) # computing the critical value z alpha/2
mux = mean_sample_weight # using the sample mean as an estimator of the population mean
sigmax = std_sample/(n**0.5) # standard error of the mean
marg = z*sigmax # margin of error
x = mean_sample_weight
mux1 = x - marg # lower critical value
mux2 = x + marg # upper critical value
# Confidence interval
print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']')
CI with 90.0 % = [ 16400.41 , 20338.25 ]
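As a cross-check, scipy can compute the same interval in one call with norm.interval, which returns the lower and upper bounds directly from the confidence level, the sample mean, and the standard error. The sketch below uses the (rounded) sample statistics from the run above as stand-in values.

```python
from scipy.stats import norm

# Sample statistics from the run above (rounded, used here as stand-ins)
n = 100
mean_sample_weight = 18369.33
std_sample = 11970.19

sem = std_sample / n**0.5   # standard error of the mean
# norm.interval returns (lower, upper) for the given confidence level
lo, hi = norm.interval(0.90, loc=mean_sample_weight, scale=sem)
print(round(lo, 2), round(hi, 2))  # prints: 16400.41 20338.25
```

This reproduces the interval computed step by step above, confirming that the manual z-critical-value calculation matches scipy's built-in routine.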
The complete code is available at the following link:
https://colab.research.google.com/drive/1oyXz5fTuaE3akY-l6IQv0gqPHvrdZCVX?usp=sharing