1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
Suppose the sample size n is greater than 30. The confidence interval is then given by the following equation.
where: μ is the population mean, x̄ is the sample mean, z is the critical value that leaves an area of α/2 in the upper tail of the normal distribution, σ is the standard deviation of the population (use the sample standard deviation s if the population standard deviation is unknown), and n is the sample size.
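Written out from the symbols defined just below, and assuming the standard two-sided z-interval for the mean, the interval reads:

```latex
\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\;\le\; \mu \;\le\;
\bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
```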
Recall that the relation between the interval range and the margin of error can be summarized in the following figure.
This leads to the following equation relating the confidence interval range W to the margin of error.
Solving this equation for the sample size n gives the following new equation.
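Taking the margin of error E to be the half-width of the interval, E = z σ/√n (an assumption consistent with how the margin is computed in the Python code of this section), the range spans one margin on either side of the sample mean:

```latex
W = 2E = 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
```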
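Rearranging W = 2 z σ/√n for n gives the expression below. It matches the term `(2*z*s/W)**2` used in the Python code of this section, with s standing in for σ:

```latex
\sqrt{n} = \frac{2\,z_{\alpha/2}\,\sigma}{W}
\quad\Longrightarrow\quad
n = \left(\frac{2\,z_{\alpha/2}\,\sigma}{W}\right)^{2}
```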
where: z is the critical value that leaves an area of α/2 in the upper tail of the normal distribution, σ is the standard deviation of the population (use the sample standard deviation s if the population standard deviation is unknown), W is the confidence interval range, and n is the sample size.
However, even obtaining the sample standard deviation s requires drawing an initial sample first.
The following Python code illustrates how to find the sample size using an initial sample of n = 30, the z critical value for α = 10%, and a confidence interval range W = 80.
from scipy.stats import norm
import random
import numpy as np
random.seed(12) # Setting the initial value for the random generator
pop_data=range(1000) # assume a population with 1000 values
samples_means=[] # would store the mean of each sample (not used below)
sample_data=random.sample(pop_data, k=30) # initial sample: draw 30 values without replacement
s = np.std(sample_data) # standard deviation of the initial sample (np.std defaults to ddof=0)
muz = 0 # mean of the standard normal
sigmaz = 1 # standard deviation of standard normal
p = 0.90 # proportion of the population covered by the interval
alfa = 1-p # obtaining the significance level: alpha
pr = 1-alfa/2 # cumulative probability below the critical value z (alpha/2 remains in the upper tail)
z = norm.ppf(pr, muz, sigmaz) # computing the critical z alpha/2
# Range of the confidence interval
W = 80
# Finally, the required sample size n.
n = round((2*z*s/W)**2,0)
n = int(n)
print('Sample size = ',n)
Sample size = 100
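As a minimal sketch, the calculation above can be wrapped in a reusable helper. The function name `sample_size_for_range` and its arguments are hypothetical, and the standard library's `statistics.NormalDist` is swapped in for `scipy.stats.norm`. Unlike the listing above, it rounds up with `math.ceil`, on the assumption that the resulting range should not exceed W.

```python
import math
from statistics import NormalDist


def sample_size_for_range(s, W, confidence=0.90):
    """Smallest n whose z-interval range does not exceed W, given a std estimate s.

    Hypothetical helper: implements n = (2*z*s/W)**2, rounded up.
    """
    alpha = 1 - confidence
    # Critical value that leaves alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Round up so the target range is not exceeded
    return math.ceil((2 * z * s / W) ** 2)


# Illustrative values only (not taken from the example above)
n_required = sample_size_for_range(s=10.0, W=4.0, confidence=0.95)
print('Sample size = ', n_required)
```

Halving W roughly quadruples the required sample size, since n grows with the square of 1/W.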
With the sample size n = 100 obtained above, let's check the corresponding confidence interval range using the following Python code.
sample_data=random.sample(pop_data, k=n) # final sample: draw n values without replacement
x = np.mean(sample_data) # mean of the sample
sigma = np.std(sample_data) # standard deviation of the sample
mux = x # using the sample mean as an estimator of the population mean
sigmax = sigma/(n**0.5) # standard error of the mean
marg = z*sigmax # margin of error
mux1 = x - marg # lower bound of the confidence interval
mux2 = x + marg # upper bound of the confidence interval
print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']') # Confidence interval
print('CI Range = ',round(mux2,2)-round(mux1,2))
CI with 90.0 % = [ 432.96 , 523.24 ]
CI Range = 90.28000000000003
The complete code above is available at the following link:
https://colab.research.google.com/drive/1_1eEnMgux69hfN8mubAkmifO9Xa8aez-?usp=sharing
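The range obtained in the check (about 90.28) is wider than the target W = 80. This is presumably because the sample size was derived from the standard deviation of the initial 30-value sample, while the interval is computed from the standard deviation of the new, larger sample, which appears to be higher. The sketch below, using a hypothetical helper `normal_ci` built on the standard library rather than SciPy and NumPy, shows that the range is fully determined by z, s and n of the sample actually used.

```python
import math
from statistics import NormalDist, mean, pstdev


def normal_ci(sample, confidence=0.90):
    """Return (lower, upper) bounds of a two-sided z confidence interval for the mean.

    Hypothetical helper; uses pstdev (ddof=0) to mirror np.std's default.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    s = pstdev(sample)
    margin = z * s / math.sqrt(n)  # margin of error: z times the standard error
    centre = mean(sample)
    return centre - margin, centre + margin


# Illustrative data only: 100 evenly spaced values between 0 and 990
demo_sample = list(range(0, 1000, 10))
lower, upper = normal_ci(demo_sample, confidence=0.90)
print('CI Range = ', round(upper - lower, 2))
```

Because the range equals 2·z·s/√n for the sample in hand, a second sample with a larger standard deviation than the initial one yields a range above the planned W.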