1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
Suppose the sample size n is smaller than 30. Then the confidence interval used to estimate the population mean is given by the following equation:
$$\mu = \bar{x} \pm t_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where: μ is the population mean, x̄ is the sample mean, t_{α/2} is the t critical value that leaves an area of α/2 in the upper tail of the Student T distribution with n − 1 degrees of freedom, σ is the population standard deviation (use the sample standard deviation s if it is unknown), and n is the sample size.
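The interval formula can be checked numerically. The following is a minimal sketch, assuming a small hypothetical data set and the sample standard deviation computed with ddof=1 (the code later in this section uses NumPy's default ddof=0). The function name t_confidence_interval is illustrative; the result is cross-checked against SciPy's scipy.stats.t.interval.

```python
import numpy as np
from scipy.stats import t


def t_confidence_interval(sample, confidence=0.90):
    """Confidence interval for the population mean: x_bar +/- t_{alpha/2} * s / sqrt(n).

    Assumption: s is the sample standard deviation with ddof=1.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()                  # sample mean
    s = sample.std(ddof=1)                 # sample standard deviation
    alpha = 1 - confidence                 # significance level
    t_crit = t.ppf(1 - alpha / 2, n - 1)   # upper-tail area alpha/2, n - 1 degrees of freedom
    margin = t_crit * s / np.sqrt(n)       # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical data, for illustration only
data = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7]
low, high = t_confidence_interval(data, confidence=0.90)

# Cross-check with SciPy (confidence level passed positionally, then the degrees of freedom)
ref_low, ref_high = t.interval(0.90, len(data) - 1,
                               loc=np.mean(data),
                               scale=np.std(data, ddof=1) / np.sqrt(len(data)))
print(round(low, 3), round(high, 3))
print(round(ref_low, 3), round(ref_high, 3))
```

Both lines should print the same limits, since t.interval applies the same critical value and standard error.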
Recall that the confidence interval extends one margin of error to each side of the sample mean x̄, so its range W equals twice the margin of error, as summarized in the following figure.
This leads to the following equation relating the confidence interval range W to the margin of error:
$$W = 2\,t_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where: t_{α/2} is the t critical value that leaves an area of α/2 in the upper tail of the Student T distribution (with n − 1 degrees of freedom), σ is the population standard deviation (use the sample standard deviation s if it is unknown), and n is the sample size.
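The relation W = 2 t_{α/2} σ/√n can be written as a small function. This is a minimal sketch, assuming a hypothetical standard deviation s = 200 and the 90% confidence level used in this section; ci_range is an illustrative name. It also shows that W shrinks as n grows, both through √n and through the smaller t critical value at higher degrees of freedom.

```python
from math import sqrt
from scipy.stats import t


def ci_range(s, n, confidence=0.90):
    """Range W = 2 * t_{alpha/2} * s / sqrt(n) of a t-based confidence interval."""
    alpha = 1 - confidence
    t_crit = t.ppf(1 - alpha / 2, n - 1)  # Student T critical value, n - 1 degrees of freedom
    margin = t_crit * s / sqrt(n)         # margin of error on each side of the sample mean
    return 2 * margin                     # W is twice the margin of error


# Hypothetical standard deviation, for illustration only
s = 200.0
for n in (10, 21, 40):
    print(n, round(ci_range(s, n), 2))
```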
Solving the range equation for the sample size n gives:
$$n = \left(\frac{2\,t_{\alpha/2}\,\sigma}{W}\right)^2$$
However, even the sample standard deviation s (as well as the n − 1 degrees of freedom of t_{α/2}) can only be obtained after extracting an initial sample.
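Because t_{α/2} itself depends on n through its degrees of freedom, the closed-form expression has to be evaluated with the degrees of freedom of some chosen sample size. As an alternative (not the method used in this section's code), the sketch below searches directly for the smallest n whose t-based range does not exceed W, keeping s fixed at a value assumed to come from an initial sample. The function name required_sample_size and the values s = 200 and W = 180 are hypothetical.

```python
import math
from scipy.stats import norm, t


def required_sample_size(s, W, confidence=0.90):
    """Smallest n whose t-based range 2 * t_{alpha/2} * s / sqrt(n) is at most W.

    Sketch, assuming s is known from an initial sample and stays fixed.
    """
    alpha = 1 - confidence
    z_crit = norm.ppf(1 - alpha / 2)
    # The normal critical value is smaller than any t critical value, so the
    # normal-based formula gives a lower bound to start the search from.
    n = max(2, math.ceil((2 * z_crit * s / W) ** 2))
    while 2 * t.ppf(1 - alpha / 2, n - 1) * s / math.sqrt(n) > W:
        n += 1  # the range shrinks as n grows, so the first n that fits is the smallest
    return n


# Hypothetical values, for illustration only
n = required_sample_size(s=200.0, W=180.0, confidence=0.90)
print(n)
```

Starting at the normal-based bound keeps the loop short, and stopping at the first n that fits amounts to rounding up rather than to the nearest integer, so the range never exceeds W for the given s.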
The following Python code illustrates how to find the sample size from an initial sample of n = 21, a t critical value with α = 10%, and a desired confidence interval range W = 180.
from scipy.stats import t
import random
import numpy as np
random.seed(12) # Setting the initial value for the random generator
pop_data=range(1000) # assume a population with 1000 values
n = 21
df = n - 1
sample_data=random.sample(pop_data, k=n) # Initial sample extract 21 values
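# np.std uses ddof=0 by default; np.std(sample_data, ddof=1) gives the usual
# sample standard deviation s and would slightly change the results below
s = np.std(sample_data) # standard deviation of the initial sample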
p = 0.90 # confidence level
alfa = 1-p # significance level alpha
pr = 1-alfa/2 # cumulative probability below the critical value t_{alpha/2}
ts = t.ppf(pr, df) # critical value t_{alpha/2} with df = n - 1 degrees of freedom
# Desired range W of the confidence interval
W = 180
# Required sample size n = (2*ts*s/W)**2, rounded to the nearest integer
n = round((2*ts*s/W)**2,0)
n = int(n)
print('Sample size = ',n)
Sample size = 21
With the obtained sample size n = 21, the following Python code checks the corresponding confidence interval range. Since the required size equals the size of the initial sample, that sample is reused in this second part of the code.
# No new sample is needed: the required size equals the initial sample size
#sample_data=random.sample(pop_data, k=n) # otherwise, extract a new sample of n values
x = np.mean(sample_data) # mean of the sample
sigma = np.std(sample_data) # standard deviation of the sample
mux = x # sample mean used as the estimator of the population mean
sigmax = sigma/(n**0.5) # standard error of the mean
marg = ts*sigmax # margin of error
mux1 = x - marg # lower limit of the confidence interval
mux2 = x + marg # upper limit of the confidence interval
print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']') # Confidence interval
print('CI Range = ',round(mux2,2)-round(mux1,2))
CI with 90.0 % = [ 371.11 , 551.55 ]
CI Range = 180.43999999999994
The complete code is available at the following link:
https://colab.research.google.com/drive/1r0etcYpjJWbrYo9TV4oqHDaV91Ju9rHV?usp=sharing
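The printed range (about 180.44) is slightly larger than the target W = 180 because the computed sample size was rounded to the nearest integer rather than up. Since W is proportional to 1/√n when s and t_{α/2} are fixed, the printed numbers are enough to back out the sample size that would give exactly W = 180. A quick check, using only values from the output above:

```python
import math

# Numbers taken from the output printed above
n_used = 21                     # sample size returned by round()
W_target = 180                  # desired confidence interval range
W_obtained = 551.55 - 371.11    # range of the printed 90% interval

# For fixed s and t, W is proportional to 1/sqrt(n), so the (unrounded)
# sample size that gives exactly W_target is:
n_exact = n_used * (W_obtained / W_target) ** 2
print(round(n_exact, 2))   # slightly above 21
print(math.ceil(n_exact))  # rounding up instead gives 22
```

The unrounded requirement is just above 21, so rounding up to 22 would keep the range below 180 for the same s and t, at the cost of extracting a new sample.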