1.10. Estimating sample size using normal distribution

where: μ is the population mean, x̄ is the sample mean, z is the critical value that leaves an area of α/2 in the upper tail of the standard normal distribution, σ is the population standard deviation (use the sample standard deviation s if σ is unknown), and n is the sample size.
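The definitions above refer to the confidence interval for the population mean. Assuming the usual normal-based form, and writing the critical value with the subscript notation z_{α/2} (a notation introduced here), it can be sketched as:

```latex
\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\;\le\; \mu \;\le\;
\bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
```

The term z_{α/2} σ/√n added to and subtracted from the sample mean is the margin of error.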

Recall that the confidence interval extends one margin of error on each side of the sample mean, as summarized in the following figure.

This leads to the following equation relating the confidence interval range W to the margin of error.

Solving this equation for the sample size n gives the following expression.

where: z is the critical value that leaves an area of α/2 in the upper tail of the standard normal distribution, σ is the population standard deviation (use the sample standard deviation s if σ is unknown), and n is the sample size.
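The steps described above can be written out explicitly. This is a sketch that assumes the margin of error is E = z_{α/2} σ/√n (E and the subscript on z are symbols introduced here) and that W spans the full interval from the lower to the upper limit:

```latex
W = 2E = 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\sqrt{n} = \frac{2\,z_{\alpha/2}\,\sigma}{W}
\quad\Longrightarrow\quad
n = \left(\frac{2\,z_{\alpha/2}\,\sigma}{W}\right)^{2}
```

Because n grows with the square of 1/W, halving the desired range multiplies the required sample size by four.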

However, even the sample standard deviation s can only be obtained by first drawing an initial sample.

The following Python code illustrates how to find the sample size using an initial sample of n = 30, the z critical value for α = 10%, and a confidence interval range W = 80.

from scipy.stats import norm

import random

import numpy as np

random.seed(12) # Setting the initial value for the random generator

pop_data=range(1000) # assume a population with 1000 values

samples_means=[] # store the mean of each sample

sample_data=random.sample(pop_data, k=30) # initial sample: draw 30 values without replacement

s = np.std(sample_data) # standard deviation of the initial sample (np.std defaults to ddof=0; pass ddof=1 for the n-1 estimate)

muz = 0 # mean of the standard normal

sigmaz = 1 # standard deviation of standard normal

p = 0.90 # confidence level of the interval

alfa = 1-p # obtaining the significance level: alpha

pr = 1-alfa/2 # cumulative probability below the critical value z

z = norm.ppf(pr, muz, sigmaz) # computing the critical z alpha/2

# Range of the confidence interval

W = 80

# Required sample size n

n = round((2*z*s/W)**2,0)

n = int(n)

print('Sample size = ',n)

Sample size = 100
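The calculation above can also be wrapped in a small reusable function. The following is a minimal sketch, not the author's code: the name sample_size_for_width is hypothetical, the critical value comes from the standard library's statistics.NormalDist instead of scipy.stats.norm, and the result is rounded up with math.ceil (an assumption made here) rather than rounded to the nearest integer.

```python
import math
from statistics import NormalDist


def sample_size_for_width(s, W, confidence=0.90):
    """Hypothetical helper: sample size n so that the confidence
    interval range is at most W, given a standard deviation estimate s."""
    if s <= 0 or W <= 0:
        raise ValueError("s and W must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1 - confidence                    # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    # n = (2 z s / W)^2, rounded up (assumption: never undershoot n)
    return math.ceil((2 * z * s / W) ** 2)


# Illustrative values only: s = 290.0 is not taken from the text above.
n_required = sample_size_for_width(s=290.0, W=80, confidence=0.90)
print('Sample size =', n_required)
```

Rounding up keeps the range implied by s from exceeding W, whereas round() can return a value one unit too small when the fractional part is below 0.5.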

With the sample size n = 100 obtained above, the following Python code checks the resulting confidence interval range.

sample_data=random.sample(pop_data, k=n) # final sample: draw n values without replacement

x = np.mean(sample_data) # mean of the sample

sigma = np.std(sample_data) # standard deviation of the sample

mux = x # using the sample mean as an estimator of the population mean

sigmax = sigma/(n**0.5) # standard error of the mean

marg = z*sigmax # margin of error

mux1 = x - marg # lower limit of the confidence interval

mux2 = x + marg # upper limit of the confidence interval

print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']') # Confidence interval
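To compare the achieved range with the target W directly, the interval limits can be subtracted. The sketch below is self-contained and uses only the standard library. The function name mean_confidence_interval and the example data list(range(100)) are hypothetical, and it uses the n-1 sample standard deviation (statistics.stdev), which differs slightly from the np.std default used above.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.90):
    """Hypothetical helper: normal-based confidence interval for the mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = statistics.fmean(sample)          # sample mean
    s = statistics.stdev(sample)              # sample std (n-1 denominator)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * s / math.sqrt(n)             # margin of error
    return x_bar - margin, x_bar + margin


# Illustrative data only, not the sample drawn in the text above.
lower, upper = mean_confidence_interval(list(range(100)), confidence=0.90)
width = upper - lower                         # achieved range, compare with W
print('CI = [', round(lower, 2), ',', round(upper, 2), ']')
print('Range =', round(width, 2))
```

The achieved range will generally differ somewhat from the target W, because the final sample has its own standard deviation, which need not equal the estimate from the initial sample.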