1.11. Estimating sample size using Student T distribution
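Assuming the standard confidence interval for the population mean based on the Student T distribution, which is consistent with the definitions that follow, the interval can be written as:

```latex
\mu = \bar{x} \pm t_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}
```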

where: μ is the population mean, x̄ is the sample mean, t is the critical value that leaves an area of α/2 in the upper tail of the Student T distribution, σ is the population standard deviation (use the sample standard deviation s if the population standard deviation is unknown), and n is the sample size.

Recall that the relation between the interval range and the margin of error can be summarized in the following figure.

This leads to the following equation relating the confidence interval range W to the margin of error.
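Taking the margin of error as E = t·σ/√n, and assuming the range is twice the margin of error (which matches the factor of 2 used in the Python code of this section), the relation can be written as:

```latex
W = 2E = 2 \, t_{\alpha/2} \, \frac{\sigma}{\sqrt{n}}
```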

where: t is the critical value that leaves an area of α/2 in the upper tail of the Student T distribution, σ is the population standard deviation (use the sample standard deviation s if the population standard deviation is unknown), and n is the sample size.

Solving this equation for the sample size n leads to the following new equation.
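Written out, and matching the expression `(2*ts*s/W)**2` used in the Python code of this section, the rearrangement is:

```latex
\sqrt{n} = \frac{2 \, t_{\alpha/2} \, \sigma}{W}
\quad\Longrightarrow\quad
n = \left( \frac{2 \, t_{\alpha/2} \, \sigma}{W} \right)^{2}
```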

However, even to obtain the sample standard deviation s, it is necessary to extract an initial sample.
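As a minimal sketch of this step, the snippet below computes s from a small initial sample. The values in `pilot` are hypothetical and chosen only for illustration. Note that `np.std` divides by n by default (`ddof=0`); passing `ddof=1` gives the usual sample standard deviation, which divides by n - 1.

```python
import numpy as np

# Hypothetical initial (pilot) sample: illustrative values only
pilot = [12.0, 15.0, 9.0, 14.0, 10.0]

s_ddof0 = np.std(pilot)          # default ddof=0: divides by n
s_ddof1 = np.std(pilot, ddof=1)  # ddof=1: divides by n - 1 (sample estimate)

print("std with ddof=0:", s_ddof0)
print("std with ddof=1:", s_ddof1)
```

For small samples the two values differ noticeably, and the choice carries over to the computed sample size because n grows with the square of s.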

The next Python code illustrates how to find the sample size using an initial sample of n = 21, a t critical value with α = 10%, and a confidence interval range W = 180.

from scipy.stats import t
import random
import numpy as np

random.seed(12) # set the seed of the random generator for reproducibility
pop_data=range(1000) # assume a population with 1000 values
samples_means=[] # store the mean of each sample (not used in this example)

n = 21 # size of the initial sample
df = n - 1 # degrees of freedom
sample_data=random.sample(pop_data, k=n) # initial sample: extract 21 values
s = np.std(sample_data) # standard deviation of the sample (np.std uses ddof=0 by default)

p = 0.90 # confidence level
alfa = 1-p # significance level: alpha
pr = 1-alfa/2 # cumulative probability below the critical value t alpha/2
ts = t.ppf(pr, df) # t critical value for alpha/2

# Range of the confidence interval
W = 180

# Final sample size n
n = round((2*ts*s/W)**2,0)
n = int(n)
print('Sample size = ',n)

Sample size = 21
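The same calculation can be wrapped in a small function. The sketch below is not part of the original code: the name `required_sample_size` and the input values are hypothetical, and it rounds up with `math.ceil` instead of using `round`, an assumption made so that the achieved range never exceeds W (with `round`, the sample size can be rounded down and the range ends up slightly wider than requested).

```python
import math
from scipy.stats import t

def required_sample_size(s, W, df, confidence=0.90):
    """Smallest n with 2 * t * s / sqrt(n) <= W, for a fixed t critical value."""
    alpha = 1 - confidence
    t_crit = t.ppf(1 - alpha / 2, df)  # leaves an area of alpha/2 in the upper tail
    return math.ceil((2 * t_crit * s / W) ** 2)

# Illustrative values (assumed, not taken from the sample above)
s, W, df = 100.0, 50.0, 20
n_needed = required_sample_size(s, W, df)

# Check the range achieved with that n, using the same t critical value
t_crit = t.ppf(0.95, df)
achieved_range = 2 * t_crit * s / math.sqrt(n_needed)
print("n =", n_needed, " achieved range =", round(achieved_range, 2))
```

The t critical value here is fixed by the degrees of freedom of the initial sample; if the final sample is much larger, the critical value would be slightly smaller.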

With the obtained sample size n = 21, the following Python code checks the corresponding confidence interval range. Since the computed sample size equals the size of the initial sample, the initial sample is reused in this second part of the code.

# Another sample is not necessary since the initial one is already appropriate.
#sample_data=random.sample(pop_data, k=n) # final sample: extract n values

x = np.mean(sample_data) # mean of the sample
sigma = np.std(sample_data) # standard deviation of the sample
mux = x # using the sample mean as an estimator of the population mean
sigmax = sigma/(n**0.5) # standard error of the mean
marg = ts*sigmax # margin of error
mux1 = x - marg # lower limit of the confidence interval
mux2 = x + marg # upper limit of the confidence interval

print('CI with ',p*100,'% = [ ',round(mux1,2),',',round(mux2,2),']') # confidence interval
print('CI Range = ',round(mux2,2)-round(mux1,2))

CI with 90.0 % = [ 371.11 , 551.55 ]

CI Range = 180.43999999999994
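As a cross-check of the manual calculation, the interval can also be obtained from `scipy.stats.t.interval`. The sketch below uses a hypothetical sample (`data`) rather than the sample drawn above, and it uses `ddof=1` for the standard deviation, which is an assumption that differs from the default used in the code of this section.

```python
import numpy as np
from scipy.stats import t

# Hypothetical sample: illustrative values only
data = np.array([480.0, 512.0, 455.0, 530.0, 498.0, 471.0, 505.0, 489.0])
n = len(data)
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# Manual interval: mean +/- t critical value * standard error
t_crit = t.ppf(0.95, n - 1)          # 90% confidence, alpha/2 = 0.05
manual = (mean - t_crit * sem, mean + t_crit * sem)

# Same interval from SciPy (confidence level passed positionally)
builtin = t.interval(0.90, n - 1, loc=mean, scale=sem)

print("manual :", manual)
print("builtin:", builtin)
```

Both approaches should agree, and the width of either interval equals twice the margin of error.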

The complete code for the sample size example is available at the following link:

https://colab.research.google.com/drive/1r0etcYpjJWbrYo9TV4oqHDaV91Ju9rHV?usp=sharing
