1.12. Estimating proportion using samples

In this section: p is the population proportion, p̂ is the sample proportion, q = 1 - p̂ is the complementary proportion, z α/2 is the critical value that leaves an area of α/2 in the upper tail of the standard normal distribution, and n is the sample size.

A survey of 1502 randomly selected imported products in the US indicates that 10% had some type of problem. Find the 90% confidence interval for the proportion p of the corresponding population.

For this example: n (sample size) = 1502, p̂ (sample proportion with a problem) = 10% = 0.1, q = 1 - p̂ = 0.9, and z α/2 (critical value) = 1.645.

The confidence interval estimate for the population proportion using sample data is given by:

p = p̂ +- z α/2 * (p̂q/n)^0.5

p = 0.1 +- 1.645 * (0.1*0.9/1502)^0.5

= 0.1 +- 1.645 * (0.09/1502)^0.5

= 0.1 +- 1.645 * 0.0077

= 0.1 +- 0.0127

= [8.73%, 11.27%]

The following Python code automates the computation of the confidence interval using the normal distribution. It can easily be adapted to any other numerical example that fulfills the same conditions.

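from scipy.stats import norm

n = 1502  # sample size
pc = 0.1  # sample proportion p̂
qc = 1 - pc  # complementary proportion q
muz = 0  # mean of the standard normal distribution
sigmaz = 1  # standard deviation of the standard normal distribution
p = 0.90  # confidence level
alfa = 1 - p  # significance level α
pr = 1 - alfa/2  # cumulative probability at the critical value
z = norm.ppf(pr, muz, sigmaz)  # critical value z α/2
sigmax = (pc*qc/n)**(0.5)  # standard error of the sample proportion
marg = z*sigmax  # margin of error
mux1 = pc - marg  # lower limit of the interval
mux2 = pc + marg  # upper limit of the interval
print("CI with ",p*100," % = [",round(mux1*100,2),"%, ",round(mux2*100,2),"%] ")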

CI with 90.0 % = [ 8.73 %, 11.27 %]
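The same computation can be wrapped in a reusable function. The following is only a minimal sketch: the function name `proportion_ci` and its parameters are hypothetical (they do not come from the original notebook), and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` to obtain the critical value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.90):
    """Normal-approximation confidence interval for a population proportion.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if not 0 < p_hat < 1:
        raise ValueError("p_hat must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1 - confidence
    # Critical value leaving alpha/2 in the upper tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Margin of error: z times the standard error of the sample proportion
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.1, 1502, 0.90)
print(f"[{low * 100:.2f}%, {high * 100:.2f}%]")
```

Called with the survey values (p̂ = 0.1, n = 1502, 90% confidence), it should reproduce the interval [8.73%, 11.27%] obtained above.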

Computing the sample size for an interval estimate of a proportion

Suppose the sample size n is larger than 30. Then the confidence interval for the proportion is given by:

p = p̂ +- z α/2 * (p̂q/n)^0.5

The margin of error E is the half-width of this interval, E = z α/2 * (p̂q/n)^0.5, so the interval runs from p̂ - E to p̂ + E.

The confidence interval range W and the margin of error are therefore related by:

W = 2E

Solving the margin-of-error equation for the sample size n gives:

n = (z α/2)^2 * p̂q / E^2

However, obtaining the sample proportion p̂ itself requires drawing an initial sample. An alternative is to assume the worst-case scenario in terms of sample size, which occurs when p̂ = 0.5, because the product p̂q = p̂(1 - p̂) reaches its maximum value of 0.25 there.
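A quick numerical check of the worst-case claim, as a sketch only: it evaluates p̂(1 - p̂) on a coarse grid of candidate proportions (the grid and the variable names are arbitrary choices made for this illustration) and reports where the product is largest.

```python
# Evaluate p*(1-p) for candidate proportions 0.1, 0.2, ..., 0.9
candidates = [i / 10 for i in range(1, 10)]
products = {p: p * (1 - p) for p in candidates}

# The candidate with the largest product requires the largest sample size
worst_p = max(products, key=products.get)
print(worst_p, products[worst_p])
```

On this grid the maximum is at p̂ = 0.5 with p̂q = 0.25, so any other proportion leads to a smaller required sample size for the same margin of error.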

The following Python code illustrates how to find the sample size using the worst-case approach, with the z critical value for α = 10% and a margin of error E = 0.01 (that is, a confidence interval range W = 2E = 0.02).

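from math import ceil
from scipy.stats import norm

E = 0.01  # margin of error
pq = 0.5*0.5  # worst case: p̂ = q = 0.5
muz = 0  # mean of the standard normal distribution
sigmaz = 1  # standard deviation of the standard normal distribution
p = 0.90  # confidence level
alfa = 1 - p  # significance level α
pr = 1 - alfa/2  # cumulative probability at the critical value
z = norm.ppf(pr, muz, sigmaz)  # critical value z α/2
n = (z**2)*(pq)/(E**2)  # required sample size before rounding
print("Sample size should be =", ceil(n))  # always round up to a whole unit

Sample size should be = 6764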
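When a preliminary estimate of the proportion is available, it can be used instead of the worst case. The sketch below assumes a hypothetical helper `sample_size_for_proportion` (its name and parameters are illustrative, not from the original notebook), takes the critical value from the standard library's `statistics.NormalDist` rather than `scipy.stats.norm`, and rounds the result up to the next whole unit.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.90, p_planning=0.5):
    """Sample size needed to estimate a proportion within +- margin.

    Hypothetical helper for illustration. p_planning is the assumed
    proportion; the default 0.5 is the worst case.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # n = z^2 * p * q / E^2, rounded up to the next whole unit
    return ceil(z ** 2 * p_planning * (1 - p_planning) / margin ** 2)


worst_case = sample_size_for_proportion(0.01)
with_pilot = sample_size_for_proportion(0.01, p_planning=0.1)
print(worst_case, with_pilot)
```

With E = 0.01 and 90% confidence, the worst case gives 6764, while assuming p̂ = 0.1 (the value from the survey example) gives 2435, which shows how much a preliminary estimate can reduce the required sample size.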

The complete code above is available at the following link:

https://colab.research.google.com/drive/1nmwk--kl3bFIoIyKvth2bSpCTTsq8iWl?usp=sharing
