1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
Suppose the sample has fewer than 30 values. In that case, the Student's T distribution should be employed instead of the normal distribution, and the next figure helps to understand what this means in terms of the statistical distribution of values.
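To see why the sample size matters, the following sketch compares two-sided 95% critical values of the Student's T distribution with the standard normal value, using `scipy.stats`. The degrees of freedom listed are arbitrary illustrative choices, not values taken from the problem in this article. For small samples the t critical value is noticeably larger, which widens the interval, and it approaches the normal value as the degrees of freedom grow (the threshold of 30 is a rule of thumb rather than a sharp boundary).

```python
from scipy.stats import norm, t

# Illustrative only: two-sided 95% critical values of Student's T for
# several (arbitrarily chosen) degrees of freedom, compared with the
# standard normal critical value z.
z_crit = norm.ppf(0.975)  # about 1.960
t_crit = {df: t.ppf(0.975, df) for df in (5, 10, 20, 29, 100)}

for df, tc in t_crit.items():
    print(f"df = {df:3d}: t = {tc:.3f} (z = {z_crit:.3f})")
```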
A consequence of using a 95% confidence interval is that 5% of the values lie outside the interval, which means:
2.5% of the values are lower than the lower bound of the confidence interval.
2.5% of the values are greater than the upper bound of the confidence interval.
The sum of both percentages is called the significance level (α, alpha). Common significance levels include α = 0.05 and α = 0.01. The relationship between the confidence level and the significance level is expressed as:
Confidence level = 1 - Significance level (α).
In other words, the confidence level equals one minus the significance level. For example, a significance level of 0.05 means, in the context of hypothesis testing, that there is a 5% probability of rejecting the null hypothesis when it is true. The corresponding confidence level is 1 - 0.05 = 0.95, or 95%.
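The arithmetic above can be sketched in a few lines of Python: for each confidence level, α is obtained as one minus the confidence level, and in a two-sided interval it is split equally between the lower and upper tails. The list of confidence levels is an illustrative choice matching the ones used later in this article.

```python
# Relationship between confidence level, significance level (alpha)
# and the area left in each tail of a two-sided confidence interval.
confidence_levels = [0.90, 0.95, 0.99]

rows = []
for cl in confidence_levels:
    alpha = 1 - cl    # significance level
    tail = alpha / 2  # area below the lower bound (and above the upper bound)
    rows.append((cl, alpha, tail))
    print(f"confidence = {cl:.0%}, alpha = {alpha:.2f}, each tail = {tail:.3f}")
```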
This raises a new question: how to determine the critical value t α/2, n-1 that sets the upper and lower bounds of the confidence interval. For this purpose, the inverse of the cumulative distribution function (CDF) of the Student's T distribution with n - 1 degrees of freedom is useful, since it answers which value of t α/2, n-1 leaves a given percentage of the distribution below it. The next figure illustrates this aspect.
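As a sketch of this idea with `scipy.stats.t`, the inverse CDF (`ppf`) returns the critical values, and applying the CDF to them recovers the original probabilities. The choice α = 0.05 and 20 degrees of freedom is an assumption made here to match the n = 21 example used below. The lower critical value is the negative of the upper one because the T distribution is symmetric, and the inverse survival function (`isf`) is an equivalent way to get the upper value.

```python
from scipy.stats import t

df = 20       # degrees of freedom (n - 1 with n = 21)
alpha = 0.05  # significance level for a 95% confidence level

# Upper critical value: the point with 1 - alpha/2 of the area to its left.
t_upper = t.ppf(1 - alpha / 2, df)
# Same value via the inverse survival function (alpha/2 of the area to its right).
t_upper_isf = t.isf(alpha / 2, df)
# Lower critical value: by symmetry, the negative of the upper one.
t_lower = t.ppf(alpha / 2, df)

# The CDF undoes the inverse CDF, and the area between the two critical
# values equals the confidence level 1 - alpha.
area_left = t.cdf(t_upper, df)
area_between = t.cdf(t_upper, df) - t.cdf(t_lower, df)

print(round(t_lower, 3), round(t_upper, 3))         # -2.086 2.086
print(round(area_left, 3), round(area_between, 3))  # 0.975 0.95
```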
The next table shows the relationship between the confidence level, alpha (α), and the critical value t α/2, n-1 for n = 21 (the number of degrees of freedom is n - 1 = 20).
The following Python code obtains the two-sided critical values t α/2, n-1 for confidence levels of 90%, 95%, and 99%:
from scipy.stats import t
n = 21
df = n - 1  # degrees of freedom
p1 = 0.90
alpha = 1 - p1  # significance level: alpha
pr = 1 - alpha/2  # cumulative probability to the left of the critical value t
ts1 = t.ppf(pr, df)  # critical value t alpha/2, n-1 (inverse CDF)
p2 = 0.95
alpha = 1 - p2  # significance level: alpha
pr = 1 - alpha/2  # cumulative probability to the left of the critical value t
ts2 = t.ppf(pr, df)  # critical value t alpha/2, n-1 (inverse CDF)
p3 = 0.99
alpha = 1 - p3  # significance level: alpha
pr = 1 - alpha/2  # cumulative probability to the left of the critical value t
ts3 = t.ppf(pr, df)  # critical value t alpha/2, n-1 (inverse CDF)
print('t critical for ', p1*100, '% = ', round(ts1, 3))
print('t critical for ', p2*100, '% = ', round(ts2, 3))
print('t critical for ', p3*100, '% = ', round(ts3, 3))
t critical for 90.0 % = 1.725
t critical for 95.0 % = 2.086
t critical for 99.0 % = 2.845
The complete code is available at the following link:
https://colab.research.google.com/drive/1rZ35bCgmLE8_uJejNLbdZjcBlI-LlNM8?usp=sharing
The distance from the point estimate to either bound of the confidence interval is called the “Margin of Error”. It indicates how far the interval extends on either side of the point estimate:
Margin of Error = t α/2, n-1 · σ / √n
μ = x̄ ± t α/2, n-1 · σ / √n
where: μ is the population mean, x̄ is the sample mean, t α/2, n-1 is the critical value that leaves an area of α/2 in the upper tail of the Student's T distribution with n - 1 degrees of freedom, σ is the population standard deviation (replaced by the sample standard deviation s when the population standard deviation is unknown, which is the usual situation in which the T distribution is applied), and n is the sample size.
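A minimal sketch of the margin of error and the resulting interval is shown below. The sample values are hypothetical and are not the HS6 data analysed in this article. Since σ is unknown here, the sample standard deviation s (with n - 1 in the denominator) is used. The result is cross-checked against `scipy.stats.t.interval`, which computes the same two-sided interval when given the degrees of freedom, the sample mean as `loc`, and the standard error s/√n as `scale`.

```python
import math
import statistics
from scipy.stats import t

# Hypothetical sample (illustrative values, not data from this article).
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]

n = len(sample)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
confidence = 0.95
alpha = 1 - confidence
t_crit = t.ppf(1 - alpha / 2, n - 1)  # critical value t alpha/2, n-1

margin = t_crit * s / math.sqrt(n)    # margin of error
lower, upper = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.3f}, s = {s:.3f}, t = {t_crit:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}] (margin of error = {margin:.3f})")

# Cross-check with SciPy's interval for a Student's T distribution.
lo_sp, hi_sp = t.interval(confidence, n - 1, loc=x_bar, scale=s / math.sqrt(n))
print(f"SciPy check: [{lo_sp:.3f}, {hi_sp:.3f}]")
```

The explicit formula makes each term of the equation visible, while `t.interval` is a convenient shortcut once the relationship is clear.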