1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
Now that we’ve seen both the standard normal distribution and a t-distribution with a single degree of freedom, let’s plot them together to see how they compare.
# Library imports
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline
# Normal distribution
x = np.linspace(-4, 4, 500)
y = stats.norm.pdf(x)
# T distribution
df = 1
y_t = stats.t.pdf(x, df)
# Plotting
plt.ylabel('Probability Density')
plt.xlabel('Standard Deviations')
plt.plot(x, y, color='blue', label='Normal Dist.')
plt.plot(x, y_t, color='green', label=f'T-Dist., df={df}')
plt.legend()
# Styling - optional
sns.set_context('notebook')
sns.despine()
With only a single degree of freedom, the t-distribution is much flatter and has fatter tails than the standard normal distribution. The power of the t-distribution comes from its ability to adjust for smaller sample sizes (and therefore fewer degrees of freedom) by giving a more conservative estimate of probability density. Put another way, the t-distribution accounts for the natural decrease in confidence at lower sample sizes, which the normal distribution does not.
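To make the fatter tails concrete, we can compare the probability of observing a value more than two standard deviations from the mean under each distribution. The snippet below is a small sketch using SciPy's survival function (sf); the cutoff of 2 is just an illustrative choice.
# Two-sided tail probability beyond 2 standard deviations
import scipy.stats as stats

threshold = 2  # illustrative cutoff, in standard deviations
p_normal = 2 * stats.norm.sf(threshold)  # normal tail area
p_t1 = 2 * stats.t.sf(threshold, df=1)   # t-distribution tail area, df=1

print(f"Normal dist.:   P(|X| > {threshold}) = {p_normal:.4f}")  # ~0.0455
print(f"T-dist. (df=1): P(|X| > {threshold}) = {p_t1:.4f}")      # ~0.2952
The t-distribution with one degree of freedom puts several times more probability mass in these tails, which is exactly the conservative behavior described above.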
As the degrees of freedom of the t-distribution change (with the location parameter held fixed), the probability density at the mean changes; in other words, the height of the t-distribution's peak changes. The next code helps to illustrate this point.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import t

x = np.linspace(-5, 5, 100)
degrees_of_freedom = [1, 2, 5, 10]  # Varying degrees of freedom

# Plotting T-distribution curves for different degrees of freedom
for df in reversed(degrees_of_freedom):
    y = t.pdf(x, df)  # Using default location and scale parameters (0 and 1)
    plt.plot(x, y, label=f"Degrees of Freedom = {df}")

plt.xlabel('x')
plt.ylabel('PDF')
plt.title('T-Distribution with Varying Degrees of Freedom')
plt.legend()
plt.show()
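To quantify how the peak height changes, we can evaluate each density at the mean (x = 0) and compare it with the normal distribution's peak of 1/sqrt(2*pi) ≈ 0.3989. This is a small sketch that reuses the degrees_of_freedom list and the t import from the block above.
# Density at the mean (x = 0) for each degree of freedom
from scipy.stats import norm

for df in degrees_of_freedom:
    print(f"df = {df:2d}: t.pdf(0) = {t.pdf(0, df):.4f}")
print(f"Normal:   pdf(0) = {norm.pdf(0):.4f}")  # ~0.3989
As the degrees of freedom increase, the peak rises toward the normal distribution's value.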
At higher degrees of freedom, the t-distribution approximates the normal distribution, making it useful at both small and large sample sizes. The next code helps to illustrate this point.
# Library imports
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
from scipy.stats import t
%matplotlib inline

# Normal distribution
x = np.linspace(-5, 5, 500)
y = stats.norm.pdf(x)
plt.plot(x, y, color='blue', label='Normal Dist.')

# T distribution
# Plotting T-distribution curves for different degrees of freedom
degrees_of_freedom = [1, 2, 5, 10]
for df in reversed(degrees_of_freedom):
    y = t.pdf(x, df)  # Using default location and scale parameters (0 and 1)
    plt.plot(x, y, label=f"Degrees of Freedom = {df}")

# Plotting
plt.ylabel('Probability Density')
plt.xlabel('Standard Deviations')
plt.legend()

# Styling - optional
sns.set_context('notebook')
sns.despine()
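A practical way to see this convergence is to compare the two-sided 95% critical values (the 97.5th percentile) of the t-distribution against the normal distribution. The sketch below extends the degrees-of-freedom list with a few larger values purely for illustration.
# 95% two-sided critical values: t-distribution vs. normal
for df in [1, 2, 5, 10, 30, 100]:
    print(f"df = {df:3d}: t critical = {stats.t.ppf(0.975, df):.3f}")
print(f"Normal:     z critical = {stats.norm.ppf(0.975):.3f}")  # ~1.960
By around 30 degrees of freedom the t critical value is already close to the normal value of about 1.96, which is why the two distributions become practically interchangeable at larger sample sizes.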
The complete code for the plots in this section is available at the following link:
https://colab.research.google.com/drive/1oaJLYH-3HOWi5kRCqF4wszNVOQiaAAUg?usp=sharing
References:
[1] Comparing normal and Student's T distribution:
https://tjkyner.medium.com/the-normal-distribution-vs-students-t-distribution-322aa12ffd15
[2] Student's T distribution and the impact of degrees of freedom:
https://www.geeksforgeeks.org/python-students-t-distribution-in-statistics/