1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
The following code illustrates the Central Limit Theorem. The first step is a function that draws random integers from a discrete uniform distribution; later code calls it 1000 times and records the mean of each sample.
In particular, the function below draws integers uniformly from the half-open range [-40, 40), that is, from -40 to 39, because np.random.randint excludes the upper bound.
import numpy as np
def getRandomInt(a, b, sample_size):
    # Draw 'sample_size' integers uniformly from a, ..., b - 1.
    random_values = np.random.randint(a, b, sample_size)
    return random_values
getRandomInt(-40, 40, 1)
array([22])
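A quick check of the sampling range may help here. The sketch below relies on NumPy's documented behaviour that np.random.randint(a, b, size) draws from the half-open interval [a, b), so b itself is never returned. The seed and the number of draws are arbitrary choices made for this illustration only.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make this check repeatable

# Draw many integers with the same call that getRandomInt() uses.
values = np.random.randint(-40, 40, 100_000)

# The lower bound is inclusive and the upper bound is exclusive,
# so every value lies in {-40, ..., 39}.
print(values.min(), values.max())
print(np.unique(values).size)  # number of distinct integers observed
```

If the endpoint 40 should be a possible outcome, passing b + 1 as the second argument is one way to include it.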
The next code creates a new function gerRandom() that:
Calls the function getRandomInt() 1000 times with a specified sample size (given in the sample_size variable, here 1), each time drawing random integers that follow a discrete uniform distribution.
For each call of getRandomInt(), computes the mean of the returned values using np.mean.
Stores each mean in a list.
Returns the list with the 1000 mean values.
import numpy as np
def gerRandom(a, b, sample_size):
    # Fix the seed so that every call of this function
    # reproduces the same sequence of random values.
    np.random.seed(1)
    means_samples = []
    # Create random integers with discrete uniform distribution in [a, b).
    # A total of 'sample_size' values will be created.
    # For these values the mean will be computed.
    # This process will be repeated 1000 times to produce 1000 means of samples.
    for i in range(1000):
        random_values = getRandomInt(a, b, sample_size)
        means_samples.append(np.mean(random_values))
    return means_samples
sample_size = 1
x = gerRandom(-40, 40, sample_size)
print(x)
[-3.0, -28.0, 32.0, -31.0, 35.0, ..., 28.0, 29.0, 21.0]
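The loop in gerRandom() can also be expressed without an explicit Python loop. The following is a minimal sketch, not the code from the linked notebook: the helper name sample_means_vectorized, its n_repeats and seed parameters, and the use of np.random.default_rng are all assumptions introduced here. Because it uses a different random generator, its numbers will not match the output shown above.

```python
import numpy as np

def sample_means_vectorized(a, b, sample_size, n_repeats=1000, seed=1):
    """Hypothetical helper: means of n_repeats samples of integers in [a, b)."""
    rng = np.random.default_rng(seed)
    # One row per repetition, one column per drawn value.
    draws = rng.integers(a, b, size=(n_repeats, sample_size))
    # Averaging along axis 1 yields one mean per repetition.
    return draws.mean(axis=1)

means = sample_means_vectorized(-40, 40, sample_size=10)
print(means.shape)  # one mean per repetition
print(means[:5])    # first few sample means
```

Drawing all values in one call mainly saves time when the number of repetitions or the sample size grows; the loop version is arguably easier to read.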
The next code uses the function test_gerRandom() to show how the distribution of the sample mean changes across four sample sizes, instead of the single fixed sample size (sample_size = 1) used in the previous code. The variable list_sample_size holds the four sample sizes: 1, 10, 30, and 100. After calling test_gerRandom(), the result is stored in the variable list_means_samples, which is a list of four lists of sample means.
def test_gerRandom(a, b, list_sample_size):
    list_means_samples = []
    # Produce one list of 1000 sample means per sample size.
    for sample_size in list_sample_size:
        means_samples = gerRandom(a, b, sample_size)
        list_means_samples.append(means_samples)
    return list_means_samples
a = -40
b = 40
list_sample_size = [1, 10, 30, 100]
list_means_samples = test_gerRandom(a, b, list_sample_size)
print(list_means_samples)
print(len(list_means_samples))
print(list_means_samples[0])
[[-3.0, -28.0, ..., 29.0, 21.0], [-3.0, -6.6, ..., 6.0, -6.2], ...]
4
[-3.0, -28.0, ..., 29.0, 21.0]
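Before plotting, the effect of the sample size can be checked numerically. The Central Limit Theorem suggests that the standard deviation of the sample mean is close to sigma / sqrt(n), where sigma is the standard deviation of a single draw. The sketch below is an illustration under stated assumptions: it uses the standard formula sqrt((N^2 - 1) / 12) for a discrete uniform distribution over N consecutive integers, and it draws its own values with np.random.default_rng (an arbitrary choice) instead of reusing list_means_samples.

```python
import numpy as np

a, b = -40, 40
n_values = b - a                          # number of integers a, ..., b - 1
sigma = np.sqrt((n_values**2 - 1) / 12)   # std of a single discrete uniform draw

rng = np.random.default_rng(1)  # arbitrary seed for repeatability
for sample_size in [1, 10, 30, 100]:
    # 1000 samples of 'sample_size' integers each, then one mean per sample.
    means = rng.integers(a, b, size=(1000, sample_size)).mean(axis=1)
    observed = means.std(ddof=1)             # spread of the 1000 sample means
    predicted = sigma / np.sqrt(sample_size)  # spread suggested by the CLT
    print(f"n={sample_size:3d}  observed std={observed:6.2f}  predicted={predicted:6.2f}")
```

The observed and predicted columns should be close to each other, and both should shrink as the sample size increases.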
The distributions of the data in the variable list_means_samples can be plotted as histograms, one per sample size, with the following code.
import matplotlib.pyplot as plt
# plotting all the means in one figure
k = 0
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
for i in range(0, 2):
    for j in range(0, 2):
        # Histogram (20 bins) of the sample means stored in list_means_samples[k]
        ax[i, j].hist(list_means_samples[k], 20, density=True)
        ax[i, j].set_title(label='Sample size = ' + str(list_sample_size[k]))
        k = k + 1
plt.show()
The complete code above is available at the following link:
https://colab.research.google.com/drive/1xlPjna5F0L2hBNJjyapA-zF3Pu5vF6N7?usp=sharing
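As a numerical complement to the histograms, one can count how often a sample mean falls within 1.96 standard errors of the true mean, which is the band a normal distribution would cover about 95% of the time. This is a rough sketch rather than part of the linked notebook: the population mean and standard deviation come from the standard formulas for a discrete uniform distribution on the integers a, ..., b - 1, and the random generator and seed are arbitrary choices.

```python
import numpy as np

a, b = -40, 40
mu = (a + b - 1) / 2                      # mean of the integers a, ..., b - 1
sigma = np.sqrt(((b - a)**2 - 1) / 12)    # std of a single discrete uniform draw

rng = np.random.default_rng(1)  # arbitrary seed for repeatability
for sample_size in [1, 10, 30, 100]:
    # 1000 sample means for this sample size.
    means = rng.integers(a, b, size=(1000, sample_size)).mean(axis=1)
    # Half-width of the band mu +/- 1.96 * sigma / sqrt(n).
    half_width = 1.96 * sigma / np.sqrt(sample_size)
    # Fraction of sample means that fall inside the band.
    coverage = np.mean(np.abs(means - mu) <= half_width)
    print(f"n={sample_size:3d}  fraction within band = {coverage:.3f}")
```

For sample_size = 1 the band is wider than the whole range of possible values, so the fraction is 1. For larger sample sizes the fraction is expected to settle near 0.95, consistent with the sample mean becoming approximately normal.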