1.3. Building a CLT Simulator

The following code illustrates the Central Limit Theorem. The first step is to create a function that draws random values from a discrete uniform distribution; the mean of those values is computed in a later step.

In particular, the next code draws integers from a discrete uniform distribution over the range [-40, 40). Because np.random.randint excludes the upper bound, the possible values are -40, -39, ..., 39.

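import numpy as np

def getRandomInt(a, b, sample_size):
    # Draw 'sample_size' integers uniformly from a, ..., b - 1
    # (np.random.randint excludes the upper bound b).
    random_values = np.random.randint(a, b, sample_size)
    return random_values

getRandomInt(-40, 40, 1)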

array([22])
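As a quick check of the range described above, the following minimal sketch draws a large number of values and inspects their extremes. The seed and the count of 100000 draws are illustrative assumptions, not part of the original code.

```python
import numpy as np

def getRandomInt(a, b, sample_size):
    # Same helper as above: integers drawn uniformly from a, ..., b - 1.
    return np.random.randint(a, b, sample_size)

np.random.seed(0)  # illustrative seed, used only for repeatability
values = getRandomInt(-40, 40, 100_000)

# The upper bound 40 is never drawn; the largest possible value is 39.
print(values.min(), values.max())

# The draws are integers, so the distribution is discrete, not continuous.
print(values.dtype)
```

If a range that includes 40 is wanted, the call would need b = 41 instead.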

The next code creates a new function gerRandom() that:

Calls the function getRandomInt() 1000 times, each time with a specified sample size (given in the sample_size variable, here 1), to draw random integers that follow a discrete uniform distribution.
For each call of getRandomInt(), computes the mean of the returned values using np.mean.
Stores each mean in a list.
Returns the list with the 1000 mean values.

import numpy as np

def gerRandom(a, b, sample_size):
    # Fix the seed so that the same result is obtained
    # every time the function is called.
    np.random.seed(1)
    means_samples = []
    # Draw 'sample_size' integers uniformly from a, ..., b - 1
    # and compute their mean. Repeat this 1000 times
    # to produce 1000 sample means.
    for i in range(1000):
        random_values = getRandomInt(a, b, sample_size)
        means_samples.append(np.mean(random_values))
    return means_samples

sample_size = 1
x = gerRandom(-40, 40, sample_size)
print(x)

[-3.0, -28.0, 32.0, -31.0, 35.0, ..., 28.0, 29.0, 21.0]

The next code uses the function test_gerRandom() to examine how the distribution of the sample mean changes across four different sample sizes, instead of the single fixed sample size (sample_size = 1) used in the previous code. The variable list_sample_size holds the four sample sizes: 1, 10, 30, and 100. After calling test_gerRandom(), the result is stored in the variable list_means_samples, which is a list containing four lists of sample means, one per sample size.

def test_gerRandom(a, b, list_sample_size):
    list_means_samples = []
    for sample_size in list_sample_size:
        means_samples = gerRandom(a, b, sample_size)
        list_means_samples.append(means_samples)
    return list_means_samples

a = -40
b = 40
list_sample_size = [1, 10, 30, 100]
list_means_samples = test_gerRandom(a, b, list_sample_size)

print(list_means_samples)
print(len(list_means_samples))
print(list_means_samples[0])

[[-3.0, -28.0, ..., 29.0, 21.0], [-3.0, -6.6, ..., 6.0, -6.2], ...]

4

[-3.0, -28.0, ..., 29.0, 21.0]
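The spread of each list of means can also be checked numerically against the value the Central Limit Theorem predicts. For integers drawn uniformly from a, ..., b - 1 there are k = b - a possible values, the mean is (a + b - 1) / 2, and the standard deviation is sqrt((k^2 - 1) / 12); the sample mean of n draws should then have a standard deviation close to that value divided by sqrt(n). The sketch below is only an illustration: the helper sample_means() and its parameters repetitions and seed are hypothetical names, and it uses its own random generator, so its numbers will not match the lists above exactly.

```python
import numpy as np

def sample_means(a, b, sample_size, repetitions=1000, seed=1):
    # Hypothetical helper: draws a (repetitions x sample_size) table of
    # integers from a, ..., b - 1 and returns the mean of each row.
    rng = np.random.RandomState(seed)
    return rng.randint(a, b, (repetitions, sample_size)).mean(axis=1)

a, b = -40, 40
k = b - a                           # number of distinct integer values
mu = (a + b - 1) / 2                # mean of the discrete uniform
sigma = np.sqrt((k ** 2 - 1) / 12)  # its standard deviation

for n in [1, 10, 30, 100]:
    means = sample_means(a, b, n)
    # Columns: sample size, mean of the sample means,
    # observed spread, spread predicted as sigma / sqrt(n).
    print(n, round(means.mean(), 2), round(means.std(), 2),
          round(sigma / np.sqrt(n), 2))
```

The observed and predicted spreads should agree more closely as the number of repetitions grows.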

The distributions of the sample means stored in the variable list_means_samples can be plotted as histograms, one per sample size, with the next code.

import matplotlib.pyplot as plt

# Plot all four distributions of means in one figure
k = 0
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
for i in range(0, 2):
    for j in range(0, 2):
        # Histogram (20 bins) of the means for the k-th sample size
        ax[i, j].hist(list_means_samples[k], 20, density=True)
        ax[i, j].set_title(label='Sample size = ' + str(list_sample_size[k]))
        k = k + 1
plt.show()

The complete code above is available at the following link:

https://colab.research.google.com/drive/1xlPjna5F0L2hBNJjyapA-zF3Pu5vF6N7?usp=sharing
