Central Limit Theorem

The total length of the videos in this section is approximately 17 minutes, but you will also spend time answering short questions while completing this section.

You can also view all the videos in this section at the YouTube playlist linked here.

Distributions of Sample Means

A definition to know: a "standard normal distribution" is a normal distribution with mean 0 and variance 1.

CLT.1.Distributions of Sample Means.mp4

Question 1: Which is more surprising?

The average of 3 randomly drawn students' GPAs is 4.0
The average of 30 randomly drawn students' GPAs is 4.0

Show answer

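The average of 30 students' GPAs. The bigger the sample, the smaller the variance of the sample mean, so the closer we expect the sample mean to be to the population mean. An extreme average such as 4.0 is therefore much less likely with 30 students than with 3.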
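To see this numerically, here is a minimal simulation sketch. The GPA population is invented for illustration (uniform between 2.0 and 4.0, not course data), and the helper name `sample_mean_spread` is hypothetical. It draws many samples of each size and measures how much the sample mean varies from sample to sample.

```python
import random
import statistics

def sample_mean_spread(population, n, draws=2000, seed=0):
    """Standard deviation of the sample mean across many samples of size n."""
    rng = random.Random(seed)
    # rng.choices draws with replacement, which is fine for a large population.
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]
    return statistics.pstdev(means)

# Hypothetical GPA population (made-up numbers, assumed uniform on 2.0 to 4.0).
rng = random.Random(42)
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(10_000)]

spread_3 = sample_mean_spread(population, 3)
spread_30 = sample_mean_spread(population, 30)

print(f"SD of sample means, n=3:  {spread_3:.3f}")
print(f"SD of sample means, n=30: {spread_30:.3f}")
```

The second number should come out noticeably smaller than the first: averages of 30 students stay closer to the population mean than averages of 3 students.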

Question 2: Suppose that we draw 100 women randomly from a population and record their heights, generating a representative sample. Compared with the distribution of heights in the whole population, would you expect the spread of the sample's distribution to be:

roughly the same
wider
narrower

Show answer

Roughly the same. The histograms of 1 million women's heights (in the population) and the heights of 100 women drawn at random from the population should look roughly the same. The distribution doesn't depend on how many women are included in the histogram. However, the sample mean will have a narrower distribution, if we were to sample repeatedly and record the sample mean.
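The three distributions in this answer can be compared in a short simulation. This is only a sketch: the population of heights is invented (assumed normal with mean 64 inches and standard deviation 2.5 inches), and a population of 20,000 stands in for the 1 million in the example to keep the run fast.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of heights in inches (invented numbers, not course data).
population = [rng.gauss(64.0, 2.5) for _ in range(20_000)]
population_sd = statistics.pstdev(population)

# One sample of 100 women: its spread resembles the population's spread.
sample = rng.sample(population, 100)
sample_sd = statistics.stdev(sample)

# Many samples of 100 women: the sample means cluster far more tightly.
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(1000)]
sample_means_sd = statistics.pstdev(sample_means)

print(f"population SD:      {population_sd:.2f}")
print(f"one sample's SD:    {sample_sd:.2f}")
print(f"SD of sample means: {sample_means_sd:.2f}")
```

The first two numbers should be close to each other, while the third should be roughly one tenth as large, since the sample size is 100 and the square root of 100 is 10.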

Question 3: If Y follows a standard normal distribution, the mean of a sample of size 16 follows a normal distribution in which roughly 95% of values lie between two numbers. Approximately what is the lower number?

Show answer

-0.5. Remember that a standard normal has mean zero and variance one. The expected value of the sample mean will be zero, and the variance of the sample mean will be 1/16, which is the variance of the original variable divided by the sample size. So, the standard deviation of the sample mean is the square root of 1/16, which is 1/4 = 0.25. The lower bound is 0 - 2 * (0.25) = -0.5.

Several people have expressed confusion about this problem, so I am adding more discussion here. When we say that "Y follows a standard normal distribution," what we mean is that the variable Y has a normal distribution, with mean 0 and variance 1.

If we repeatedly sample 16 people from the population and record the sample mean, we will create a histogram of sample means. This histogram is expected to be normal, with the same mean as the population, which is zero. And, this histogram of sample means will have variance equal to the population variance (1) divided by the sample size (16), so 1/16. If the variance is 1/16, then the standard deviation is equal to the square root of the variance, which is 1/4, or 0.25.

We know that roughly 95% of the values in a normal distribution fall within two standard deviations of the mean. So, to establish a range such that 95% of the sample means in the histogram fall between the lower bound and upper bound, we take the mean and add or subtract 2 times the standard deviation. The lower bound is 0 - 2 * (0.25) = -0.5.
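The arithmetic above can be checked in a few lines. This sketch uses `NormalDist` from Python's standard `statistics` module. The variable names are mine, and the "exact" 2.5th percentile is included only to show how close the two-standard-deviation rule of thumb comes.

```python
import math
from statistics import NormalDist

sigma = 1.0  # population standard deviation (standard normal)
n = 16       # sample size

# Standard deviation of the sample mean: sqrt(sigma^2 / n) = sigma / sqrt(n).
sd_of_mean = sigma / math.sqrt(n)          # 0.25

# Rule of thumb: about 95% of values lie within 2 standard deviations.
lower_rule_of_thumb = 0 - 2 * sd_of_mean   # -0.5

# Exact 2.5th percentile of a normal with mean 0 and SD 0.25.
lower_exact = NormalDist(mu=0.0, sigma=sd_of_mean).inv_cdf(0.025)  # about -0.49

print(sd_of_mean, lower_rule_of_thumb, round(lower_exact, 2))
```

The rule of thumb uses 2 standard deviations where the exact multiplier is about 1.96, which is why the two lower bounds differ slightly.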

Central Limit Theorem

CLT.2.Central Limit Theorem.mp4

Question 4: Does the Central Limit Theorem require a large sample size if the population distribution is actually normal?

Show answer

No. When the population distribution is normal, the sample mean will follow a normal distribution regardless of sample size.

Check out my favorite Central Limit Theorem applet. Read the instructions and then click "begin" on the left side of the main page. Experiment a bit, and see if you can figure out what's going on.

Question 5: What is your main takeaway from playing with the applet? Did anything surprise you?

Show answer

I don't know what your main takeaway was or what surprised you. What always surprises me, every time, is how strong the Central Limit Theorem is. You can make some crazy population distributions by dragging your mouse around the histogram at the top of the page, and the blue distribution of sample means still looks normal, even for a sample size of 5.
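If you would like to try something similar outside the applet, here is a rough sketch. It does not reproduce the applet: as an assumption, it swaps in a strongly right-skewed exponential population for a hand-drawn one, and it uses skewness (about 0 for a symmetric, normal-looking histogram) as a crude stand-in for judging the shape by eye.

```python
import random
import statistics

def skewness(values):
    """Skewness of a list of numbers: about 0 for a symmetric histogram."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def sample_means(n, draws, rng):
    """Means of `draws` independent samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]

rng = random.Random(7)

# Individual draws from the skewed population (assumed exponential with rate 1).
population_draws = [rng.expovariate(1.0) for _ in range(5000)]

skew_population = skewness(population_draws)
skew_means_5 = skewness(sample_means(5, 5000, rng))
skew_means_30 = skewness(sample_means(30, 5000, rng))

print(f"skewness of population draws:   {skew_population:.2f}")
print(f"skewness of sample means, n=5:  {skew_means_5:.2f}")
print(f"skewness of sample means, n=30: {skew_means_30:.2f}")
```

The skewness should drop sharply from the population to the sample means at n = 5, and drop further at n = 30. With a population this lopsided, some skew usually remains at n = 5, so the approach to normality is gradual.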

The two videos that follow use notation that is familiar after a probability course. I am including them here because these derivations of the expectation and variance of the sample mean underlie the intuition already discussed. Though I won't assume that you can complete these sorts of derivations on your own at this point, I hope that the general approach increasingly makes sense as you see similar derivations in this course.

Expected Value

CLT.3.Mu, Expected Value of Sample Means.mp4

Question 6: The "expectation" or "expected value" of a random variable is the

variance
median
mean

Show answer

Mean
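For reference, the standard derivation of the expected value of the sample mean can be written in a few steps. The notation here is my assumption and may differ from the video's: X_1, ..., X_n are the n draws in the sample, each with mean μ, and X̄ is the sample mean.

```latex
\begin{aligned}
E[\bar{X}]
  &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] \\
  &= \frac{1}{n}\sum_{i=1}^{n} E[X_i]
     && \text{(expectation is linear)} \\
  &= \frac{1}{n}\cdot n\mu \\
  &= \mu
\end{aligned}
```

In words: the sample mean is, on average, equal to the population mean, whatever the sample size.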

Variance of Sample Means

CLT.4.Variance of Sample Means.mp4

Question 7: In the expression σ²/n, is n the sample size or the population size?

Show answer

Sample size. This expression represents the variability of the sample mean, which is our estimate of the population mean. The precision of our estimate of the population mean depends on the sample size but not the population size.
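The standard derivation behind σ²/n is short. The notation is again my assumption and may differ from the video's: X_1, ..., X_n are the n draws in the sample, each with variance σ², and the draws are assumed to be independent.

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^{2}}\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
     && \text{(constants come out squared)} \\
  &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
     && \text{(independent draws)} \\
  &= \frac{1}{n^{2}}\cdot n\sigma^{2} \\
  &= \frac{\sigma^{2}}{n}
\end{aligned}
```

The only n that appears is the number of draws in the sample, which is why the population size plays no role.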

And that's it!

During this tutorial you learned:

The relationship between the distribution of a population, distribution of a sample, and distribution of a sample mean over many possible samples
The relationship between the variance of a population (σ²) and the variance of sample means over many possible samples (σ²/n)
Expected value and its relationship to the population parameter μ (mu)

Terms and concepts:

distribution of a sample, distribution of sample means, expected value