1. Concepts & Definitions
1.2. Central Limit Theorem (CLT)
1.5. Confidence interval and normal distribution
1.6. Applying normal confidence interval
1.7. Normal versus Student's T distributions
1.8. Confidence interval and Student T distribution
1.9. Applying Student T confidence interval
1.10. Estimating sample size using normal distribution
1.11. Estimating sample size using Student T distribution
1.12. Estimating proportion using samples
2. Problem & Solution
2.1. Confidence interval for weight of HS6 code
The central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.
The CLT is often used in conjunction with the law of large numbers, which states that the average of the sample means and standard deviations comes closer to the population mean and standard deviation as the sample size grows; this is extremely useful for accurately predicting the characteristics of populations.
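The law of large numbers can be seen in a minimal simulation. The sketch below assumes, purely for illustration, a uniform(0, 10) population with true mean 5.0, and shows the sample mean drifting toward that value as the sample size grows:

```python
import random
import statistics

# Illustrative sketch of the law of large numbers: as n grows, the
# sample mean of a uniform(0, 10) population (true mean 5.0) gets
# closer and closer to the population mean.
random.seed(42)

for n in (10, 1_000, 100_000):
    sample = [random.uniform(0, 10) for _ in range(n)]
    print(n, round(statistics.mean(sample), 3))
```

With 100,000 observations the sample mean lands within a small fraction of a unit of 5.0; with only 10 observations it can be noticeably off.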
To illustrate the application of the CLT, suppose that a sample containing many observations is obtained, each observation generated at random, regardless of whether the population follows a normal, Poisson, binomial, or any other distribution, and the arithmetic mean of the observed values is computed. If this procedure is repeated many times, the central limit theorem says that the probability distribution of these averages will closely approximate a normal distribution. The next figure helps to understand the procedure for employing the CLT.
The procedure for obtaining the distribution of the sample means, i.e., a sampling distribution, is shown in the next figure.
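The procedure just described can be sketched in a few lines of code. The sketch below assumes, as an illustrative choice, an exponential population with mean 1.0 (a clearly non-normal, skewed distribution), draws many samples, and computes the mean of each:

```python
import random
import statistics

# Sketch of the CLT procedure: draw many samples from a non-normal
# population (exponential, mean 1.0, standard deviation 1.0), compute
# each sample's mean, and inspect the resulting sampling distribution.
random.seed(0)

n = 30          # observations per sample
trials = 2_000  # number of repeated samples

sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

# The sampling distribution centers on the population mean (1.0), and
# its spread matches the CLT prediction sigma / sqrt(n) = 1 / sqrt(30).
print(round(statistics.mean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3))
```

Even though the individual observations are strongly skewed, the 2,000 sample means cluster symmetrically around 1.0 with a standard deviation close to 1/√30 ≈ 0.183, as the CLT predicts.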
The central limit theorem states that the sampling distribution of the mean will approximate a normal distribution provided the following conditions hold:
1. The sample size is sufficiently large. This condition is usually met if the sample size is n ≥ 30.
2. The samples are independent and identically distributed (i.i.d.) random variables. This condition is usually met if the sampling is random.
3. The population’s distribution has a finite variance. The central limit theorem doesn’t apply to distributions with infinite variances, such as the Cauchy distribution. Most distributions have finite variance.
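The finite-variance condition can be demonstrated with a small simulation. The sketch below, using the Cauchy distribution only as an illustrative counterexample, compares the spread of normal sample means (which shrinks like 1/√n) with the spread of Cauchy sample means (which does not shrink at all):

```python
import math
import random
import statistics

# Illustration of condition 3: sample means of a standard Cauchy
# distribution do NOT settle down as n grows, because its variance is
# infinite, while normal sample means concentrate like 1 / sqrt(n).
random.seed(1)

def cauchy():
    # Standard Cauchy draw via the inverse-CDF method.
    return math.tan(math.pi * (random.random() - 0.5))

for n in (10, 1_000):
    normal_means = [statistics.mean(random.gauss(0, 1) for _ in range(n))
                    for _ in range(500)]
    cauchy_means = [statistics.mean(cauchy() for _ in range(n))
                    for _ in range(500)]
    # Interquartile range: a robust spread measure, since the Cauchy
    # distribution has no defined standard deviation.
    q_n = statistics.quantiles(normal_means, n=4)
    q_c = statistics.quantiles(cauchy_means, n=4)
    print(n, round(q_n[2] - q_n[0], 3), round(q_c[2] - q_c[0], 3))
```

The normal IQR shrinks from roughly 0.4 at n = 10 to roughly 0.04 at n = 1,000, while the Cauchy IQR stays near 2 for both: averaging more Cauchy observations does not help, so the CLT gives no guarantee there.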
The central limit theorem mainly works for independent, identically distributed random variables. When the weight of each individual (or container) is a random variable, it is reasonable to assume that the distributions of the weights are identical. It is also safe to assume that they are independent, since the weight of individual A has nothing to do with the weight of individual B. Each individual can therefore be thought of as Xi, a random variable with a distribution.
In general, the ideal approach would be to measure the weight of all 1000 individuals and calculate the true mean, but in practice it is not always possible to do that with the whole population, because it takes time and in some cases may be impossible. That is why a random sample of size one hundred is chosen.
In conclusion, instead of thinking that the CLT is applied to a single sample of 100 individuals (Case A), it can be thought of as being applied to 100 samples with n = 1 (Case B), as shown in the next figures.
Therefore, taking 100 individuals and applying the CLT to the data can be a good approximation, but it is not exact, since the CLT is only exact as n approaches infinity. The approximation gets better and better as n increases.
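This improvement with n can be quantified. The sketch below assumes, again as an illustrative choice, an exponential population: the skewness of the sampling distribution of its mean is 2/√n in theory, so it shrinks toward 0 (the skewness of a normal distribution) as n grows:

```python
import random
import statistics

# Sketch of "the approximation improves as n increases": estimate the
# skewness of the sampling distribution of the mean of an exponential
# population for several sample sizes. Theory predicts 2 / sqrt(n).
random.seed(7)

def skewness(xs):
    # Standardized third moment of a list of observations.
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return statistics.mean(((x - m) / s) ** 3 for x in xs)

for n in (1, 10, 100):
    means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
             for _ in range(3_000)]
    print(n, round(skewness(means), 2))  # shrinks toward 0 as n grows
```

At n = 1 the sampling distribution is as skewed as the population itself; by n = 100 its skewness is small and the normal approximation is already quite good, which is why n ≥ 30 is the usual rule of thumb.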