[TBD]Central limit theorem

Key points

- As the sample size increases, the curve of the sampling distribution of the mean becomes narrower and more peaked. It means that the spread around the mean gets smaller

Pre-requisites

Normal distribution

Random variable -> Length of watermelon

Example: X axis -> Length of watermelon

Y axis -> Relative frequency of a given value x of X (for a continuous variable such as length, this is the probability density at x)

- Normal distribution is symmetric about its mean
- Total area under the normal distribution curve is 1
- Normal distribution is asymptotic on both sides: the tails approach the X axis but never touch it
- Area under the normal distribution
  - Normal distribution mean ± 1 SD -> about 68% area
  - Normal distribution mean ± 2 SD -> about 95% area (exactly 95% at ± 1.96 SD)
  - Normal distribution mean ± 3 SD -> about 99.7% area
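
The area figures above can be checked numerically. The sketch below is only an illustration: it uses `statistics.NormalDist` from the Python standard library, and the helper name `area_within` is made up for this note.

```python
from statistics import NormalDist

def area_within(k):
    """Area under the standard normal curve between mean - k*SD and mean + k*SD."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

# Print the area for the SD multiples mentioned in the notes.
for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD -> {area_within(k):.4f}")
```

The same multiples apply to any normal distribution, because the area depends only on how many SDs away from the mean the limits are.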

https://www.youtube.com/watch?v=gI5y3RZe9fk

Population proportions

It is the percentage of a population that has a given characteristic.

For example, 47% of the population of a city wants to buy a bike.

97% of the population carries a cell phone.

Learning video

Central limit theorem

Irrespective of the shape of the original population (provided its variance is finite), the sampling distribution of the mean approaches a normal distribution as the sample size increases (Ref here). For example, a sample of size 50 is better than a sample of size 5. As a common rule of thumb (not a consequence of the law of large numbers, which only says that the sample mean converges to the population mean), a sample size of 30 or more is considered large enough (Ref here). Once you decide on N, all samples must have the same size N.
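
A small simulation can make the "size 50 is better than size 5" point concrete. This is a minimal sketch under assumptions that are not in the notes: the population is taken to be exponential (strongly right-skewed), 5000 samples are drawn per sample size, and the helper names `sample_means` and `skewness` are hypothetical.

```python
import random
from statistics import fmean, pstdev

def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an exponential population; return their means."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(num_samples)]

def skewness(values):
    """Population skewness: about 0 for a symmetric, normal-looking distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (5, 50):
    means = sample_means(n, 5000, rng)
    results[n] = (pstdev(means), skewness(means))
    print(f"n={n}: spread of sample means={results[n][0]:.3f}, skewness={results[n][1]:.3f}")
```

With the larger sample size, the sample means should show both a smaller spread and a skewness closer to zero, i.e. a shape closer to the normal curve.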

- If the population distribution is normal, then the means of samples of size N follow a normal distribution for any N
- Mean of sample means (mean of the sampling distribution) = mean of the population
- If the population distribution is not normal, then the means of samples of size N are approximately normally distributed when N is large (rule of thumb: N >= 30)
- Variance of the sampling distribution = (population SD)^2/N, i.e. population variance/N, where N is the sample size. Its SD (the standard error) is population SD/sqrt(N)
- Samples must be independent. The 2nd sample should be independent of the first, the 3rd sample should be independent of the 2nd and so on. So, samples must be taken with replacement (or from a population much larger than the sample).
  - Number of trials should be sufficiently large. If someone takes only 5 trials, then it is not sufficient (Ref here)
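
The two formulas in the list above (mean of sample means = population mean, variance of the sampling distribution = population variance/N) can be checked with a quick simulation. This is only a sketch: the population is assumed to be the faces of a fair die, and N = 30 and 10,000 trials are arbitrary choices for illustration.

```python
import random
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5, 6]  # assumed population: faces of a fair die (not normal)
N = 30                           # sample size, the same for every sample
NUM_SAMPLES = 10_000             # number of trials (samples drawn)

rng = random.Random(42)          # fixed seed so the run is repeatable

pop_mean = fmean(population)     # 3.5
pop_var = pvariance(population)  # 35/12, about 2.917

# rng.choices samples with replacement, so the draws are independent.
means = [fmean(rng.choices(population, k=N)) for _ in range(NUM_SAMPLES)]

mean_of_means = fmean(means)
var_of_means = pvariance(means)

print(f"population mean      = {pop_mean:.3f}, mean of sample means     = {mean_of_means:.3f}")
print(f"population var / N   = {pop_var / N:.4f}, variance of sample means = {var_of_means:.4f}")
```

Rerunning with a small `NUM_SAMPLES` (for example 5) shows why the number of trials matters: the two estimates then wander much further from the theoretical values.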

https://www.youtube.com/watch?v=IiV6blF1crE

https://www.youtube.com/watch?v=_YOr_yYPytM

Inferences

- From the mean of the sampling distribution, we can estimate the population mean (whatever the population's Mu and Sigma are)
  - Sample mean ± 1 SD of the sampling distribution (1 standard error) -> 68% confidence
  - Sample mean ± 2 standard errors (actually 1.96) -> 95% confidence
  - Sample mean ± 3 standard errors -> 99.7% confidence
  - So, with 2 standard errors, we can say with 95% confidence that the population mean lies between (sample mean - 2 standard errors) and (sample mean + 2 standard errors)
  - For 99.7% confidence, take 3 standard errors (for exactly 99%, about 2.58 standard errors)
- From the variance of the sampling distribution, we can reverse compute the population variance: population variance = N x variance of the sampling distribution
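
As a worked illustration of these inferences, the sketch below builds confidence intervals for a population mean from a single sample. Everything specific in it is an assumption for the example: the 40 watermelon lengths are simulated rather than real data, the sample SD stands in for the unknown population SD, and `confidence_interval` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

rng = random.Random(7)  # fixed seed so the run is repeatable
# Hypothetical sample: 40 watermelon lengths in cm (simulated, not real data).
lengths = [rng.gauss(30.0, 4.0) for _ in range(40)]

n = len(lengths)
sample_mean = fmean(lengths)
# Standard error: estimated SD of the sampling distribution of the mean.
std_error = stdev(lengths) / sqrt(n)

def confidence_interval(confidence):
    """Return (low, high) = sample mean -/+ z * standard error for the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return sample_mean - z * std_error, sample_mean + z * std_error

for confidence in (0.68, 0.95, 0.997):
    low, high = confidence_interval(confidence)
    print(f"{confidence:.1%} confidence: population mean between {low:.2f} and {high:.2f} cm")
```

Higher confidence gives a wider interval, which matches the 1, 2 and 3 standard error bands listed above.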

Points

- The sample mean is an unbiased estimator of the population mean. [Ref here]

Reference

https://sites.google.com/site/jbsakabffoi12449ujkn/home/look-with-the-eye-of-statistician/dissection-of-gaussian-bell-curve
