Normal distribution and Central Limit Theorem

Learning objectives (and summaries)

Understand how the Central Limit Theorem predicts that sampling distributions of the mean become approximately normal, and use this to calculate probabilities for ranges of values.

- Calculate probability to the left of a z-score, between two z-scores, and to the right of a z-score in normally distributed data using normalcdf(left, right) on a TI-83.
- Convert a left-sided probability (percentile rank) into an x value.
- Recognize and read a cumulative probability plot.
- Understand that the Central Limit Theorem uses sample averages to make many types of distributions roughly normal.
- Use the Central Limit Theorem to find the standard deviation of a sample mean distribution: σ_x-bar = σ_x / √n.
- Understand that a sampling distribution is the collection of values a statistic (such as the sample mean) takes across all possible samples of a given size. Be able to simulate and graph a sampling distribution.
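
The calculator steps in the objectives above can also be sketched in Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The helper names `normalcdf`, `inv_norm`, and `sd_of_sample_mean` and the example numbers (mean 100, standard deviation 15) are made up for illustration and are not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(left, right, mean=0.0, sd=1.0):
    """Area under a normal curve between left and right (like the TI normalcdf)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(right) - dist.cdf(left)


def inv_norm(area_left, mean=0.0, sd=1.0):
    """x value that has the given area to its left (percentile rank -> x)."""
    return NormalDist(mean, sd).inv_cdf(area_left)


def sd_of_sample_mean(sd, n):
    """Standard deviation of the sampling distribution of x-bar: sd / sqrt(n)."""
    return sd / sqrt(n)


# Hypothetical data: values with mean 100 and standard deviation 15.
INF = float("inf")
print(round(normalcdf(-INF, 115, 100, 15), 3))  # left of x = 115   -> 0.841
print(round(normalcdf(85, 115, 100, 15), 3))    # between 85 and 115 -> 0.683
print(round(normalcdf(115, INF, 100, 15), 3))   # right of x = 115  -> 0.159
print(round(inv_norm(0.90, 100, 15), 1))        # 90th percentile   -> 119.2
print(sd_of_sample_mean(15, 25))                # samples of 25     -> 3.0
```

With the default mean of 0 and standard deviation of 1, `normalcdf` works directly on z-scores, the same way the practice problems use the calculator.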

Assessment (prerequisite: binomial distribution module)

- Test (14 elective pts): 8 questions (2 MC, 4 numeric, 2 short answer), plus 1 of the first 3 free-response questions and 1 of the last 3 free-response questions listed below (3 pts each):
- See AP Formula Sheet Here: tinyurl.com/apstatssheet
  - AP FRQ 2003 #3 (solution)
  - AP FRQ 2007B #2 (solution)
  - AP FRQ 2009 #2 (solution)
  - AP FRQ 2005B #5 (solution)
  - AP FRQ 2006B #3 (solution)
  - AP FRQ 2007 #3 (solution)
- Diffusion of innovation project (elective pts, based on quality/depth):
  - Watch Simon Sinek's explanation of the diffusion of innovation: http://www.youtube.com/watch?v=zU3fIEPfctQ
  - Skim the Wikipedia article: http://en.wikipedia.org/wiki/Diffusion_of_innovations
  - Choose an industry to research product adoption over time and collect data over a number of years.
  - Create a video that shows the normal curve and the cumulative "S" curve with the data you collected and give a timeline for when the diffusion of innovation occurred in this field, explaining the process and some background on the industry.
- Programming simulation of Central Limit Theorem (10 elective pts):
  - Use Python or some other programming language to generate a population of values. Then have it randomly take samples of whatever size you tell it to (for example, it may take many samples of 9 values each) and calculate the average of each sample. Have it display the original population and the distribution of sample means in the same histogram (different colors) or in two separate histograms with the same axes so they are easy to compare.
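
One possible starting point for the sampling step is sketched below. It is only a sketch under assumptions: the skewed population (exponential with a mean of about 10), the function name `simulate_sample_means`, and the sizes and seeds are all illustrative choices, and the histograms the project asks for are deliberately left out.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        mean(rng.sample(population, sample_size))  # one sample, without replacement
        for _ in range(num_samples)
    ]


# Assumed population: 10,000 right-skewed values with a mean of about 10.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]

sample_size = 9
sample_means = simulate_sample_means(population, sample_size, num_samples=2_000)

# The sample means should center on the population mean, and their spread
# should be close to (population sd) / sqrt(sample size).
print("population mean:     ", round(mean(population), 2))
print("mean of sample means:", round(mean(sample_means), 2))
print("population sd:       ", round(pstdev(population), 2))
print("sd of sample means:  ", round(pstdev(sample_means), 2))
print("sd / sqrt(n):        ", round(pstdev(population) / sample_size ** 0.5, 2))
```

Plotting `population` and `sample_means` on the same axes is the remaining step; any plotting library will do.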

Instruction

Practice

Sketch each situation. Then find the area under the normal curve.

- 1) z > 1.5
- 2) z < 0.4
- 3) -1.7 < z < -0.3
- 4) z > -2

A bottling factory fills bottles to a mean weight of 12.1oz. The standard deviation is 0.1oz.

- 5) The company tries to fill bottles between 12.0oz. and 12.2oz. What proportion of bottles fall in this range?
- 6) In what percentile is a bottle that is exactly the advertised weight of 12oz.?
- 7) A random bottle is pulled from the factory floor and weighed. It weighed 11.8oz. What proportion of the time will a randomly selected bottle be this low or lower?
- 8) What does the area under a normal curve (in a given range) represent?

9) What are the two magic powers of the Central Limit Theorem?

An SRS of 30 families in Byron was obtained by visiting 37 homes in town. The following results were obtained: Number of kids per family: x-bar = 2.4, s_x = 0.73

- 10) Does the sampling method used above sound like it would produce reasonable results? Include the method of selection, possible undercoverage, and the non-response rate when defending your answer. This takes some time to do all of these steps, but it is good practice for your projects, quizzes, and tests.
- 11) Is the sample large enough to get away with not knowing the actual distribution? Do you think that this distribution is single-peaked? Roughly symmetrical? How do these characteristics play into the importance of the sample size?
- 12) Based on the sample results above, what is the expected mean of the sampling distribution of x-bar?
- 13) Based on the sample results above, what is the expected standard deviation of the sampling distribution of x-bar?

A study of dreams found that mythical unicorn rides are normally distributed with a mean length of 23 minutes and a standard deviation of 4 minutes.

- 14) In what percentile is a random person if their ride lasts 25 minutes?
- 15) In what percentile is a group of 8 random people if their average ride lasts 25 minutes?
- 16) How likely is a single ride to last between 20 and 25 minutes?
- 17) What is the Z-score of a 17 minute mythical unicorn ride?

The incomes of a set of factory workers happen to be normally distributed. The average income is $53,000 and the standard deviation is $9,000.

- 18) What is the probability that a randomly selected employee makes more than $65,000?
- 19) What is the probability that the average of 4 randomly selected employees makes more than $65,000?
- 20) What is the probability that the average of 12 randomly selected employees makes more than $65,000?

A crater imager is looking at a hole that is actually 2 feet deep. The readings the machine gives are normally distributed around the actual depth with a standard deviation of 1.5 feet.

- 21) What is the probability that a single reading comes back as a negative number?
- 22) What is the probability that the average of 3 readings comes back as a negative number?

A bunch of folks went on a huge fishing trip. At the end, everyone reported their heaviest fish. The group average was 22 lbs. with a standard deviation of 3.6 lbs.

- 23) Since the distribution might be skewed, can we figure out the probability that one person has a maximum fish that weighs more than 25 lbs.?
- 24) If we sampled 10 random people and assumed no major outliers, could we find the probability that their average was less than 20 lbs.? If so, what is it?

Practice solutions

- 1) (You should do the sketch always. It just takes too much work for me to make digital versions for all of them :/).
- normalCdf(1.5, 99) = 0.067
- 2) normalCdf(-99, 0.4) = 0.655
- 3) normalCdf(-1.7, -0.3) = 0.338
- 4) normalCdf(-2, 99) = 0.977
- 5) z-score for 12.0 is -1, z for 12.2 is 1
- normalCdf(-1, 1) = 0.683
- 6) Percentile is the percent of values below the value you're interested in, so:
- z-score for 12.0 is -1
- normalCdf(-99, -1) = 0.159
- ~16th percentile
- 7) z-score for 11.8 is -3
- normalCdf(-99, -3) = 0.001
- 8) The proportion of individuals in that range.
- 9) Make all kinds of distributions look like normal curves
- Shrink the standard deviation of sampling distributions as the sample size goes up
- 10) Yes -- SRS is a good sampling method, low non-response (only 7/37 = 19%), possible undercoverage issues since there is a decent fraction of families that live in apartments.
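- 11) A sample size of 30 is usually enough for the sampling distribution of the mean to be roughly normal, even when the original distribution is not. Since the number of kids in a family is probably single-peaked and somewhat normally distributed anyway, 30 is plenty.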
- 12) 2.4 kids per family. The best estimate of the sampling distribution mean (which is the same as your population mean) is your sample mean. The mean of the sampling distribution should be the same for any sample size.
- 13) s_xbar = 0.73/√(30) = 0.133 kids per family. As the sample size goes up, the standard deviation of the sampling distribution goes down.
- 14) Percentile is the percent of values below the value you're interested in, so:
- z = (25 - 23) / 4 = 0.5
- normalCdf(-99, 0.5) = 0.691
- ~69th percentile
- 15) s_xbar = 4/√(8) = 1.41
- z = (25 - 23) / 1.41 = 1.41
- normalCdf(-99, 1.41) = 0.921
- ~92nd percentile
- It is less likely for the average of many people to be this far from the mean than for a single person to be this far from the mean -- that's why the percentile is so much higher.
- 16) z = (20 - 23) / 4 = -0.75
- z = (25 - 23) / 4 = 0.5
- normalCdf(-0.75, 0.5) = 0.465
- 17) z = (17 - 23) / 4 = -1.5
- 18) Make 65000 a z-score: z = (65000 - 53000) / 9000 = 1.333
- normalcdf(1.333, 1000) = .091 (almost a 10% chance)
- 19) When you average a sample of 4 employees, you need to update your standard deviation: s_xbar = 9000 / √(4) = 4500
- Now make 65000 a z-score: z = (65000 - 53000) / 4500 = 2.667
- normalcdf(2.667, 1000) = .0038 (about a 0.4% chance)
- 20) Update the standard deviation again for a sample of 12 averaged together: s_xbar = 9000 / √(12) = 2598
- And make 65000 a z-score now: z = (65000 - 53000) / 2598 = 4.619
- normalcdf(4.619, 1000) = .0000019 (roughly a 2 in a million chance of happening)
- 21) Hint for interpreting this problem: the readings are centered around 2ft., which means that the mean is 2. A negative number is anything less than 0, so 2ft below the mean.
- Turn the 2ft difference into a z-score: (0 - 2) / 1.5 = -1.333
- To find the probability of getting less than 0, do normalcdf(-1000, -1.333) = .091
- 22) Update your standard deviation: s_xbar= 1.5 / √(3) = .866
- Find the z-score: (0 - 2) / .866 = -2.309
- Now find the probability of less than this value: normalcdf(-1000, -2.309) = .010
- 23) No. For a single person, if the distribution is not close to a normal curve, we can't use normal calculations on it. Lucky you get to be lazy on this problem!
- 24) Sure could! Size of largest fish is probably fairly unimodal and a little skewed for the really big fish on the high end, but 10 people is often large enough to turn a distribution like that into a nice happy normal curve. So let's try it:
- Update the standard deviation: sxbar = 3.6 / √(10) = 1.138
- Find the z-score: (20 - 22) / 1.138 = -1.757
- normalcdf(-1000, -1.757) = .039
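
The sample-mean solutions above (18 through 22, and 24) all follow the same three steps: shrink the standard deviation by √n, convert to a z-score, and take the area in one tail. As a way to check calculator work, here is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The helper names `prob_mean_above` and `prob_mean_below` are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(x, mean, sd, n=1):
    """P(average of n values < x), using sd / sqrt(n) for the sample mean."""
    return NormalDist(mean, sd / sqrt(n)).cdf(x)


def prob_mean_above(x, mean, sd, n=1):
    """P(average of n values > x), the complement of prob_mean_below."""
    return 1 - prob_mean_below(x, mean, sd, n)


# Problems 18-20: incomes with mean 53000 and sd 9000, average above 65000.
for n in (1, 4, 12):
    print(n, prob_mean_above(65000, 53000, 9000, n))

# Problems 21-22: crater readings with mean 2 and sd 1.5, average below 0.
for n in (1, 3):
    print(n, prob_mean_below(0, 2, 1.5, n))

# Problem 24: fish weights with mean 22 and sd 3.6, average of 10 below 20.
print(10, prob_mean_below(20, 22, 3.6, 10))
```

Small differences from the rounded calculator answers are expected, since the code does not round the z-score before finding the area.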

Notes

http://www.intuitor.com/statistics/CentralLim.html