
2. Normal distribution and Central Limit Theorem

Learning objectives (and summaries)

Understand how the Central Limit Theorem predicts how sampling distributions become normally distributed and use this to calculate probabilities of ranges of values.
    • Translate between values and z-scores.
      • z = (x - μ) / σ.  A z-score is the number of standard deviations a value is above (positive) or below (negative) the mean.
    • Calculate probability to the left of a z-score, between two z-scores, and to the right of a z-score in normally distributed data.
      • Use normalcdf(left, right) on a TI-83.
    • Convert a left-sided probability (percentile rank) into an x value.
      • Use invNorm(left_side_probability) on a TI-83.
    • Understand the difference between a distribution and a sampling distribution (for quantitative variables).  Be able to explain how to simulate a sampling distribution and how to graph one.
      • A distribution is all of the values in the population for a given question/variable.
      • A sampling distribution is the collection of all possible averages of random samples of a given size taken from a population.  It is formed by randomly drawing a few values from the distribution at a time, taking their average, recording/plotting that number, and repeating infinitely for the given sample size.
    • Understand how the Central Limit Theorem changes the shape of sampling distributions from the shape of the starting distribution.
      • Because sample averages are pulled towards the middle, the sampling distribution slowly erodes away separate peaks and gaps until almost any starting shape becomes a normal-shaped sampling distribution.
      • With a sample size above about 30, all but the most extreme starting distributions result in an approximately normal sampling distribution.
    • Use the Central Limit Theorem to find the standard deviation of a sample mean distribution.
      • σx̄ = σ / √n.  To cut the standard deviation in half, you need to multiply the sample size by 4.
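If you want to check calculator work on a computer, the objectives above can be sketched in Python with the standard-library `statistics.NormalDist` class. This is a minimal sketch, not a TI or library API: the helper names (`z_score`, `x_from_z`, `normalcdf`, `inv_norm`, `sd_of_mean`) are hypothetical and only mirror the TI-83 commands, and the mean of 100 and standard deviation of 15 in the percentile example are made-up values.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

def z_score(x, mu, sigma):
    """How many standard deviations x is above (+) or below (-) the mean."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Solve the z-score formula for x: x = mu + z * sigma."""
    return mu + z * sigma

def normalcdf(left, right, mu=0.0, sigma=1.0):
    """Area under a normal curve between left and right (mirrors TI-83 normalcdf)."""
    curve = NormalDist(mu, sigma)
    return curve.cdf(right) - curve.cdf(left)

def inv_norm(left_area, mu=0.0, sigma=1.0):
    """Value with the given area (percentile rank) to its left (mirrors TI-83 invNorm)."""
    return NormalDist(mu, sigma).inv_cdf(left_area)

def sd_of_mean(sigma, n):
    """Central Limit Theorem: standard deviation of the sample mean is sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Left of, between, and right of z-scores on the standard normal curve.
# math.inf plays the role of the "99" or "1000" stand-in bound used on the calculator.
print(round(normalcdf(-math.inf, 1.0), 3))  # area left of z = 1
print(round(normalcdf(-1.0, 1.0), 3))       # area between z = -1 and z = 1
print(round(normalcdf(1.0, math.inf), 3))   # area right of z = 1

# Percentile rank -> z-score -> x value (hypothetical mean 100, sd 15).
z90 = inv_norm(0.90)
print(round(z90, 3), round(x_from_z(z90, 100, 15), 1))

# Quadrupling n halves the standard deviation of the sample mean.
print(sd_of_mean(10, 25), sd_of_mean(10, 100))
```

The last line prints 2.0 and 1.0, which is the "multiply the sample size by 4" rule in action.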

    Assessment
      • Quiz (10 points): 8 questions; 1 of the 3 free response questions (2 pts):
        • Explain why product adoption stages likely follow the normal distribution.
        • Explain why adding more levels to a plinko board leads to a probability distribution that looks more and more like the normal curve.
        • Explain how the Central Limit Theorem turns sampling distributions into approximately normal curves when the sample size is large enough.
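For the plinko question, one way to see the pattern is to treat each level of pegs as a coin flip: the ball's final slot is the number of right bounces, a sum of independent left/right steps (a binomial count). The sketch below assumes an idealized board where every peg sends the ball left or right with probability 1/2, computes the exact slot probabilities, and measures the largest gap between them and a normal curve with the same mean and standard deviation. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def plinko_slot_probs(levels):
    """Exact chance of landing in each slot of an idealized plinko board.

    Assumes every peg sends the ball left or right with probability 1/2,
    so the slot number (count of right bounces) is binomial(levels, 1/2).
    """
    return [math.comb(levels, k) / 2 ** levels for k in range(levels + 1)]

def worst_gap_from_normal(levels):
    """Largest difference between a slot's exact probability and the area a
    normal curve with the same mean and sd puts over that slot (k - 0.5 to k + 0.5)."""
    curve = NormalDist(mu=levels / 2, sigma=math.sqrt(levels) / 2)
    gaps = []
    for k, p in enumerate(plinko_slot_probs(levels)):
        normal_area = curve.cdf(k + 0.5) - curve.cdf(k - 0.5)
        gaps.append(abs(p - normal_area))
    return max(gaps)

for levels in (2, 5, 10, 20, 50):
    print(levels, round(worst_gap_from_normal(levels), 4))
```

As the number of levels grows, the largest gap shrinks: adding up more independent bounces is the same averaging idea the Central Limit Theorem relies on.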
      Instruction

          Diffusion of innovation -- product adoption groups (see free response question):

          Practice
              Sketch each situation.  Then find the area under the standard normal curve.
              • 1) z > 1.5
              • 2) z < 0.4
              • 3) -1.7 < z < -0.3 
              • 4) z > -2
              A bottling factory fills bottles to a mean weight of 12.1 oz.  The standard deviation is 0.1 oz.
              • 5) The company tries to fill bottles between 12.0 oz and 12.2 oz.  What proportion of bottles fall in this range?
              • 6) In what percentile is a bottle that is exactly the advertised weight of 12 oz?
              • 7) A random bottle is pulled from the factory floor and weighed.  It weighed 11.8 oz.  What proportion of the time will a randomly selected bottle be this low or lower?
              • 8) What does the area under a normal curve (in a given range) represent?
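The TI-83's normalcdf also accepts the mean and standard deviation directly, so converting to z-scores is optional for problems 5 to 7. A minimal Python sketch of the same idea uses `statistics.NormalDist` with μ = 12.1 and σ = 0.1 taken from the problem; the last calculation runs backwards, like invNorm. Compare the printed values with the Practice solutions.

```python
from statistics import NormalDist

# Fill weights from the bottling problem: mean 12.1 oz, standard deviation 0.1 oz.
fill = NormalDist(mu=12.1, sigma=0.1)

# 5) Proportion between 12.0 oz and 12.2 oz (area between the two values).
between = fill.cdf(12.2) - fill.cdf(12.0)

# 6) Percentile of a 12.0 oz bottle = area to the left of 12.0.
percentile_12 = fill.cdf(12.0)

# 7) Chance a random bottle weighs 11.8 oz or less.
low_tail = fill.cdf(11.8)

# Going the other way (like invNorm): the weight at the 16th percentile.
weight_16th = fill.inv_cdf(0.16)

print(round(between, 3), round(percentile_12, 3), round(low_tail, 4), round(weight_16th, 2))
```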
              9a) What is the difference between a distribution (of x) and a sampling distribution (of x̄)?
              9b) What are the two magic powers of the Central Limit Theorem?
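A simulation is one way to check your answers to 9a and 9b. The sketch below assumes a hypothetical right-skewed population (exponential, mean 1 and standard deviation 1), draws many samples of size n, and records each sample average, following the sampling-distribution procedure in the learning objectives, with 10,000 repeats standing in for "infinitely." For each n it prints the mean of the sample averages, their standard deviation next to σ/√n, and a skewness value, which is about 0 for a symmetric, normal-shaped distribution. Function names are illustrative only.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so reruns give the same numbers

def draw_from_population():
    """One value from a hypothetical right-skewed population (exponential, mean 1, sd 1)."""
    return random.expovariate(1.0)

def sampling_distribution(n, repeats=10_000):
    """Approximate the sampling distribution of x-bar: draw n values, average them,
    record the average, and repeat many times (a stand-in for 'infinitely')."""
    return [statistics.fmean(draw_from_population() for _ in range(n))
            for _ in range(repeats)]

def skewness(values):
    """Sample skewness: about 0 for a symmetric shape such as the normal curve."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

for n in (1, 4, 16, 64):
    means = sampling_distribution(n)
    print(f"n={n:>2}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.pstdev(means):.3f} (sigma/sqrt(n)={1 / math.sqrt(n):.3f})  "
          f"skew={skewness(means):.2f}")
```

The standard deviation should track σ/√n and the skewness should drop toward 0 as n grows; the exact digits depend on the random seed.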

              An SRS of 30 families in Byron was obtained by visiting 37 homes in town.  The following results were obtained for the number of kids per family: x̄ = 2.4, sx = 0.73
              • 10) Does the sampling method used above sound like it would produce reasonable results?  Include the method of selection, possible undercoverage, and the non-response rate when defending your answer.  Working through all of these steps takes some time, but it is good practice for your projects, quizzes, and tests.
              • 11) Is the sample large enough to get away with not knowing the actual distribution?  Do you think that this distribution is single-peaked?  Roughly symmetrical?  How do these characteristics play into the importance of the sample size?
              • 12) Based on the sample results above, what is the expected mean of the sampling distribution of x̄?
              • 13) Based on the sample results above, what is the expected standard deviation of the sampling distribution of x̄?
              A study of dreams found that the lengths of mythical unicorn rides are normally distributed with a mean of 23 minutes and a standard deviation of 4 minutes.
              • 14) In what percentile is a random person if their ride lasts 25 minutes?
              • 15) In what percentile is the average of a group of 8 random people if their average ride lasts 25 minutes?
              • 16) How likely is a single ride to last between 20 and 25 minutes?
              • 17) What is the Z-score of a 17 minute mythical unicorn ride?
              The incomes of a set of factory workers happen to be normally distributed.  The average income is $53,000 and the standard deviation is $9,000.
              • 18) What is the probability that a randomly selected employee makes more than $65,000?
              • 19) What is the probability that the average income of 4 randomly selected employees is more than $65,000?
              • 20) What is the probability that the average income of 12 randomly selected employees is more than $65,000?
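Problems 18 to 20 repeat one calculation with a larger n each time. A short Python sketch, assuming the same normal income model ($53,000 mean, $9,000 standard deviation), shows how σ/√n shrinks and the right-tail probability collapses as more incomes are averaged. The function name `prob_mean_above` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_above(cutoff, mu, sigma, n):
    """P(average of n values > cutoff) for a normal population.

    Uses the Central Limit Theorem result sigma_xbar = sigma / sqrt(n);
    n = 1 is just a single randomly selected value.
    """
    sd_mean = sigma / math.sqrt(n)
    z = (cutoff - mu) / sd_mean
    # Right-tail area; by symmetry P(Z > z) = P(Z < -z), which keeps
    # precision for very small tail probabilities.
    return NormalDist().cdf(-z)

for n in (1, 4, 12):
    p = prob_mean_above(65_000, 53_000, 9_000, n)
    print(f"n = {n:>2}: P(mean income > $65,000) = {p:.7f}")
```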
              A crater imager is looking at a hole that is actually 2 feet deep.  The readings the machine gives are normally distributed around the actual depth with a standard deviation of 1.5 feet.
              • 21) What is the probability that a single reading comes back as a negative number?
              • 22) What is the probability that the average of 3 readings comes back as a negative number?
              A bunch of folks went on a huge fishing trip.  At the end, everyone reported their heaviest fish.  The group average was 22 lbs. with a standard deviation of 3.6 lbs.
              • 23) Since the distribution might be skewed, can we figure out the probability that one person has a maximum fish that weighs more than 25 lbs.?
              • 24) If we sampled 10 random people and assumed no major outliers, could we find the probability that their average was less than 20 lbs.?  If so, what is it?
              Practice solutions
                  • 1) (Always do the sketch.  It just takes too much work for me to make digital versions of all of them :/)
                    normalcdf(1.5, 99) = 0.067
                  • 2) normalcdf(-99, 0.4) = 0.655
                  • 3) normalcdf(-1.7, -0.3) = 0.338
                  • 4) normalcdf(-2, 99) = 0.977
                  • 5) z-score for 12.0 is -1, z-score for 12.2 is 1
                    normalcdf(-1, 1) = 0.683
                  • 6) Percentile is the percent of values below the value you're interested in, so:
                    z-score for 12.0 is -1
                    normalcdf(-99, -1) = 0.159
                    ~16th percentile
                  • 7) z-score for 11.8 is -3
                    normalcdf(-99, -3) = 0.001
                  • 8) The proportion of individuals in that range.
                  • 9a) A distribution is all of the values in the population for a given question/variable.  A sampling distribution is formed by randomly drawing a few of the values from the distribution at a time, taking their average, recording that number, and repeating infinitely for the given sample size.
                  • 9b) [1] It makes the sampling distributions of all kinds of distributions look like normal curves
                          [2] It shrinks the standard deviation of sampling distributions as the sample size goes up
                  • 10) Yes.  An SRS is a good sampling method and the non-response rate is fairly low (only 7/37 ≈ 19%), though there may be some undercoverage since a decent fraction of families live in apartments.
                  • 11) A sample size of 30 makes the sampling distribution approximately normal for most starting distributions.  Since the number of kids in a family is probably somewhat normally distributed to begin with, 30 is plenty.
                  • 12) 2.4 kids per family.  The best estimate of the sampling distribution mean (which is the same as your population mean) is your sample mean.  The mean of the sampling distribution is the same for any sample size.
                  • 13) sx̄ = 0.73 / √30 = 0.133 kids per family.  As the sample size goes up, the standard deviation of the sampling distribution goes down.
                  • 14) Percentile is the percent of values below the value you're interested in, so:
                    z = (25 - 23) / 4 = 0.5
                    normalCdf(-99, 0.5) = 0.691
                    ~69th percentile
                  • 15) s = 4/√(8) = 1.41
                    z = (25 - 23) / 1.41 = 1.41
                    normalCdf(-99, 1.41) = 0.921
                    ~92nd percentile
                    It is less likely for the average of many people to be this far from the mean than for a single person to be this far from the mean -- that's why the percentile is so much higher.
                  • 16) z = (20 - 23) / 4 = -0.75
                    z = (25 - 23) / 4 = 0.5
                    normalCdf(-0.75, 0.5) = 0.465
                  • 17) z = (17 - 23) / 4 = -1.5
                  • 18) Make 65000 a z-score: z = (65000 - 53000) / 9000 = 1.333
                    normalcdf(1.333, 1000) = .091 (almost a 10% chance)
                  • 19) When you have 4 samples averaged together, you need to update your standard deviation: s = 9000 / √(4) = 4500
                    Now make 65000 a z-score: z = (65000 - 53000) / 4500 = 2.667
                    normalcdf(2.667, 1000) = .0038 (about a 0.4% chance)
                  • 20) Update the standard deviation again for a sample of 12 averaged together: s = 9000 / √(12) = 2598
                    And make 65000 a z-score now: z = (65000 - 53000) / 2598 = 4.619
                    normalcdf(4.619, 1000) = .0000019 (about a 2 in a million chance of happening)
                  • 21) Hint for interpreting this problem: the readings are centered around 2 ft, which means that the mean is 2.  A negative number is anything less than 0, which is 2 ft below the mean.
                    Turn the 2 ft difference into a z-score: (0 - 2) / 1.5 = -1.333
                    To find the probability of getting less than 0, do normalcdf(-1000, -1.333) = .091
                  • 22) Update your standard deviation: sx̄ = 1.5 / √(3) = .866
                    Find the z-score: (0 - 2) / .866 = -2.309
                    Now find the probability of less than this value: normalcdf(-1000, -2.309) = .010
                  • 23) No.  If the distribution is not close to a normal curve, we can't use normal-curve calculations for a single person.  Lucky you, you get to be lazy on this problem!
                  • 24) Sure could!  The size of the largest fish is probably fairly unimodal and a little skewed by the really big fish on the high end, but a sample of 10 people is often large enough to turn a distribution like that into a nice happy normal curve.  So let's try it:
                    Update the standard deviation: s = 3.6 / √(10) = 1.138
                    Find the z-score: (20 - 22) / 1.138 = -1.757
                    normalcdf(-1000, -1.757) = .039
                  Notes
                      http://www.intuitor.com/statistics/CentralLim.html
                      2003 3.pdf (152k), Andy Pethan, Oct 11, 2013, 11:19 AM
                      2007 3.pdf (145k), Andy Pethan, Oct 11, 2013, 11:19 AM
                      2009 2.pdf (141k), Andy Pethan, Oct 11, 2013, 11:20 AM