In previous units we have seen how data can be represented by histograms. A density curve is a curve that gives an approximate description of a distribution. The curve is smooth, so any small irregularities in the data are ignored. A density curve for a particular histogram is shown below. Perhaps the most important thing to remember about a density curve is that it represents 100% of the data. In other words, the area under any density curve is equal to 1. This is important because it allows us to ask probability questions about a population. For example, we might ask how likely it is that a teenager has a shoe size of 8 or larger.
[Figure2]
In this unit, we will focus on a special density curve called the normal curve. Have you ever wondered if you are 'normal'? You probably are normal in most ways, but there may be some things about you that might not be considered normal by the mathematical definition. If you are on the high school baseball team, do you throw the baseball at a 'normal' speed? Is your hair a 'normal' length? Do you drive at a 'normal' speed on the freeway? Our goal in this unit is to gain an understanding of what 'normal' really is and how to calculate properly with the normal distribution. We have seen skewed distributions before. The following figure shows one density curve that is skewed left and one that is skewed right.
[Figure3]
A normal curve is neither skewed left nor right and is often referred to as 'the bell curve' because of its shape. It is symmetrical. In addition, as you get closer and closer to the middle of the curve, there is a higher frequency of results. The mean (along with the median and mode) always lands at the center of a normal distribution. When dealing with the mean in previous units, we have used the symbol $\bar{x}$ because the data came from a sample. Normal distributions deal with an entire population instead of just a sample, and we will use the symbol $\mu$ (Greek letter mu) to mark the mean of a normal distribution for an entire population. The mean is one of two key values needed to make a proper sketch and analysis of a normal distribution. The curve shown below represents a normal distribution and is a good representation of what a normal curve looks like.
[Figure5]
Note that the amount of data to the right of the mean is the same as the amount of data to the left of the mean. Thinking about the definition of the median, this suggests that the mean and median are located at the same point. The other key component used to construct and analyze a normal distribution is the standard deviation. The standard deviation is a measure of spread and can be loosely thought of as a 'typical' distance from the mean. You may have calculated the standard deviation before for data sets either by hand or by using your calculator and looked for the Sx in the statistical calculations summary screen. The symbol Sx is used for the standard deviation whenever data is collected through the use of a sample from a population. When dealing with the normal distribution, we will use the symbol $\sigma$ (Greek letter sigma) to represent the standard deviation. That symbol indicates that the standard deviation of the entire population is known. Visually, the standard deviation can be seen as the distance from the mean to an inflection point. An inflection point is located on a curve at the point where the curve changes from concave up (bent up) to concave down (bent down) or vice versa. On the normal curve in Figure 6, the mean is 23 and the standard deviation is 3.
[Figure6]
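The claim that the inflection points sit one standard deviation from the mean can be checked numerically. The sketch below is only an illustration: it assumes the standard formula for the height of a normal curve (which this lesson does not state), and the function names are our own. It estimates which way the curve bends at a few points on either side of 20 and 26, the values one standard deviation from the mean in Figure 6.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def second_difference(x, mu, sigma, h=1e-3):
    """Numerical estimate of concavity at x (positive means concave up)."""
    return (normal_pdf(x + h, mu, sigma)
            - 2 * normal_pdf(x, mu, sigma)
            + normal_pdf(x - h, mu, sigma)) / (h * h)

mu, sigma = 23, 3  # mean and standard deviation from Figure 6

# Check points just outside and just inside mu - sigma = 20 and mu + sigma = 26.
for x in (19, 21, 25, 27):
    bend = "concave up" if second_difference(x, mu, sigma) > 0 else "concave down"
    print(x, bend)
```

The curve bends up at 19 and 27 and bends down at 21 and 25, so the change in concavity happens between those points, consistent with inflection points at 20 and 26.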
The Empirical Rule (68-95-99.7 Rule)
It is now time to make use of some of the special characteristics of the normal curve. As mentioned earlier, 100% of all results fall somewhere under the normal curve. It turns out that approximately 68% of all results are within one standard deviation of the mean, 95% of all results are within 2 standard deviations of the mean, and 99.7% of all results land within three standard deviations of the mean. These percentages are illustrated in the graphic below.
[Figure8]
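The percentages 68, 95, and 99.7 are rounded values. As a rough check, the sketch below uses Python's built-in `math.erf` together with a standard identity that is not part of this lesson: the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2). The function name `fraction_within` is our own.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

The rule's values are close to these, which is why it is stated as an approximation.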
The numbers on the bottom represent the number of standard deviations from the mean. For example, the $-1$ marks the point one standard deviation below the mean. Some simple addition and subtraction allow us to be very specific about the percent of the data that lands in each section of the normal curve, as shown below.
[Figure9]
Can you see the 68-95-99.7 rule here?
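The "simple addition and subtraction" can be sketched in a few lines. This is only an illustration using the rounded 68-95-99.7 values and the symmetry of the curve; the names below are our own.

```python
# Rounded percentages from the 68-95-99.7 rule: within 1, 2, and 3 SDs of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def section_percents(within):
    """Percent of data in each section on ONE side of the mean, using symmetry."""
    mean_to_1sd = within[1] / 2                 # between the mean and 1 SD
    sd1_to_sd2 = (within[2] - within[1]) / 2    # between 1 SD and 2 SDs
    sd2_to_sd3 = (within[3] - within[2]) / 2    # between 2 SDs and 3 SDs
    beyond_3sd = (100 - within[3]) / 2          # farther than 3 SDs
    return mean_to_1sd, sd1_to_sd2, sd2_to_sd3, beyond_3sd

for label, pct in zip(("0 to 1 SD", "1 to 2 SD", "2 to 3 SD", "beyond 3 SD"),
                      section_percents(WITHIN)):
    print(f"{label}: {round(pct, 2)}%")
# 0 to 1 SD: 34.0%
# 1 to 2 SD: 13.5%
# 2 to 3 SD: 2.35%
# beyond 3 SD: 0.15%
```

Doubling the first value gives 68%, doubling the sum of the first two gives 95%, and doubling the sum of the first three gives 99.7%.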
Example 1
Suppose the mathematics portion of the SAT exam is normally distributed with a mean of 500 and a standard deviation of 100.
a) Sketch a normal curve for this situation marking the mean and the values 1, 2, and 3 standard deviations above and below the mean.
b) Using the 68-95-99.7 rule, approximately what percent of students scored at least 600 on this test?
c) Between approximately which two scores did the middle 95% of students score?
d) Suppose that 4600 students take the exam this month. How many of those students should we expect to obtain a score of at least 700?
Solution
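a) The sketch is a bell curve centered at the mean of 500, with 400, 300, and 200 marked one, two, and three standard deviations below the mean and 600, 700, and 800 marked one, two, and three standard deviations above it.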
b) We know that 50% of all results are below the 500 marker and that 34% of all results land between 500 and 600. That accounts for 50% + 34% = 84% of all results. This tells us that 100% - 84% = 16% of all students scored at least 600 on the mathematics portion of the SAT.
c) The middle 95% of all students scored within 2 standard deviations of the mean or between 300 and 700.
d) A score of 700 marks the boundary two standard deviations above the mean such that only 2.5% of all test takers will score at least 700. 2.5% of 4600 is 115 students.
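The arithmetic in parts b) and d) can be reproduced in code. The sketch below first repeats the empirical-rule estimates from the solution, then compares them with values from Python's `statistics.NormalDist`, which is not part of this lesson and is included only to show how close the rule comes. The variable names are our own.

```python
from statistics import NormalDist

mean, sd = 500, 100   # SAT mathematics values from Example 1
students = 4600

# Empirical-rule estimates, as in the solution above.
pct_at_least_600 = 100 - (50 + 34)                      # 16
pct_at_least_700 = (100 - 95) / 2                       # 2.5
expected_700_rule = pct_at_least_700 / 100 * students   # about 115 students

# Comparison values from the normal distribution itself (not the rounded rule).
sat = NormalDist(mu=mean, sigma=sd)
exact_600 = 1 - sat.cdf(600)
exact_700 = 1 - sat.cdf(700)

print(f"rule:  {pct_at_least_600}% at least 600, {round(expected_700_rule)} students at least 700")
print(f"exact: {100 * exact_600:.1f}% at least 600, {exact_700 * students:.0f} students at least 700")
```

The exact figures come out a little under 16% and closer to 105 students, so the rule's answers are good approximations rather than exact counts.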
Example 2
The normal curve below represents the number of races that a typical racehorse will run in one calendar year.
[Figure11]
a) Approximately what percent of racehorses will run between 5 and 11 races during a calendar year?
b) What are the values of the mean and standard deviation for the distribution shown?
Solution
a) Add 13.5% + 34% + 34% to get 81.5% so 81.5% of racehorses run between 5 and 11 races per year.
b) The mean racehorse will run 9 races per year with a standard deviation of 2 races.
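Adding up sections, as in part a), can be sketched as a small helper. This is an illustration only: the section percentages are the rounded values implied by the 68-95-99.7 rule, the function name `percent_between` is our own, and it handles only boundaries that fall on a whole number of standard deviations within three of the mean.

```python
# Percent of data in each one-SD-wide section, keyed by (lower z, upper z).
SECTIONS = {
    (-3, -2): 2.35, (-2, -1): 13.5, (-1, 0): 34.0,
    (0, 1): 34.0, (1, 2): 13.5, (2, 3): 2.35,
}

def percent_between(low, high, mean, sd):
    """Approximate percent of data between low and high using the empirical rule."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    for z in (z_low, z_high):
        if z != int(z) or abs(z) > 3:
            raise ValueError("boundaries must be a whole number of SDs within 3 of the mean")
    # Add every section that lies entirely between the two boundaries.
    return sum(pct for (a, b), pct in SECTIONS.items() if z_low <= a and b <= z_high)

# Example 2: mean 9 races, standard deviation 2 races.
print(percent_between(5, 11, mean=9, sd=2))  # 81.5
```

Here 5 races is two standard deviations below the mean and 11 races is one above, so the helper adds the same three sections as the solution.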
Let's now go back and try to think about our original question, "What is normal?" In mathematics, the middle 95% is often (but not always) considered our 'normal' group. For example, suppose the ACT exam is normally distributed with a mean of 18 and a standard deviation of 6. Our 'normal' group would be made up of those students who scored anywhere within two standard deviations of the mean, or from 6 to 30 on the exam. A student who scored 31 or higher on the exam would have achieved an exceptional score. We might say that this student was not normal with regard to their ACT score.
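The 'normal' group described above is simply the interval from two standard deviations below the mean to two above. A minimal sketch, with function names of our own choosing and the two-standard-deviation cutoff taken from the paragraph above:

```python
def normal_range(mean, sd, k=2):
    """Interval within k standard deviations of the mean (k=2 covers the middle 95%)."""
    return mean - k * sd, mean + k * sd

def is_unusual(score, mean, sd, k=2):
    """True if the score falls outside the 'normal' group."""
    low, high = normal_range(mean, sd, k)
    return not (low <= score <= high)

print(normal_range(18, 6))     # (6, 30)
print(is_unusual(31, 18, 6))   # True
print(is_unusual(25, 18, 6))   # False
```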
Normal distributions are not as common as you might think. What if we measured the lengths of shoes of teenagers? Many students think that this would be normal when, in fact, there are a couple of contributing factors that might tip us off that the situation may not be normal. First of all, teenagers encompass a large population. Most of those who are in their upper teen years have finished growing into their adult shoe length, whereas many of the younger teens are still growing. This would tend to give us a slightly larger percentage of smaller shoe lengths than we might expect from a normal distribution. In addition, teenagers include males and females. This may lead to a situation that is bimodal. We might expect to see a peak at the most common male lengths and at the most common female lengths.
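To see how mixing two groups can produce two peaks, the sketch below combines two normal curves and counts the peaks of the result. Every number in it is hypothetical and chosen only for illustration (two equally sized groups with mean shoe lengths of 24 cm and 27 cm and a standard deviation of 1 cm each); none of it is real shoe data, and the function names are our own.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, groups):
    """Density of a population made of several groups: (weight, mean, sd) each."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in groups)

def count_peaks(groups, start, stop, step=0.05):
    """Count local maxima of the combined density on an evenly spaced grid."""
    n = int(round((stop - start) / step))
    heights = [mixture_pdf(start + i * step, groups) for i in range(n + 1)]
    return sum(1 for i in range(1, n)
               if heights[i - 1] < heights[i] > heights[i + 1])

# HYPOTHETICAL values, for illustration only.
two_groups = [(0.5, 24, 1), (0.5, 27, 1)]
one_group = [(1.0, 25.5, 1)]

print(count_peaks(two_groups, 20, 31))  # 2
print(count_peaks(one_group, 20, 31))   # 1
```

With these made-up values the combined curve has two peaks, one near each group's mean, while a single group has only one.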
Example 3
Which situation below is most likely to produce a normal distribution?
a) The heights of all adults.
b) The wingspans of three-year-old American eagles.
c) The number of teeth that American adults have.
Solution
The correct answer is b). Three-year-old American eagles have an average wingspan, and we would expect that there are quite a few eagles at that wingspan or very close to it. As we move further and further up and down from that average, we would expect to see fewer and fewer eagles with those wingspans. Answer a) can be ruled out quickly because the heights do not come from one particular group. For example, this data would include males and females. Answer c) is out because the vast majority of American adults have 32 teeth. As we move away from 32, there are some people with fewer teeth for a variety of reasons, but there are virtually no people with more than 32 teeth. We would see symmetrical results if this were a normal distribution.