Population Distributions

Probability Distributions

In this book, we have already talked about discrete probability distributions. For example, a relative frequency histogram of discrete, measured values displays a discrete probability distribution. We also used frequency tables to display discrete probability distributions. In this section, we shift our focus from discrete to continuous random variables. We start with the probability distribution of a discrete random variable and use it to introduce our first example of a probability distribution for a continuous random variable.


EXAMPLE 1


Let X = the shoe size of an adult male. X is a discrete random variable, since shoe sizes come only in whole and half sizes, with nothing in between. For this example, we will consider shoe sizes from 6.5 to 15.5, so the possible values of X are 6.5, 7.0, 7.5, 8.0, and so on, up to and including 15.5. Here is the probability table for X:



And here is the probability histogram that corresponds to the table.

As is always the case for probability histograms, the area of the rectangle centered above each value is equal to the corresponding probability. For example, in the preceding table, we see that the probability for X = 12 is 0.107.

In the probability histogram, the rectangle centered above 12 has area = 0.107.

We write this probability as P(X = 12) = 0.107.

And finally, as is the case for all probability histograms, because the probabilities of all possible outcomes must add up to 1, the areas of all of the rectangles shown must also add up to 1.

Now we can find the probability that shoe size takes a value in any interval by adding up the areas of the rectangles over that interval. For instance, the total area of the rectangles up to and including 9 gives the probability of having a shoe size less than or equal to 9.


We can find this probability (area) from the table by adding together the probabilities for shoe sizes 6.5, 7.0, 7.5, 8.0, 8.5, and 9.0. Here is that calculation:

0.001 + 0.003 + 0.007 + 0.018 + 0.034 + 0.054 = 0.117

Total area of the six green rectangles = 0.117 = probability of shoe size less than or equal to 9. We write this probability as

P(X ≤ 9) = 0.117

Recall that for a discrete random variable like shoe size, the probability is affected by whether or not we include the endpoint of the interval. For example, the area, and therefore the probability, is smaller if we consider only shoe sizes strictly less than 9:

This time when we add the probabilities from the table, we exclude the probability for shoe size 9 and just add together the probabilities for shoe sizes 6.5, 7.0, 7.5, 8.0, and 8.5:

0.001 + 0.003 + 0.007 + 0.018 + 0.034 = 0.063

Total area of the five rectangles in green = 0.063 = probability of shoe size less than 9. We write this probability as

P(X < 9) = 0.063
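The two sums above can be checked with a short Python sketch. The dictionary `known_probs` holds only the probabilities quoted in the text (shoe sizes 6.5 through 9.0, plus size 12); the rest of the table is not reproduced. Because every possible size up to 9 is listed, the two sums are complete even though the dictionary is not. The helper name `prob_of` is hypothetical.

```python
# Probabilities quoted in the text for a few shoe sizes (partial table only).
known_probs = {
    6.5: 0.001,
    7.0: 0.003,
    7.5: 0.007,
    8.0: 0.018,
    8.5: 0.034,
    9.0: 0.054,
    12.0: 0.107,
}


def prob_of(condition):
    """Add up P(X = x) for every listed size x that satisfies `condition`."""
    return sum(p for x, p in known_probs.items() if condition(x))


p_at_most_9 = prob_of(lambda x: x <= 9)   # includes the endpoint 9.0
p_less_than_9 = prob_of(lambda x: x < 9)  # excludes the endpoint 9.0

print(f"P(X <= 9) = {p_at_most_9:.3f}")
print(f"P(X < 9)  = {p_less_than_9:.3f}")
# The difference is exactly the probability sitting on the endpoint itself.
print(f"P(X = 9)  = {p_at_most_9 - p_less_than_9:.3f}")
```

The only change between the two calls is `<=` versus `<`, which is precisely the endpoint question discussed above.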

Transition to Continuous Random Variables

Now we will make the transition from discrete to continuous random variables. Instead of shoe size, let’s think about foot length. Unlike shoe size, foot length is not limited to distinct, separate values; it can take any value over a continuous range. In other words, foot length can, in principle, be measured as precisely as we want. For example, we can measure foot length to the nearest inch, the nearest half inch, the nearest quarter of an inch, the nearest tenth of an inch, and so on. Therefore, foot length is a continuous random variable.

What happens to the probability histogram when we measure foot length with more precision? When we increase the precision of the measurement, the histogram has more bins, because each bin now covers a smaller interval of values. For example, if we measure foot lengths in whole inches, one bin will contain measurements from 6 inches up to 7 inches. But if we measure foot lengths to the nearest half inch, that same range is split into two bins: one with lengths from 6 up to 6.5 inches and the next with lengths from 6.5 up to 7 inches.
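The binning idea can be sketched in a few lines of Python. This is an illustration under stated assumptions: bins are taken to be half-open intervals [lower, upper) starting at 6 inches, and the function name `bin_for` and the foot length of 6.73 inches are hypothetical.

```python
import math


def bin_for(length, width, start=6.0):
    """Return the (lower, upper) edges of the bin that contains `length`.

    Bins are half-open intervals [lower, upper) of equal `width`,
    counted from a hypothetical starting edge `start`.
    """
    k = math.floor((length - start) / width)
    lower = start + k * width
    # Round the edges so floating-point noise does not clutter the output.
    return round(lower, 10), round(lower + width, 10)


# A hypothetical foot length of 6.73 inches, binned at finer and finer widths.
for w in (1.0, 0.5, 0.25, 0.1):
    print(w, bin_for(6.73, w))
```

At a width of 1 inch, 6.73 falls in the 6-to-7 bin; at 0.5 inch it falls in the 6.5-to-7 bin, matching the example in the text; narrower widths place it in ever smaller bins.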

You can use the following simulation to see what happens to the probability histogram as the width of the intervals decreases. Change the interval width by clicking on 0.5 in., 0.25 in., or 0.1 in.


At the bottom of the simulation is an option to add a curve. This curve is generated by a mathematical formula chosen to fit the shape of the probability histogram. Check “Show curve” and click through the different bin widths. Notice that as the width of the intervals gets smaller, the probability histogram gets closer to this curve. More specifically, the total area of the histogram’s rectangles over any interval more closely approximates the area under the curve over that interval. If we keep reducing the width of the intervals, the histogram becomes nearly indistinguishable from the curve, so the curve becomes a better and better description of the probability distribution. We’ll use smooth curves like this one to represent the probability distributions of continuous random variables.
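The convergence seen in the simulation can be imitated numerically. The text does not give the curve's formula or parameters, so this sketch assumes, purely for illustration, a normal curve with mean 10.5 inches and standard deviation 0.7 inch, using Python's `statistics.NormalDist`. Each bar's area is the model probability of its bin, so its height is that probability divided by the bin width; the function `histogram_gap` (a hypothetical name) reports the largest difference between a bar's height and the curve's height at the bar's midpoint.

```python
from statistics import NormalDist

# Hypothetical model for foot length in inches (assumed, not from the text).
foot_length = NormalDist(mu=10.5, sigma=0.7)


def histogram_gap(width, lo=7.0, hi=14.0):
    """Largest gap between a bar height and the curve at the bar's midpoint.

    Each bar's area is the model probability of its bin, so its height is
    P(bin) / width, exactly as in a probability histogram.
    """
    n_bins = round((hi - lo) / width)
    worst = 0.0
    for k in range(n_bins):
        a = lo + k * width
        b = a + width
        height = (foot_length.cdf(b) - foot_length.cdf(a)) / width
        curve = foot_length.pdf((a + b) / 2)
        worst = max(worst, abs(height - curve))
    return worst


for w in (1.0, 0.5, 0.25, 0.1):
    print(f"width {w:>4} in.: max gap = {histogram_gap(w):.5f}")
```

As the width shrinks from 1 inch to 0.1 inch, the largest gap shrinks too, which is the numerical counterpart of the histogram hugging the curve.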

We have just examined the probability distribution for foot length. For foot length and for all other continuous random variables, the probability distribution can be approximated by a smooth curve called a probability density curve.

Recall that these smooth curves are mathematical models. We use a mathematical model to describe a probability distribution so that we can use technology and the equation of this model to estimate probabilities. (As we mentioned earlier, we do not study the equation for this curve in this course, but statistical software uses such equations, and the areas under the corresponding curves, to estimate probabilities.)


As in a probability histogram, the total area under the density curve equals 1, and the curve represents probabilities by area. To find the probability that X is in an interval, find the area above the interval and below the density curve.

For example, if X is foot length, let’s find P(10 < X < 12), the probability that a randomly chosen male has a foot length anywhere between 10 and 12 inches. This probability is the area above the interval 10 < X < 12 and below the curve. We shaded this area with green in the following graph.


Similarly, if we are interested in P(X < 9), the probability that a randomly chosen male has a foot length of less than 9 inches, we need to find the area shaded in green below:
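With technology, each shaded area is found from the density curve's cumulative area function: `cdf(x)` in Python's `statistics.NormalDist` returns the area under the curve to the left of x, and the area over an interval is the difference of two such values. The normal shape, the mean of 10.5 inches, and the standard deviation of 0.7 inch are assumptions made only for this sketch; the resulting probabilities depend entirely on them.

```python
from statistics import NormalDist

# Hypothetical density curve for adult male foot length in inches.
# The mean and standard deviation are assumptions for illustration only.
foot_length = NormalDist(mu=10.5, sigma=0.7)

# Area under the curve between 10 and 12 inches: P(10 < X < 12).
p_10_to_12 = foot_length.cdf(12) - foot_length.cdf(10)

# Area under the curve to the left of 9 inches: P(X < 9).
p_below_9 = foot_length.cdf(9)

print(f"P(10 < X < 12) = {p_10_to_12:.3f}")
print(f"P(X < 9)       = {p_below_9:.3f}")
```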

Comments

  1. We have seen that for a discrete random variable like shoe size, P(X < 9) and P(X ≤ 9) have different values. In other words, including the endpoint of the interval changes the probability. In contrast, for a continuous random variable like foot length, the probability of a foot length less than or equal to 9 inches is the same as the probability of a foot length strictly less than 9 inches. In other words, P(X < 9) = P(X ≤ 9). Visually, in terms of our density curve, the area under the curve up to and including a certain point is the same as the area up to but excluding that point. This is because a single point has no width, so there is no area above it. As a result, the probability that a continuous random variable takes any one exact value, such as P(X = 9), is zero!

  2. It should be clear now why the total area under any probability density curve must be 1. The total area under the curve represents P(X takes some value in its range of possible values). By the rules of probability, this event is certain to happen, so its probability must be 1.

  3. Density curves, like probability histograms, may have any shape imaginable, as long as the curve never falls below the horizontal axis and the total area underneath it is 1. Each density curve is a mathematical model with an equation that is used to find the area underneath the curve.
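Comments 1 and 2 can be illustrated with the same kind of assumed model (a normal curve with a hypothetical mean of 10.5 inches and standard deviation of 0.7 inch, via Python's `statistics.NormalDist`). The first loop shows the area above a shrinking interval around 9 approaching zero, which is why including or excluding the single point 9 does not change the probability. The last lines approximate the total area under the curve with many thin rectangles (the midpoint rule) and get a value very close to 1.

```python
from statistics import NormalDist

# Hypothetical foot-length model; the parameters are assumptions.
foot_length = NormalDist(mu=10.5, sigma=0.7)

# Comment 1: the area above a shrinking interval around 9 goes to zero,
# so including or excluding the single point 9 cannot change the probability.
areas = []
for h in (0.5, 0.1, 0.01, 0.001):
    area = foot_length.cdf(9 + h) - foot_length.cdf(9 - h)
    areas.append(area)
    print(f"P({9 - h:g} < X < {9 + h:g}) = {area:.6f}")

# Comment 2: the total area under the density curve is 1.
# Approximate it with thin rectangles (midpoint rule) over a wide range
# that holds essentially all of the curve's area.
lo, hi, n = 0.0, 21.0, 21_000
width = (hi - lo) / n
total = sum(foot_length.pdf(lo + (k + 0.5) * width) * width for k in range(n))
print(f"total area = {total:.6f}")
```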

Let’s Summarize

The probability distribution of a continuous random variable is represented by a probability density curve. The probability that X has a value in any interval of interest is the area above this interval and below the density curve.


References:

  1. https://courses.lumenlearning.com/introstats1/chapter/the-terminology-of-probability/

CC LICENSED CONTENT, SHARED PREVIOUSLY