Parallel box plots (also called side-by-side box plots) are very useful when two or more numerical data sets need to be compared. The box plots are drawn parallel to one another along the same number line. This can be done vertically or horizontally, and for as many data sets as needed.
Example 1
The figure shows the distributions of temperatures for three different cities. Graphing the three box plots along the same axis makes it easy to compare the temperatures of the three cities. What conclusions can be drawn about the temperatures in these three cities?
http://www.mathworksheetscenter.com
Solution
Here are some conclusions that might be drawn from the graphs. Think S.O.C.C.S! Be sure to compare the distributions to one another, using statistics to support your observations.
Quartile 1 for City 2 is higher than quartile 3 for City 1 and higher than the median for City 3. Also, the minimum temperature in City 2 is about equal to the medians of the other two cities.
City 2 is generally warmer than both of the other cities. Cities 1 and 3 have nearly the same median temperature, around 60° to 63°, whereas the median temperature in City 2 is around 82°.
City 3 has a much larger range of temperatures (35° to 85°) than City 1 (45° to 75°) or City 2 (62° to 95°). City 1 has the smallest range (30°, compared with 33° for City 2 and 50° for City 3), so its temperatures are the most consistent of the three.
The temperature distributions in all three cities are fairly symmetrical and none have any outliers.
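The range comparison in the third conclusion is simple arithmetic. Here is a minimal Python sketch of it, using the approximate minimum and maximum temperatures quoted above (read off the graph, so they are estimates rather than exact data); the variable names are illustrative only:

```python
# Approximate minimum and maximum temperatures (in degrees) read off the
# parallel box plots in Example 1.
extremes = {
    "City 1": (45, 75),
    "City 2": (62, 95),
    "City 3": (35, 85),
}

# Range = maximum - minimum; a smaller range means less variability.
ranges = {city: high - low for city, (low, high) in extremes.items()}
most_consistent = min(ranges, key=ranges.get)

for city, r in ranges.items():
    print(f"{city}: range = {r} degrees")
print("Smallest range (most consistent):", most_consistent)
```

The range only uses the two extreme values, so it is a rough measure of spread; the IQR (the width of each box) is another way to compare consistency.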
When you are given two or more numerical data sets and asked to compare them, you will need to construct a graphical representation of each data set. To compare them to one another, the scales must match. To compare box plots, we construct parallel box plots. With histograms, we can match the horizontal and vertical scales so that the separate histograms line up. Dot plots work the same way as histograms. Such comparisons are also possible with stem plots: two sets of numerical data can share the stems in the middle, with one set's leaves going to the right and the other set's leaves going to the left. On both sides of the plot, the leaves increase in numerical order moving outward from the stems. Plots like these are called back-to-back stem plots.
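A back-to-back stem plot can be built mechanically. The Python sketch below is one way to do it, assuming non-negative whole numbers with the tens digit(s) as the stem and the units digit as the leaf; the function name `back_to_back_stem_plot` and both score lists are hypothetical, made up for illustration:

```python
from collections import defaultdict


def back_to_back_stem_plot(left, right):
    """Return the rows of a back-to-back stem plot as strings.

    Assumes non-negative whole numbers: stem = value // 10, leaf = value % 10.
    Leaves get larger moving outward from the stem on both sides.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for x in left:
        left_leaves[x // 10].append(x % 10)
    for x in right:
        right_leaves[x // 10].append(x % 10)

    all_values = list(left) + list(right)
    first_stem, last_stem = min(all_values) // 10, max(all_values) // 10
    leaf_width = max(len(v) for v in left_leaves.values())
    stem_width = len(str(last_stem))

    rows = []
    for stem in range(first_stem, last_stem + 1):
        # Left side: the largest leaf sits farthest from the stem,
        # so the leaves are written in descending order.
        left_part = "".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_part = "".join(str(d) for d in sorted(right_leaves[stem]))
        row = f"{left_part:>{leaf_width}} | {stem:>{stem_width}} | {right_part}"
        rows.append(row.rstrip())
    return rows


# Hypothetical scores for two groups (made up for illustration).
group_1 = [62, 75, 78, 81, 85, 85, 93]
group_2 = [58, 64, 66, 71, 79, 88]
for row in back_to_back_stem_plot(group_1, group_2):
    print(row)
```

Every stem between the smallest and largest value gets a row, even if one side has no leaves there, so gaps in the data stay visible.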
Once you have constructed any of these comparative graphical representations (on the same scale), you can make observations about how the data sets are the same and how they are different. As before, these comparisons should be made in context. The observations might address the shapes of the distributions and whether or not any outliers are present. It is important to compare the centers of the distributions (means, medians, or modes). The spreads of the distributions (ranges, IQRs, or standard deviations) should also be addressed.
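When deciding whether any outliers are present, a common convention (assumed here; this section does not say which rule it uses) is the 1.5 × IQR rule: a value is an outlier if it lies more than 1.5 IQRs below Q1 or above Q3. A minimal Python sketch, with hypothetical function names and made-up data:

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the
    sorted data (the middle value is left out when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[half + len(xs) % 2:])


def iqr_outliers(data, k=1.5):
    """Return the values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]


# Hypothetical test scores (made up for illustration).
scores = [41, 70, 72, 75, 78, 80, 83, 95]
print(iqr_outliers(scores))
```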
Example 2
A teacher gave the same physics exam to her two physics sections. She has been wondering whether her first-period and fifth-period classes are learning the same amount. She constructed this back-to-back stem plot to compare the test scores of the two classes.
a) Calculate the five-number summary for both classes.
b) Calculate the mean and standard deviation for both classes.
c) Compare the two classes' test scores in context.
http://www.basic-mathematics.com
Solution
a) The numbers in the stem plot are already in order, so these statistics can be found by hand or with a graphing calculator.
Five-number summary for Class A
(all values in points)
Five-number summary for Class B
(all values in points)
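Finding a five-number summary by hand follows a fixed recipe once the data are in order. The Python sketch below assumes the common "median of each half" convention, in which the overall median is left out of both halves when the number of values is odd; the function name and the data are hypothetical, not the scores from the stem plot:

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the 'median of each half' rule.

    When the number of values is odd, the overall median is left out of
    both halves. Needs at least two values.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Hypothetical scores (made up for illustration).
print(five_number_summary([62, 75, 78, 81, 85, 85, 93]))  # odd number of values
print(five_number_summary([58, 64, 66, 71, 79, 88]))      # even number of values
```

Different calculators and software packages use slightly different quartile rules, so Q1 and Q3 from another tool may differ a little from this method.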
b) These statistics are most efficiently found using a graphing calculator.
Class A mean Class A standard deviation
points points
Class B mean Class B standard deviation
points points
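Graphing calculators usually report two standard deviations: the sample standard deviation (dividing by n − 1) and the population standard deviation (dividing by n). The text does not say which one is used here. The Python sketch below computes the mean and both standard deviations with the standard `statistics` module; the helper name `mean_and_sds` and the scores are hypothetical:

```python
from statistics import mean, pstdev, stdev


def mean_and_sds(data):
    """Return (mean, sample SD, population SD).

    stdev divides by n - 1 (what calculators often label Sx);
    pstdev divides by n (often labeled sigma-x).
    """
    return mean(data), stdev(data), pstdev(data)


# Hypothetical test scores (made up for illustration).
m, s, sigma = mean_and_sds([62, 75, 78, 81, 85, 85, 93])
print(f"mean = {m:.1f}, sample SD = {s:.1f}, population SD = {sigma:.1f}")
```

For the same data the sample standard deviation is always a little larger, because it divides by n − 1 instead of n.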
c) Comparison
Overall, Class A did better on this test than Class B did. Class A's scores are skewed to the left, but Class B's scores are skewed to the right. Neither class has any outliers among the test scores. Class A's mean score is about 9 points higher (85.7 compared to 76.6), and its median score is 15 points higher (90.5 compared to 75.5). The overall ranges for the two classes are fairly similar, but the Class A students' scores were more consistent. The ranges (32 and 40), IQRs (14 and 19), and standard deviations (10.1 and 12.6) all show that Class A's scores are less spread out than Class B's scores.
Example 3
An oil company claims that its premium-grade gasoline contains an additive that significantly increases gas mileage. The company conducted the following experiment in an effort to support its claim. It selected 15 drivers who all drove the same make, model, and year of car. Starting with an empty gas tank, each car was filled with 45 L of one of the two types of gasoline (with the order chosen at random). The driver was asked to drive until the fuel warning light came on. The number of kilometers driven was recorded, and then the car was filled with the other type of gasoline (whichever had not yet been used). The process was repeated, and the number of kilometers was again recorded. The results below show the number of kilometers each car traveled.
Display both sets of data, and use the displays to explain whether the oil company's claim appears to be true or false.
Solution
Order the data: list the values in order for each data set.
Five-number summaries: determine the five-number summary for each data set separately. Be sure to report your five-number summaries, whether or not you are asked to.
Box plots: mark your number axis so that it covers the entire range needed, from the smallest minimum to the largest maximum (500 to 709 for these two data sets). Then graph the box plots along the same axis, parallel to each other, so that the two data sets can easily be compared.
Key: blue = regular gasoline
gold = premium gasoline
Conclusions: make comparisons by looking for similarities and differences between the two distributions. Remember your S.O.C.C.S!
Based on this experiment, the cars were generally able to travel more kilometers on the premium gasoline than the same cars traveled on the regular gasoline. The median distance was 637 km for premium gasoline, compared to 587 km for regular gasoline. The first quartile for premium was higher than the third quartile for regular. Also, 25% of the premium-gasoline trips went farther than all of the regular-gasoline trips. The distribution for the regular fuel is slightly skewed to the right but has no outliers. However, the premium distribution is strongly skewed to the left, toward one low outlier (500 km). Based on these results, it appears that the additive in the premium gasoline does improve gas mileage for this make and model of car. Further tests should be done on other types of vehicles.
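The box-plot step above (one shared axis running from the smallest minimum to the largest maximum, with each plot drawn parallel to the others) can be sketched as plain-text output. This is a rough illustration that assumes the "median of each half" quartile rule; the function names and the two data sets are hypothetical. In each row, `|` marks the minimum and maximum, `-` the whiskers, `=` the box from Q1 to Q3, and `M` the median:

```python
from statistics import median


def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1/Q3 are medians of each half,
    leaving out the middle value when n is odd."""
    xs = sorted(data)
    half = len(xs) // 2
    return (xs[0], median(xs[:half]), median(xs),
            median(xs[half + len(xs) % 2:]), xs[-1])


def text_box_plot(summary, lo, hi, width):
    """One horizontal box plot drawn on the shared axis [lo, hi]."""
    def col(v):
        # Map a data value to a character column 0 .. width - 1.
        return round((v - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in summary)
    row = [" "] * width
    for c in range(mn, mx + 1):
        row[c] = "-"          # whiskers
    for c in range(q1, q3 + 1):
        row[c] = "="          # box from Q1 to Q3
    row[mn] = row[mx] = "|"   # minimum and maximum
    row[med] = "M"            # median
    return "".join(row)


def parallel_box_plots(named_data, width=50):
    """Draw every data set on ONE axis: smallest minimum to largest maximum."""
    lo = min(min(d) for d in named_data.values())
    hi = max(max(d) for d in named_data.values())
    label_width = max(len(name) for name in named_data)
    lines = [
        f"{name:>{label_width}} "
        + text_box_plot(five_number_summary(d), lo, hi, width)
        for name, d in named_data.items()
    ]
    lines.append(f"{'':>{label_width}} axis runs from {lo} to {hi}")
    return lines


# Hypothetical data sets (made up for illustration).
for line in parallel_box_plots({"Set 1": [12, 15, 18, 20, 22, 25, 31],
                                "Set 2": [20, 24, 27, 29, 30, 33, 38]}):
    print(line)
```

Because both rows use the same `lo` and `hi`, a box that sits entirely to the right of another box shows directly that one set's Q1 is above the other set's Q3.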
Example 4
The heights of a group of students are all included in the first histogram. The second histogram contains only the data for the male students, and the third shows the heights of only the female students. Explain what these histograms show.
Solution
The range of heights of all students in this group is approximately 20 inches. However, the female heights range only about 11 inches, and the male heights only about 13 inches. The females' height distribution is the most symmetrical of the three. There is one male whose height is a high outlier, but there are no outliers among the females. The median height for the class is around 70 inches; for males it is slightly higher, around 72 inches, and for females it is around 65 inches. In general, the female students tend to be shorter than the male students.
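Comparing histograms like these is easiest when all of them use the same bins, so that the horizontal scales line up. A minimal Python sketch of that idea, using made-up whole-number heights (in inches) and a hypothetical helper `histogram_counts`; it counts each group's values in the same 2-inch bins and prints them as rows of asterisks:

```python
def histogram_counts(groups, bin_width):
    """Count every group's values in the SAME bins [edge, edge + bin_width),
    with edges chosen from the smallest and largest value over all groups.
    Assumes whole-number data."""
    lo = min(min(d) for d in groups.values())
    hi = max(max(d) for d in groups.values())
    edges = list(range(lo, hi + 1, bin_width))
    counts = {
        name: [sum(1 for x in data if e <= x < e + bin_width) for e in edges]
        for name, data in groups.items()
    }
    return edges, counts


BIN_WIDTH = 2

# Hypothetical heights in inches (made up for illustration).
males = [66, 68, 70, 71, 72, 72, 73, 74, 79]
females = [60, 62, 63, 64, 65, 65, 66, 68]
edges, counts = histogram_counts(
    {"all": males + females, "male": males, "female": females}, BIN_WIDTH)

for name, row in counts.items():
    print(name)
    for e, c in zip(edges, row):
        print(f"  {e}-{e + BIN_WIDTH}: {'*' * c}")
```

Because every group uses the same bin edges and one asterisk per student, both the horizontal and vertical scales match across the three displays.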