Mean, median, and mode are all measures of central tendency in statistics. Measures of central tendency describe what is “average” within a distribution of data. Most of the time, when we talk about an “average” we are referring to the mean, but the median and mode are also types of average.
In this sub-module, we’ll learn how to calculate each measure and when or why each measure is used.
The mean of a data set is calculated by adding up all of the numbers in the set and dividing the resulting sum by how many numbers the set contains. Let’s find the mean of the following set of numbers:
80, 87, 92, 94, 81
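The calculation can be sketched in a few lines of Python. This is only an illustration of the add-then-divide rule described above; the variable names are chosen for this example.

```python
# The data set from the example above.
scores = [80, 87, 92, 94, 81]

total = sum(scores)    # 80 + 87 + 92 + 94 + 81 = 434
count = len(scores)    # 5 values in the set
mean = total / count   # 434 / 5 = 86.8

print(mean)
```

The mean of this set works out to 86.8.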
The median of a data set is calculated by ordering the numbers (smallest to largest or largest to smallest) and selecting the middle element (number) of the set. If the data set has an odd number of elements, the median is simply that central element. If the data set has an even number of elements, we calculate the mean of the two most central numbers.
Let’s find the median of the previous data set:
80, 87, 92, 94, 81
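As a minimal sketch of the odd-length case, the steps are: sort the values, then pick the element in the middle position. The variable names below are illustrative only.

```python
# The same five values, in their original (unsorted) order.
scores = [80, 87, 92, 94, 81]

ordered = sorted(scores)     # [80, 81, 87, 92, 94]
middle = len(ordered) // 2   # index 2, the central position of five elements
median = ordered[middle]     # 87

print(median)
```

Note that the values must be ordered first: the middle element of the unsorted list is 92, which is not the median.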
Let’s calculate the median of a data set with an even number of elements:
120, 164, 142, 154, 176, 132
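For an even number of elements, the two central values are averaged. The sketch below does this by hand and then compares the result with `statistics.median` from Python's standard library, which applies the same rule.

```python
from statistics import median

values = [120, 164, 142, 154, 176, 132]

ordered = sorted(values)    # [120, 132, 142, 154, 164, 176]
upper = len(ordered) // 2   # index 3 -> 154
lower = upper - 1           # index 2 -> 142

# Mean of the two most central numbers: (142 + 154) / 2 = 148.0
manual_median = (ordered[lower] + ordered[upper]) / 2

# The standard library gives the same answer.
library_median = median(values)

print(manual_median, library_median)
```

Both approaches give a median of 148, a value that does not itself appear in the data set.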
The mode is the value that occurs the most frequently in a given data set. Let’s consider a few different data sets:
1, 4, 6, 8, 9, 9, 11, 12
1, 1, 1, 4, 5, 6, 6, 6, 7, 7, 9, 14
1, 4, 5, 6, 8, 9, 11, 15
The mode of data set 1 is 9 because the value 9 occurs twice, and every other value occurs only once in the set. Data set 2 is multimodal, meaning it has more than one mode. The modes of data set 2 are 1 and 6 because they each appear 3 times in the set, while every other value only occurs once or twice. Data set 3 has no mode, as no value repeats.
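The three cases can be checked with a short counting sketch. The helper `modes` is a hypothetical function written for this illustration. It follows the convention used here that a set with no repeated value has no mode. Note that Python's built-in `statistics.multimode` uses a different convention and would return every value for data set 3.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]

data_set_1 = [1, 4, 6, 8, 9, 9, 11, 12]
data_set_2 = [1, 1, 1, 4, 5, 6, 6, 6, 7, 7, 9, 14]
data_set_3 = [1, 4, 5, 6, 8, 9, 11, 15]

print(modes(data_set_1))  # one mode
print(modes(data_set_2))  # two modes (multimodal)
print(modes(data_set_3))  # no mode
```

The three calls return `[9]`, `[1, 6]`, and an empty list respectively.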
It is sometimes difficult to decide whether to use the median or the mean. One benefit of the mean is that all values are included in the calculation. However, one drawback of the mean is that it is more heavily affected by outliers, whereas outliers have less of an effect on the median.
Imagine you have finished grading math tests for your grade 12 functions students, and you want to find the class average.
We see here that the mean is 3.6 points away from the median. Looking at the data set, we see that the second lowest mark is 76%, while the lowest is 43%. Here, the outlier grade of 43% has a greater effect on the mean, while it has virtually no effect on the median. Even if the 43% were instead a 25%, the median would remain the same, but we would see an even greater downward effect on the mean.
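The effect of an outlier can be seen in a small sketch. The marks below are hypothetical and are not the class data from this example; only the lowest mark (43) and second lowest mark (76) are taken from the text, so the numbers will not reproduce the 3.6-point gap.

```python
from statistics import mean, median

# Hypothetical class marks (assumed for illustration only).
marks = [43, 76, 80, 84, 88, 92, 96]

# The same marks, with the outlier lowered from 43 to 25.
marks_lower = [25] + marks[1:]

mean_before, median_before = mean(marks), median(marks)
mean_after, median_after = mean(marks_lower), median(marks_lower)

print(round(mean_before, 2), median_before)  # mean is about 79.86, median is 84
print(round(mean_after, 2), median_after)    # mean drops to about 77.29, median is still 84
```

Lowering the outlier pulls the mean down by more than two points, while the median does not move, because the middle value in the ordered list is unchanged.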
As such, the median is preferred over the mean when there are significant outliers. A real-life example of median vs. mean is the depiction of “average” income in the media. The median income is much more representative of the true “average” household incomes, as the mean can be pulled up by a few particularly high earners or potentially pulled down by workers working part-time if they are included in the calculation.
The mode, on the other hand, has an interesting use in that it is the only measure of the three that can be used with categorical data, like the most common type of pet among students in a classroom or students’ favorite pizza topping. By using the mode, you are able to choose a categorical element, such as “double cheese”, to describe the preferences of a class.
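A brief sketch of the mode applied to categorical data follows. The survey responses are hypothetical and invented for this illustration. Categories cannot be added together or ordered by size, so a mean or median is not defined for them, but counting still works.

```python
from collections import Counter

# Hypothetical survey responses (assumed for illustration only).
favorite_toppings = [
    "double cheese", "pepperoni", "double cheese",
    "mushroom", "pepperoni", "double cheese",
]

counts = Counter(favorite_toppings)

# most_common(1) returns a list holding the single most frequent (item, count) pair.
top_topping, votes = counts.most_common(1)[0]

print(top_topping, votes)
```

Here the mode is "double cheese", chosen by 3 of the 6 hypothetical students.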
(1) Find the mean, median, and mode for the following data set.
(2) For each of the following data sets, indicate whether the mean, median, or mode would be most appropriate. Provide a short justification for your choice.
(3) The mean of the following data set is 46.78. Find the value of the measure x in the set.
*See solutions to provided practice problems on Solutions Page*