Five Number Summary
Part 1
An Understanding of the Five-Number-Summary
According to the California Institute of Technology (Caltech, 2016), the five-number summary simultaneously provides a measure of location and a measure of spread. The five numbers are the minimum (Min), the first quartile (Q1), the median (M = Q2), the third quartile (Q3), and the maximum (Max). In more detail, Caragea (2009) indicates that the quartiles Q1, Q2, and Q3 describe the position of a specific data value in relation to the rest of the data. These three numbers divide the ordered observations into four equally sized groups (i.e. each group contains 25% of all observations).
Similarly, Statistics Canada (2013) notes that a five-number summary is especially useful in descriptive analyses or during the preliminary investigation of a large data set. The summary consists of five values, presented together and ordered from lowest to highest: minimum value, lower quartile (Q1), median value (another name for the middle quartile, Q2), upper quartile (Q3), and maximum value.
Taking a similar view, Iowa State University (n.d.) describes the five-number summary of a set of observations on a single variable as consisting of the following statistics:
1-Maximum (max) – the largest observation
2-Upper Quartile (Q3) – a value that separates the largest 25% of the observations from the smallest 75%
3-Median (M or Q2) – a value that separates the largest 50% of the observations from the smallest 50%
4-Lower Quartile (Q1) – a value that separates the largest 75% of the observations from the smallest 25%
5-Minimum (min) – the smallest observation
To display the five-number summary, a boxplot is recommended. Statistics Canada (2013) suggests that a five-number summary can be represented in a diagram known as a box and whisker plot. Where there is more than one data set to analyse, a five-number summary with a corresponding box and whisker plot is constructed for each.
The boxplot (a.k.a. box and whisker diagram) is a standardised way of displaying the distribution of data based on the five-number summary: minimum [Min], first quartile [Q1], median [M or Q2], third quartile [Q3], and maximum [Max] (Wikipedia, 2016). The boxplot is illustrated in Figure 1 as follows:
Figure 1 - Boxplot to display the five-number summary for a set of data
Source of the original picture from a powerful site of
Further, Iowa State University (n.d.) gives five steps to calculate the five-number summary, as follows:
Step 1 - Arrange the observations in order from smallest on the left to largest on the right.
Step 2 - Record the minimum and maximum.
Step 3 - The median is the middle number in the list if there is an odd number of observations, and the average of the two middle numbers if there is an even number of observations.
Step 4 - If the number of observations [n] minus one [n-1] is evenly divisible by four [i.e. (n-1)/4 is a whole number], then the upper quartile is the median of the observations starting with the median and including all observations to the right.
If the number of observations minus one is not evenly divisible by four, then the upper quartile is the median of the observations to the right of the location of the overall median.
Step 5 - If the number of observations minus one is evenly divisible by four, then the lower quartile is the median of the observations starting with the median and including all observations to the left.
If the number of observations minus one is not evenly divisible by four, the lower quartile is the median of the observations to the left of the location of the overall median.
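The five steps above can be sketched in Python as follows. This is a minimal illustration of the rule as described here, not code from the cited source; the function name five_number_summary is a hypothetical choice, and the data are the ten observations used in the worked example in this part.

```python
from statistics import median


def five_number_summary(observations):
    """Return (min, Q1, median, Q3, max) using the five-step rule above."""
    data = sorted(observations)               # Step 1: order the observations
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")

    minimum, maximum = data[0], data[-1]      # Step 2: record min and max
    med = median(data)                        # Step 3: overall median

    half = n // 2
    if (n - 1) % 4 == 0:
        # Steps 4 and 5, first case: the median itself is included in both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Steps 4 and 5, second case: only observations strictly to the
        # left and right of the median's location are used.
        lower = data[:half]
        upper = data[half + 1:] if n % 2 else data[half:]

    return minimum, median(lower), med, median(upper), maximum


example = five_number_summary([20, 1, 23, 2, 5, 9, 16, 14, 21, 39])
print(example)  # (1, 5, 15.0, 21, 39)
```

With nine observations (so that n-1 = 8 is divisible by four), the same function includes the median in both halves, as Steps 4 and 5 require.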
For example, suppose we wish to compute the five number summary for the following observations:
20 1 23 2 5 9 16 14 21 39
Step 1 - Arrange the observations in order from smallest on the left to largest on the right.
1 2 5 9 14 16 20 21 23 39
Step 2 - Record the minimum and maximum: the minimum and maximum are Min=1 and Max=39, respectively.
Step 3 - The median is the average of the two middle numbers [M = (14+16)/2 = 15] because the number of observations, 10, is even.
Step 4 - Because the number of observations [10] minus one [10-1=9] is not evenly divisible by four [9/4=2.25], the upper quartile [Q3] is the median of the observations to the right of 15, i.e.:
16 20 21 23 39
The median of these observations is 21 because 21 is the middle number in this list. Thus, the upper quartile [Q3] is 21.
Step 5 - Likewise the lower quartile [Q1] is the median of the observations to the left of 15, i.e.:
1 2 5 9 14
The median of these observations is 5 because 5 is the middle number in this list. Thus, the lower quartile [Q1] is 5.
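Quartiles are not defined identically everywhere, so software may report slightly different values of Q1 and Q3 from the ones obtained by hand above. As a hedged illustration, the sketch below applies the two interpolation methods offered by Python's standard statistics.quantiles function (Python 3.8 or later) to the same ten observations; neither method is the rule used in this worked example.

```python
from statistics import quantiles

data = [20, 1, 23, 2, 5, 9, 16, 14, 21, 39]

# Both calls return [Q1, median, Q3]; they differ in how they interpolate
# between neighbouring observations.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [4.25, 15.0, 21.5]
print(inclusive)  # [6.0, 15.0, 20.75]
```

The median agrees in all cases (15), while Q1 and Q3 move a little around the hand-computed values of 5 and 21. When comparing summaries from different tools, it helps to check which convention each one uses.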
The boxplot of the above five-number summary is shown in Figure 2.
Figure 2 - The five-number summary of the observations above [ 20 1 23 2 5 9 16 14 21 39 ]
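A box and whisker plot of this kind can also be roughed out in plain text directly from the five numbers. The sketch below is only an illustration: text_boxplot is a hypothetical helper written for this example, it is not how the figure was produced, and it draws the whiskers all the way to the minimum and maximum.

```python
def text_boxplot(summary, width=39):
    """Render (min, Q1, median, Q3, max) as a one-line text box plot."""
    minimum, q1, med, q3, maximum = summary
    span = maximum - minimum
    if span <= 0:
        raise ValueError("max must be greater than min")

    def column(value):
        # Map a data value onto a character column between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    line = ["-"] * width                       # whiskers span min..max
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="                          # box spans Q1..Q3
    line[column(minimum)] = "|"
    line[column(maximum)] = "|"
    line[column(q1)] = "["
    line[column(q3)] = "]"
    line[column(med)] = "M"                    # median marker inside the box
    return "".join(line)


plot = text_boxplot((1, 5, 15, 21, 39))
print(plot)
```

With the default width of 39 characters, each column corresponds to one unit of the data, so the box edges fall at the columns for 5 and 21 and the median marker at the column for 15.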
Why does the Five-Number Summary matter?
Regarding the five values of the five-number summary, Statistics Canada (2013) explains that each value has been selected to describe a specific part of a data set:
+the median [M] identifies the centre of a data set;
+the upper [Q3] and lower [Q1] quartiles span the middle half of a data set, and
+the highest [Max] and lowest [Min] observations provide additional information about the actual dispersion of the data.
Therefore, the five-number summary is a useful tool for measuring the spread of data, particularly in descriptive analyses or during the preliminary investigation of a large data set, where it helps to reveal the symmetry or asymmetry of the data.
Regarding the individual values of the five-number summary, Caragea (2009) emphasises that it is important to know how the data spread out, because this tells us something about the behaviour of a variable. Depending on that behaviour, a decision can then be considered or made.
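One simple way to look for symmetry or asymmetry is to compare the distances between neighbouring values of the summary: in a roughly symmetric data set, the gaps below the median are similar to the gaps above it. The sketch below is an informal check under that assumption, using the summary from the Part 1 example; the function name quartile_gaps and the dictionary keys are hypothetical.

```python
def quartile_gaps(summary):
    """Return the four distances between neighbouring summary values."""
    minimum, q1, med, q3, maximum = summary
    return {
        "min_to_q1": q1 - minimum,
        "q1_to_median": med - q1,
        "median_to_q3": q3 - med,
        "q3_to_max": maximum - q3,
    }


gaps = quartile_gaps((1, 5, 15, 21, 39))
print(gaps)
# {'min_to_q1': 4, 'q1_to_median': 10, 'median_to_q3': 6, 'q3_to_max': 18}
```

Here the upper tail (18) is much longer than the lower tail (4), which suggests the largest observation, 39, sits well away from the rest of the data.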
How is the Five-Number Summary used in practice?
As mentioned, measures of spread include the range, the quartiles (Q1, Q2, Q3), the interquartile range (IQR), the variance, and the standard deviation. But why does spread need to be measured?
In practice, taking all five summary numbers together is believed to give the best overall "picture" of a distribution, and it helps in understanding skewed distributions or those containing outliers (Henry County Schools, n.d.).
(2013) states that measures of spread describe how similar or varied the set of observed values are for a particular variable (data item).
The results of these measurements are then used to support better decision-making.
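The measures of spread named in this section can be computed for the ten observations from Part 1 as sketched below. Two assumptions are made here that the text does not state: the variance and standard deviation are the sample versions (dividing by n-1), and outliers are flagged with the common 1.5 x IQR rule. Q1 and Q3 are taken from the hand calculation in Part 1.

```python
from statistics import stdev, variance

data = [20, 1, 23, 2, 5, 9, 16, 14, 21, 39]
q1, q3 = 5, 21                      # quartiles from the worked example

data_range = max(data) - min(data)  # range = Max - Min
iqr = q3 - q1                       # interquartile range = Q3 - Q1

sample_variance = variance(data)    # sample variance (divides by n - 1)
sample_sd = stdev(data)             # sample standard deviation

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(data_range, iqr)              # 38 16
print(round(sample_variance, 2), round(sample_sd, 2))
print(lower_fence, upper_fence, outliers)  # -19.0 45.0 []
```

Under this rule, none of the ten observations is flagged as an outlier, even though 39 is noticeably larger than the rest.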
References
Caltech (2016). Summarising Data. Retrieved 9 April 2016 from
Caragea, P. (2009). PowerPoint: Introduction to Business Statistics I.
Henry County Schools (n.d.). Chapter 1. Retrieved 23 April 2016 from http://www.henry.k12.ga.us/ugh/apstat/chapternotes/sec1.2.html
Iowa State University (n.d.). Display and Summary of Data. Retrieved 9 April 2016 from http://www.public.iastate.edu/~wrstephe/stat496/datasum.pdf
Statistics Canada (2013). Five number summaries. Retrieved 11 April 2016 from http://www.statcan.gc.ca/edu/power-pouvoir/ch12/5214877-eng.htm
Wikipedia (2016). Box plot. Retrieved 9 April 2016 from https://en.wikipedia.org/wiki/Box_plot
Useful links
Part 2
YouTube watching activities: Choose and watch at least three Videos from the following collections:
1.1-Five Number Summary & Boxplot - 5'35
Posted by John Quinn - (25,361 views - 9 April 2016)
1.2-Chart Example in Excel 2016 - 5'21
Posted by ExcelIsFun - (5,480 views - 12 April 2016)
2.1-How to find a 5 number summary - 2'39
Posted by Stephanie Glen - (22,528 views - 9 April 2016)
2.2-How to find a 5 number summary in Excel - 2'26
Posted by Stephanie Glen - (10,168 views - 12 April 2016)
3.1-How to calculate interquartile range IQR - 8'42
Posted by Khan Academy - (94,901 views - 9 April 2016)
3.2-Statistics - Compute the interquartile range - 2'28
Posted by MySecretMathTutor - (110,808 views - 9 April 2016)
4.1-Creating a Boxplot in Excel 2016 - 12'03
Posted by Dr. Todd Grand (9,367 views - 12 April 2016)
4.2-Boxplots in Excel 2013
Posted by Paula Schute (179,394 views - 12 April 2016)
4.3-Create a Simple Boxplot in Excel - 15'20
Posted by Contextures Inc. (219,803 views - 12 April 2016)
4.4-How to draw a Simple Boxplot in Excel 2010
Posted by Eugene O’Loughlin (228,587 views - 12 April 2016)