Boxplots
Part 1
1.1 What are Boxplots?
According to Statistics Canada (2013), a boxplot [also called a box-and-whisker plot] is a graphical display that presents the information in a five-number summary and so depicts a distribution graphically. The box is drawn around the Median, with its edges at the lower and upper quartiles (Q1 and Q3). The interquartile range (IQR) is computed as Q3 - Q1. Black et al (2010) note that the IQR contains the middle 50% of the data and equals the length of the box.
The Penn State Eberly College of Science affirms that one of the most important uses of boxplots is to compare two or more samples of one measurement variable.
Boxplots can be drawn in either a horizontal or a vertical orientation.
Visually, a boxplot is a simple way of representing statistical data: a rectangle is drawn from the first [lower] quartile to the third [upper] quartile, and a line inside the box marks the second quartile [Median].
Hamilton (2013) explains that, at a glance, boxplots convey information about the centre, spread, symmetry and outliers of a set of data. The example boxplot shows a skewed distribution of observations with five high outliers. The box in a boxplot extends from the approximate first quartile (Q1) to the approximate third quartile (Q3), and the distance between Q1 and Q3 is called the interquartile range (IQR). See Figure 1.
Figure 1 - The Boxplot of a set of data
According to (2012), "outlier" is a scientific term describing "things or phenomena that lie outside normal experience".
Regarding outliers within boxplots, Hamilton (2013) notes that observations lying more than 1.5 times the IQR beyond the first or third quartile are plotted as individual points.
The College of Saint Benedict Saint John's University (n.d.) says that "not uncommonly real datasets will display surprisingly high maximums or surprisingly low minimums called outliers". See Figure 2.
Figure 2 - Outliers
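The 1.5 x IQR rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values are made up, the function name find_outliers is hypothetical, and the quartiles are computed with the "inclusive" method of the standard statistics module, which is one of several quartile conventions in use (other software may give slightly different Q1 and Q3 values).

```python
from statistics import quantiles


def find_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    # quantiles() sorts the data itself; n=4 yields Q1, Median and Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Anything beyond the fences would be plotted as an individual point.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Hypothetical sample: nine ordinary values and one unusually high one.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
lower_fence, upper_fence, outliers = find_outliers(sample)
print(lower_fence, upper_fence, outliers)
```

With this sample only the value 30 falls outside the fences, so it is the only point that a modified boxplot would draw separately.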
1.2 Why do Boxplots matter?
A box-and-whisker plot is used to describe a distribution of data. As a comparison tool, the boxplot is especially useful for indicating whether a distribution is skewed and whether there are potentially unusual observations (outliers) in the data set (Black et al, 2013). In particular, boxplots are very useful when large numbers of observations are involved and when two or more data sets are being compared (Statistics Canada, 2013).
Similarly, Hamilton (2013) specifies that one of the most common applications of boxplots involves comparing the distribution of one variable over the categories of a second. Because the centre, spread and overall range are immediately apparent, boxplots are an ideal tool for comparing distributions of data.
In addition, the College of Saint Benedict Saint John's University (n.d.) explains that the simplest possible box plot displays the full range of variation through the five-number summary [min, Q1, median, Q3, max], and the likely range of variation (the IQR).
Further, in practice, Caragea (2009) explains how to read the three quartiles. In general:
for a symmetric distribution: Q1 and Q3 are about equally far from the Median
for a distribution skewed to the right: Q3 will be further away from the Median than Q1 (and likewise Max than Min)
for a distribution skewed to the left: Q1 will be further away from the Median than Q3 (and likewise Min than Max)
1.3 How are Boxplots used in Practice?
Boxplots are useful for comparing multiple distributions. The main purpose of using them is to examine and/or compare distribution and spread in order to understand the 'attitude' or 'behaviour' of one or more datasets. The plot is constructed by using a box to represent the middle 50% of the data and lines [whiskers] to indicate the remaining 50%.
According to (2016), the standard kind of boxplot shows medians, quartiles and some information about the tails of the distribution. Such boxplots are used for comparisons of means in a context of variations of one or more kinds.
Similarly, box plots give an indication of the symmetry or skewness of a distribution: for observations in a symmetric distribution, Q1 and Q3 are equidistant from the median (Henry County School, n.d.). Because a regular box plot conceals outliers, it is advised to use the modified box plot, which plots outliers as isolated points and so shows more detail. Hence, a modified box plot is a graph of the five-number summary with outliers plotted individually.
In addition, the box and whiskers divide the illustration into four parts, each of which represents the same proportion of the motivation attitude scores. Minnesota State University Moorhead (n.d.) advises that the box portion of the illustration gives more detailed information, and that the middle bar in each box shows the median score. The median scores of different boxes can therefore be compared visually and easily.
When multiple datasets are compared, the inter-quartile range (IQR) measures the width of the interval that contains the middle 50% of the data. The University of California@Berkeley (n.d.) indicates that the IQR is not sensitive to the extreme values of the distribution [the values outside the range from the 25th to the 75th percentile, such as those in the whiskers and the outliers], and that it is zero if (at least) the middle 50% of the values are equal. The median and the IQR are shown visually by the dividing line and the length [short or tall] of each box, which together indicate the skewness and central tendency of the distribution.
In practice, when considering each box, Levine et al (2009, p. 88) indicate the relationships between the five-number summary and three types of data distribution [left-skewed, symmetrical and right-skewed] as below.
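Both properties of the IQR mentioned above can be checked with a small Python sketch. The score lists are invented for illustration, the helper names iqr and value_range are hypothetical, and the quartiles again use the "inclusive" method of the standard statistics module, which is an assumption about the quartile convention.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range Q3 - Q1, using the 'inclusive' quartile method."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def value_range(values):
    """Full range Max - Min, which does react to extreme values."""
    return max(values) - min(values)


# Hypothetical scores; the second list replaces the top score with an extreme one.
scores = [10, 12, 13, 14, 15, 16, 18]
scores_with_extreme = [10, 12, 13, 14, 15, 16, 180]

print(value_range(scores), value_range(scores_with_extreme))  # range changes a lot
print(iqr(scores), iqr(scores_with_extreme))                  # IQR stays the same

# Hypothetical list whose middle values are all equal: the box collapses.
flat_middle = [1, 5, 5, 5, 5, 5, 9]
print(iqr(flat_middle))
```

In this example the extreme score stretches the range from 8 to 170 while the IQR does not move, and the list with identical middle values gives an IQR of zero, so its box would have no height.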
Three observations for comparisons
Min to Median vs Median to Max
-If the distance from the smallest figure [Min] to the Median is greater than the distance from the Median to the biggest figure [Max], then the distribution is left-skewed.
-If the distance from the smallest figure [Min] to the Median is less than the distance from the Median to the biggest figure [Max], then the distribution is right-skewed.
-If both distances are the same, then the distribution is symmetrical.
Min to Q1 vs Q3 to Max
-If the distance from the smallest figure [Min] to the first quartile [Q1] is greater than the distance from the third quartile [Q3] to the biggest figure [Max], then the distribution is left-skewed.
-If the distance from the smallest figure [Min] to the first quartile [Q1] is less than the distance from the third quartile [Q3] to the biggest figure [Max], then the distribution is right-skewed.
-If both distances are the same, then the distribution is symmetrical.
Q1 to Median vs Median to Q3
-If the distance from the first quartile [Q1] to the Median is greater than the distance from the Median to the third quartile [Q3], then the distribution is left-skewed.
-If the distance from the first quartile [Q1] to the Median is less than the distance from the Median to the third quartile [Q3], then the distribution is right-skewed.
-If both distances are the same, then the distribution is symmetrical.
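The three comparisons above can be written as a short Python sketch. The function names compare and skew_checks and the example figures are hypothetical and chosen only to illustrate the rules. The sketch tests for exactly equal distances, which real data rarely produce, so in practice a small tolerance would probably be needed before calling a distribution symmetrical.

```python
def compare(left_distance, right_distance):
    """Apply one comparison: a longer left side suggests left skew."""
    if left_distance > right_distance:
        return "left-skewed"
    if left_distance < right_distance:
        return "right-skewed"
    return "symmetrical"


def skew_checks(minimum, q1, median, q3, maximum):
    """Run the three comparisons on a five-number summary."""
    return {
        "Min to Median vs Median to Max": compare(median - minimum, maximum - median),
        "Min to Q1 vs Q3 to Max": compare(q1 - minimum, maximum - q3),
        "Q1 to Median vs Median to Q3": compare(median - q1, q3 - median),
    }


# Hypothetical five-number summary with a long upper tail.
checks = skew_checks(minimum=1, q1=3, median=4, q3=8, maximum=15)
for name, verdict in checks.items():
    print(name, "->", verdict)
```

For the hypothetical summary above, all three comparisons point to a right-skewed distribution. The three checks do not always agree on real data, which is why they are reported separately rather than merged into one verdict.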
References
Black, K.; Asafu-Adjaye, J.; Khan, N.; Pereea, N.; Edwards, P.; Harris, M. (2010). Australian business statistics. John Wiley & Sons.
Black, K.; Asafu-Adjaye, J.; Khan, N.; King, G.; Perea, N.; Sherwood, C.; Verma, R.; Wasimi, S. (2013). Australian business statistics. John Wiley & Sons.
College of Saint Benedict Saint John's University (n.d.). Measures of Location and Spread. Retrieved 26/4/2016 from http://www.stat.berkeley.edu/~stark/SticiGui/Text/location.htm
Henry County School (n.d.). Online chapter 1. Retrieved 23/4/2016 from http://www.henry.k12.ga.us/ugh/apstat/chapternotes/sec1.2.html
Levine, B.; Watson, K.; Turner, J. (2009). Business statistics - Concepts and applications. NSW, Australia: Pearson Education Australia.
Minnesota State University Moorhead (n.d.). Five-Number Summary and Box-and-Whisker Plots - Motivation Problem. Retrieved 30/4/2016 from http://web.mnstate.edu/peil/MDEV102/U4/S36/S36_print.html
University of California@Berkeley (n.d.). Online Chapter 4. Retrieved 23/4/2016 from http://www.stat.berkeley.edu/~stark/SticiGui/Text/location.htm
Part 2
How to create a boxplot?
Step 1 - Create a table for the boxplots - Which variables do you want to compare?
Step 2 - Set up a table for the five-number summary: Min, Q1, Median, Q3 and Max - the values we need to make a boxplot,
then work out these values for the first variable, for example, HSC Business Studies (Microsoft Excel or other software can be used)
Step 3 - Calculate the differences (we want to see how far Q1 is from the Min, how far the Median is from Q1, how far Q3 is from the Median, and how far the Max is from Q3)
Step 4 - Repeat Steps 2 and 3 for the rest of the variables, for example HSC Mathematics, English (Standard), etc.
Step 5 - In Excel 2013, highlight all the differences, then create a graph from them using the normal Excel functions (such as Insert, graphs, ...). The boxes and their whiskers will be built from these figures. [Choose a vertical column graph - we choose the stacked column type]
We want to show the graph in a new worksheet, so right-click on the vertical graph and use the 'Move Chart' function to place it in a new sheet of the current Excel workbook. We can name this sheet, for example 'Lydia Teaching' as in the following illustration, then click OK. From here, we work on this new sheet with the chosen vertical graph.
Step 6 - In old versions of Excel, we need to Switch Rows and Columns so that the data display the Five-Number Summary on each box/column. In the newer Excel 2013, the column graph for boxplots is displayed automatically.
Step 7 - The blue segments at the bottom of each column are not part of the box-and-whisker shape (they only lift the stack up to the minimum value), so we do not need to see them. Click on the first one, then right-click (Format Data Series), click on 'Fill', then 'No fill', and the segment at the bottom disappears.
Step 8 - The segments second from the bottom represent the values from the Minimum to Q1, so these will become whiskers. Because we do not want boxes for whiskers, set them to 'No fill'. Next we need to add a chart element: under Chart Tools, click 'Add Chart Element', then 'Error Bars', then 'More Error Bars Options' at the bottom of the small menu. In the pane on the right, click on the Error Bar Options, select the 'Minus' direction, then go down to 'Percentage', type in '100%' and hit ENTER. We now have whiskers for all boxes, running down from Q1 to the Minimum.
Step 9 - The next segments up are the figures from Q1 to the Median. Leave these boxes as they are.
Step 10 - Move to the boxes from the Median to Q3 - we can change their colour by right-clicking and choosing the colour we want, for example red
Step 11 - Now add the Error Bars for the top segments, which hold the data from Q3 to Max. These are whiskers, so we do not need boxes: click 'No fill', then repeat Step 8 to add the Error Bars (go to 'Design', then click 'Add Chart Element' at the top left)
If you want to change the colour of a box, for example, I will change it to green as below. We have then completed our boxplots for comparing the Five-Number Summary of different variables
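For readers who want to check the figures outside Excel, the numbers behind Steps 2 to 4 can be sketched in Python. The marks below are invented, the function names are hypothetical, and the "inclusive" quartile method of the standard statistics module is used on the assumption that it corresponds to the quartile rule in the spreadsheet; other quartile conventions give slightly different values. The 'Min' entry is the bottom segment that is hidden in Step 7, and the four differences are the segments stacked on top of it.

```python
from statistics import quantiles


def five_number_summary(values):
    """Step 2: Min, Q1, Median, Q3 and Max for one variable."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def stacked_segments(values):
    """Step 3: the heights of the stacked-column segments, bottom to top."""
    minimum, q1, median, q3, maximum = five_number_summary(values)
    return {
        "Min": minimum,               # hidden base segment
        "Q1 - Min": q1 - minimum,     # lower whisker
        "Median - Q1": median - q1,   # lower half of the box
        "Q3 - Median": q3 - median,   # upper half of the box
        "Max - Q3": maximum - q3,     # upper whisker
    }


# Step 4: repeat for every variable (hypothetical marks for two subjects).
marks = {
    "HSC Business Studies": [55, 62, 68, 70, 74, 81, 90],
    "HSC Mathematics": [40, 58, 65, 72, 80, 88, 95],
}
table = {subject: stacked_segments(values) for subject, values in marks.items()}
for subject, segments in table.items():
    print(subject, segments)
```

A quick sanity check on such a table is that the five segments of each column add up to that variable's maximum, because the stacked column must end exactly at Max.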
2.1-YouTube-Boxplots in Excel 2013 - Making a Boxplot on Excel 2013 - 14'.30"
Posted by Lucas Burge
2.2-YouTube-Boxplots in Excel 2013 -
3-How to Create a Box and Whisker Plot in Excel 2010 -
Information of the Portland State University: