Variability measures are also crucial in describing a data distribution. They show how spread out the data points are and how far they lie from the mean.
Some of the basic questions during a statistics interview might require you to explain the meaning and usage of variability measures:
The variance: measures the average squared distance of data points from the mean. A small variance corresponds to a narrow spread of values, while a large variance implies that data points lie far from the mean.
The standard deviation: the square root of the variance. It expresses the amount of variation in a dataset in the same units as the data itself.
The range: the difference between the maximum and minimum data values. It is a good indicator of variability when a dataset has no outliers, but it can be misleading when outliers are present.
The interquartile range (IQR): measures the spread of the middle 50% of a dataset. It is the difference between the third and the first quartile.
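The four measures above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function name `variability_summary` and the sample data are made up for the example, the variance is computed as the population variance (dividing by n, which matches "average squared distance"), and the quartiles use the `inclusive` method of `statistics.quantiles`. Other quartile conventions can give slightly different IQR values.

```python
import statistics


def variability_summary(values):
    """Return variance, standard deviation, range, and IQR for a list of numbers."""
    mean = statistics.fmean(values)
    # Population variance: the average squared distance from the mean.
    # (A sample variance would divide by len(values) - 1 instead.)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    std_dev = variance ** 0.5
    value_range = max(values) - min(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "variance": variance,
        "std_dev": std_dev,
        "range": value_range,
        "iqr": q3 - q1,
    }


# Hypothetical data: the same values with and without one extreme point.
clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = clean + [100]

clean_stats = variability_summary(clean)
outlier_stats = variability_summary(with_outlier)

print(clean_stats)
print(outlier_stats)
```

Comparing the two printed summaries shows why the range can mislead: a single extreme value stretches it far more than it stretches the IQR.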
A box plot or boxplot (also known as a box-and-whisker plot) is a type of chart often used in exploratory data analysis.
Box plots visually show the distribution and skewness of numerical data by displaying its quartiles (or percentiles) and median. They show the five-number summary of a dataset, together with the interquartile range:
The minimum score: the lowest score, excluding outliers (shown at the end of the left whisker).
The first (lower) quartile: twenty-five percent of scores fall below this value.
The median: marks the midpoint of the data and is shown by the line that divides the box into two parts (also known as the second quartile). Half the scores are greater than or equal to this value, and half are less than or equal to it.
The third (upper) quartile: seventy-five percent of scores fall below this value, so 25% of the data lie above it.
The maximum score: the highest score, excluding outliers (shown at the end of the right whisker).
The interquartile range (IQR): the box itself spans the middle 50% of scores (i.e., the range between the 25th and 75th percentiles).
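As a rough sketch, the elements listed above can be computed as follows. The text does not say how outliers are identified, so this example assumes the common convention that a point is an outlier when it lies more than 1.5 times the IQR beyond the first or third quartile. The function name `box_plot_summary`, the `whisker` parameter, and the sample scores are hypothetical, and the quartiles again use the `inclusive` method of `statistics.quantiles`.

```python
import statistics


def box_plot_summary(scores, whisker=1.5):
    """Return the box plot elements for a list of scores.

    Assumption: outliers are points beyond whisker * IQR from the quartiles.
    """
    data = sorted(scores)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whisker ends are the lowest and highest scores that are not outliers.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "minimum": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": inside[-1],
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical scores with one extreme value.
summary = box_plot_summary([2, 4, 4, 4, 5, 5, 7, 9, 100])
print(summary)
```

With these scores, the value 100 is reported as an outlier, so the right whisker ends at 9 rather than at the largest raw value.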
The next figure shows how the box plot relates to the elements listed above.