Descriptive statistics

Descriptive statistics are statistical tools used in Business Management to summarize and present statistical data in a user-friendly way. The purpose is to enable managers and decision makers to interpret and analyse the data more easily and to support them in making non-biased (rational), objective, evidence-based, and well-informed decisions. Examples of descriptive statistics include the use of the following methods:

Averages, i.e., the mean, mode, and median, which measure the typical (central) value in a data set
Graphical tools, such as bar charts, pie charts, and infographics, which represent the data visually, and
Statistical measures of dispersion or spread of numbers in a data set, such as quartiles and standard deviation.

Mean Average

The mean (also known as the arithmetic mean) is the main mathematical method used to calculate the average value in a data set. It is calculated as the sum of all the numbers in a data set divided by the number of items in that data set.

As an example, suppose five customers give the following ratings (out of 10) for the quality of service at a restaurant: 3, 4, 5, 6, and 7. The mean score or rating is calculated by adding all the numbers in the data set and dividing this by the number of items in the data set:

(3 + 4 + 5 + 6 + 7) / 5 = 25 / 5 = 5 out of 10.

As a second example, suppose a business has the following sales revenue figures for the past three years:

2022 = $240,000
2021 = $230,000
2020 = $220,000

The mean average is then calculated as:

($220,000 + $230,000 + $240,000) / 3 years = $690,000 / 3 = $230,000 per year.
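The two mean calculations in this section can be checked with a short Python sketch. The helper `arithmetic_mean` is a hypothetical name used only for illustration; Python's standard `statistics.mean` should agree with it.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)

# Customer ratings out of 10 (first example)
ratings = [3, 4, 5, 6, 7]

# Annual sales revenue ($) for 2020, 2021 and 2022 (second example)
annual_revenue = [220_000, 230_000, 240_000]

print(arithmetic_mean(ratings))         # 5.0
print(arithmetic_mean(annual_revenue))  # 230000.0

# The standard library's mean gives the same values
assert mean(ratings) == arithmetic_mean(ratings)
assert mean(annual_revenue) == arithmetic_mean(annual_revenue)
```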

Mode (Modal average)

The mode (also referred to as the modal average) is the number that occurs more frequently than any other value in a data set. Suppose five workers at a firm have the following number of days off work due to illness: 1, 1, 2, 3, and 4. The mode is the number (of sick days) that appears most frequently in the data set. In this case, 1 appears twice and all other numbers only appear once, so the modal average is 1 day.

Examples of scenarios when calculating the mode is appropriate include:

The most popular item in a firm's product portfolio.
The most common waiting time of customers before they are served.
The most frequent age group of a firm’s customers.
The most common category of complaints from customers.

Consider the data below as an example, which shows the sales revenue generated by 12 workers in a given time period.

$3,200

$2,900

$3,000

$3,400

$2,950

$2,900

$2,850

$3,000

$2,900

$3,150

$3,250

$3,300

The mode is the value in the data set that appears the most times. In this case, the mode is $2,900 as it appears three times (the highest frequency in the data set). This means that $2,900 is the most common (typical) amount of sales revenue generated by a worker during the given time period, although only three of the 12 workers generated exactly this amount.

Although it is possible to sell more than $2,900 (such as the two workers who each generated $3,000, or the single worker who generated $3,400), these values are not the most frequent (or typical) for the business. The modal value can be used as a performance benchmark to measure productivity. In this case, workers who generate more than $2,900 can be acknowledged through financial and/or non-financial methods. For those who fall below this benchmark, corrective measures might be used to support them in improving their productivity.
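Finding the mode of the 12 workers' sales figures amounts to counting how often each value occurs. A minimal Python sketch using the standard `collections.Counter` class is shown below. Note that `most_common(1)` returns only one value, so this sketch assumes the data has a single mode, as it does here; the above/below comparison is an illustrative use of the mode as a benchmark, not a prescribed method.

```python
from collections import Counter

# Sales revenue ($) generated by each of the 12 workers in the period
worker_revenue = [3200, 2900, 3000, 3400, 2950, 2900,
                  2850, 3000, 2900, 3150, 3250, 3300]

# Count how often each revenue figure occurs, then take the most frequent one
counts = Counter(worker_revenue)
modal_value, frequency = counts.most_common(1)[0]
print(modal_value, frequency)  # 2900 3

# Illustrative benchmark: compare each worker with the modal value
above = [r for r in worker_revenue if r > modal_value]
below = [r for r in worker_revenue if r < modal_value]
print(len(above), len(below))  # 8 1
```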

Consider the data set below which shows the IB Diploma scores for candidates in a particular school.

23, 27, 25, 33, 30, 33, 39, 33, 41, 33, 24, 22, 36, 32, 40

To find the mode, it is easiest to put the numbers in ascending order, as this makes it more straightforward to include and count each item.

22, 23, 24, 25, 27, 30, 32, 33, 33, 33, 33, 36, 39, 40, 41

The mode is then the figure that has the highest frequency (appears the most times). In this case, it is 33, which appears four times.
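The "sort first, then count" approach for the 15 IB scores can be mirrored in a short Python sketch. It uses the standard `itertools.groupby`, which groups consecutive equal values, so it only counts correctly because the scores are sorted first.

```python
from itertools import groupby

# IB Diploma scores for the 15 candidates, in the order given
ib_scores = [23, 27, 25, 33, 30, 33, 39, 33, 41, 33, 24, 22, 36, 32, 40]

# Step 1: arrange the scores in ascending order
ordered = sorted(ib_scores)
print(ordered)  # [22, 23, 24, 25, 27, 30, 32, 33, 33, 33, 33, 36, 39, 40, 41]

# Step 2: equal scores now sit next to each other, so count each run of equal values
runs = [(score, len(list(group))) for score, group in groupby(ordered)]

# Step 3: the mode is the score with the longest run (highest frequency)
mode_score, times = max(runs, key=lambda run: run[1])
print(mode_score, times)  # 33 4
```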

However, the key limitation of the modal average is that there is no single mode in the case of bi-modal and multi-modal data sets. Having two mode values in a data set is known as bi-modal. In the example below, 12 students have taken a test, marked out of 25. The data set is bi-modal, as both 12 and 20 appear three times. Clearly, two modal values this far apart have no real meaning as a "typical" score in this context.

12, 12, 15, 20, 10, 12, 13, 20, 11, 24, 25, 20

Having more than two mode values in a data set is known as multi-modal. In both of these cases (bi-modal and multi-modal), the mode cannot be used to locate the centre of the distribution of the test scores. This is also the case when there is no mode value in a data set (non-modal).
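The distinction between uni-modal, bi-modal, multi-modal and non-modal data sets can be expressed as a small Python function. `describe_modes` is a hypothetical helper written for illustration. It assumes that every value sharing the highest frequency counts as a mode, and that a data set in which no value repeats is non-modal; some textbooks also treat a set where every value appears equally often as having no mode.

```python
from collections import Counter

def describe_modes(values):
    """Return a label ('uni-modal', 'bi-modal', 'multi-modal' or 'non-modal')
    and the sorted list of modal values for a non-empty data set."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No value occurs more than once, so there is no mode
        return "non-modal", []
    # Assumption: every value sharing the highest frequency counts as a mode
    modes = sorted(v for v, c in counts.items() if c == highest)
    if len(modes) == 1:
        return "uni-modal", modes
    if len(modes) == 2:
        return "bi-modal", modes
    return "multi-modal", modes

test_marks = [12, 12, 15, 20, 10, 12, 13, 20, 11, 24, 25, 20]
print(describe_modes(test_marks))       # ('bi-modal', [12, 20])
print(describe_modes([1, 1, 2, 3, 4]))  # ('uni-modal', [1])  sick days example
print(describe_modes([3, 4, 5, 6, 7]))  # ('non-modal', [])   customer ratings
```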

Median Average

The median is the average based on the middle value of a data set when the data are arranged in order of magnitude, i.e., it separates the higher half of the values from the lower half. For example, suppose the five directors of a company earn the following salaries per year:

$52,200

$52,900

$53,000

$53,400

$52,950

It is easier to place these values in ascending numerical order to help determine the median (middle) value:

$52,200

$52,900

$52,950

$53,000

$53,400

This means that the typical director of the company earns $52,950 per year based on the median average. There are two directors who earn more than the median, and two who earn less. In this particular case, using the median average might be more meaningful than the mean average. The latter gives a figure of $52,890 ($264,450 / 5), because the single lowest salary of $52,200 pulls it down, so only one of the five directors actually earns less than the mean salary.
It can be useful for a business to know the median salary of its managers.

The benefit of using the median average is that it reduces the significance of outliers. These extreme values can have a large impact on the mean average (arithmetic mean) but only a small impact, if any, on the median average.
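The comparison between the median and the mean for the directors' salaries can be reproduced in Python. The `median` function below is a hypothetical helper that follows the usual convention: sort the values, take the middle one, and average the two middle values if there is an even number of items (the standard `statistics.median` behaves the same way).

```python
def median(values):
    """Return the middle value of the data once it is sorted.
    With an even number of values, the two middle values are averaged."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Annual salaries ($) of the five directors
salaries = [52_200, 52_900, 53_000, 53_400, 52_950]

median_salary = median(salaries)
mean_salary = sum(salaries) / len(salaries)
below_mean = [s for s in salaries if s < mean_salary]

print(median_salary)  # 52950
print(mean_salary)    # 52890.0
print(below_mean)     # [52200]
```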

Bar Charts

Bar graphs are used to compare figures in a study, such as sales figures during different time periods. They are useful for presenting frequencies and for ease of comparison. The example below shows the sales of three different products (burgers, fries and drinks) for a restaurant chain with 4 outlets. The graph allows the restaurant owners to see at a glance which store has the highest sales for the various products it sells.

Which restaurant has the highest sales revenue for burgers? Restaurant 3

Which restaurant has the highest sales revenue for fries? Restaurant 1

Which restaurant has the highest sales revenue for drinks? Restaurant 3

Which restaurant has sold more than $2,000 of fries? Restaurant 1

Which restaurant(s) sold less than $2,000 of drinks? Restaurant 1, Restaurant 2

Histograms are a type of bar chart, used to show frequency and the range within a data set. The example below shows the distribution of IB grades for students of Business Management in a particular school. The grades (Levels 2 to 7) are shown along the x-axis, with the number of students achieving each of these grades shown on the y-axis. The school strives for all students to achieve at least a Level 4 target grade.

How many students achieved a Level 2 grade? 5

How many students achieved a Level 7 grade? 4

How many students in total did not meet the school's target grade? 5 + 11 = 16

How many students in total achieved a Level 5 or Level 6 grade? 24 + 9 = 33

Pie Charts

Pie charts are used for expressing percentages, such as data on market share or the proportion of participants who chose a particular option in a survey. The pie chart below, for example, shows the percentage of candidates who chose to write their Extended Essay in a particular Diploma Programme subject group.

Which subject group was the most popular for the Extended Essay? Group 3 (Individuals and societies)

Which subject group was the least popular for the Extended Essay? Group 5 (Mathematics)

What percentage of candidates chose to write an Extended Essay in Group 4? 18%

What is the total percentage of students who wrote an essay in Group 1 (Studies in language and literature) or Group 2 (Language acquisition)? 15 + 11 = 26%

Infographics

As the name suggests, an infographic is a visual tool used to present information by combining it with graphics. Many infographics are visually stunning, so they help to capture people's attention. Infographics typically include a combination of text and graphics such as images, charts, and graphs.

The infographic below shows the extent to which different countries have banned the use of plastic carrier bags, which are harmful to the planet due to the nature of plastic waste.

Quartiles

Quartiles refer to the statistical technique of dividing a data set (such as sales revenue data from different stores or branches of a large company, or IB examination results from a particular IB World School) into four equal parts, so that each quartile represents one quarter of the data.

Essentially, quartiles are an extended version of the median average in a data set (the median divides the data, arranged in numerical order, into two equal parts whereas quartiles divide this into four parts). However, the median does not tell managers anything about the spread of the data on either side. Using quartiles enables managers to see the spread of values above and below the median.

Using quartiles allows managers and decision makers to see the distribution of the items in a data set. In the case of IB examination results for the Diploma Programme, the quartiles might be used to show the following for a particular cohort of candidates:

The first quartile (Q1), also known as the lower quartile, is the value below which the lowest 25% of the candidates’ examination results fall. It is the number halfway between the lowest number and the middle number in the data set, i.e., it is the middle value of the lower half of the IB scores. Candidates to the left of Q1 represent those in the bottom 25% of the cohort, as measured by their IB score.
The second quartile (Q2), also known as the median, is the middle number, halfway between the lowest IB score and the highest IB score in the data set. Hence, Q2 divides the data set into a lower half and an upper half: 50% of the candidates' IB scores are below the median score.
The third quartile (Q3), also known as the upper quartile, is the number halfway between the middle number and the highest number, i.e., it is the middle value of the upper half of the IB scores. It is the value below which 75% of the candidates' IB scores fall. Candidates to the right of Q3 represent those in the top 25% of the cohort, as measured by their IB score.
The interquartile range (IQR) is a measure of the spread of the data in the set, showing the variability of the middle 50% of values around the median. Specifically, it is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a data set. Because it ignores the lowest 25% and the highest 25% of values, the IQR is far less affected by outliers than the range (the highest value minus the lowest value).

Note that larger values of the IQR indicate that the central portion of the data (the middle 50% of values, between Q1 and Q3) is more spread out. Conversely, smaller values for the IQR show that the middle values are clustered more tightly.
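As an illustration, the quartiles and IQR of the 15 IB Diploma scores from the mode section can be calculated in Python using the "middle value of each half" description given above. The helper names `median` and `quartiles` are hypothetical, and the sketch assumes that, for an odd number of values, the median itself is left out of both halves. Other conventions exist (for example, Python's `statistics.quantiles` offers `'exclusive'` and `'inclusive'` methods), and these can give slightly different quartile values for some data sets.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'middle value of each half' method.
    For an odd number of values, the median is excluded from both halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(ordered), median(upper_half)

ib_scores = [23, 27, 25, 33, 30, 33, 39, 33, 41, 33, 24, 22, 36, 32, 40]
q1, q2, q3 = quartiles(ib_scores)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 25 33 36 11
```

Here candidates scoring below 25 points fall in the bottom quartile and those scoring above 36 points fall in the top quartile of this cohort.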

The use of quartiles puts data into context so that they become more comprehensible. For example, it is not possible to judge from the score alone whether a Diploma Programme score of 34 points is a strong result. This will depend on numerous factors, such as how the score compares with the school's mean, median, or modal averages. If this score puts the candidate in the fourth (top) quartile in the school, this would be a very impressive result. However, if it puts the candidate in the first (bottom) quartile, it would be far less impressive.

In a business context, a for-profit company might use quartiles to inform its human resource management. For example, salespeople in the top quartile might be awarded a bonus as part of the company's performance-related pay, a form of financial motivation. The business might fund further training and/or conduct performance appraisal reviews for salespeople in the bottom quartile.

Standard Deviation

Standard deviation is a statistical measure of the variation (or dispersion) of a set of values. It allows a business to see the extent to which the values or results from a set of data show divergence from the mean (also called the expected value). For example, in sales forecasting (HL only), a high standard deviation indicates that sales fluctuate significantly from the mean average.

The standard deviation of a data set is represented by the symbol "σ". It shows how the numbers within a set of data are spread out or distributed, i.e., whether there is a small or large spread of results. The greater the standard deviation, the more spread out the numbers are and the greater the variation from the mean. By contrast, a low standard deviation means that the values in the data set tend to be close to the mean.
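A minimal Python sketch of the calculation, using the customer ratings and annual sales revenue figures from the mean section: find the mean, square each value's deviation from it, average those squares, then take the square root. `population_std_dev` is a hypothetical helper. It divides by the number of values, which gives the population standard deviation σ (the same as the standard library's `statistics.pstdev`); dividing by n - 1 instead gives the sample standard deviation returned by `statistics.stdev`.

```python
import math

def population_std_dev(values):
    """σ: square root of the mean of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n                                 # the mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    variance = sum(squared_deviations) / n               # mean squared deviation
    return math.sqrt(variance)

ratings = [3, 4, 5, 6, 7]                      # customer ratings out of 10
annual_revenue = [220_000, 230_000, 240_000]   # sales revenue ($), 2020 to 2022

print(round(population_std_dev(ratings), 3))         # 1.414
print(round(population_std_dev(annual_revenue), 2))  # 8164.97
```

In both cases the values sit fairly close to their mean (5 out of 10 and $230,000 respectively), so the standard deviations are small relative to the averages.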

Managers and entrepreneurs are interested in the spread as it gives an indication of the variation or dispersion (from the mean), which can have a direct impact on their strategic planning and decision making. If the standard deviation is low, this gives businesses greater confidence in their planning. For instance, if a supermarket's average weekly sales revenue is $1.5 million and there is only a small standard deviation, the general manager can confidently plan stock orders and staffing schedules based on this sales figure. By contrast, a large standard deviation would require the business to be far more flexible, given the greater uncertainty in sales.

As a worked example, consider the data below that shows the weekly sales revenue of five branches of a retailer. Managers can use this data to calculate the standard deviation.
