Analysis of quantitative data: calculating measures of central tendency (mean, median, mode), data tables (frequency tables and summary tables), graphical presentation (bar chart, histogram), measures of dispersion (range and standard deviation), percentages, ratios, fractions.
Normal and skewed distribution.
Descriptive statistics are analyses of quantitative (numerical) data that summarise patterns, and so save readers from trawling through large amounts of raw data to understand research findings.
Measures of central tendency are descriptive statistics that depict the overall ‘central’ trend of a set of data. There are three key measures: the mode, the median and the mean.
The mode is the most frequently occurring number in a data set.
e.g. the mode of 1, 2, 3, 4, 4, 5 is 4
The median is the middle score when the data are in numerical order.
e.g. the median of 1, 2, 3, 4, 5 is 3
If there is an even number of scores, the median is the sum of the two middle numbers divided by two (e.g. the median of 1, 2, 3, 4 is [2+3] / 2 = 2.5).
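The mode and median examples above can be checked with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import median, mode

# Mode: the most frequently occurring value.
mode_example = mode([1, 2, 3, 4, 4, 5])   # 4 occurs twice, so the mode is 4

# Median with an odd number of scores: the single middle value.
median_odd = median([1, 2, 3, 4, 5])      # middle value is 3

# Median with an even number of scores: (2 + 3) / 2.
median_even = median([1, 2, 3, 4])        # 2.5

print(mode_example, median_odd, median_even)
```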
The mean is a measure of central tendency that is calculated by adding all of the scores in a data set and dividing by the total number of scores. It is the most sensitive measure of central tendency as it includes all of the scores in its calculation. However, it is easily distorted by extreme values.
e.g. the mean of 1, 2, 3, 4, 5, 6 would be calculated:
[1+2+3+4+5+6] / 6 = 3.5
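The same calculation can be written as a minimal Python sketch: add the scores, then divide by how many there are. The variable names are illustrative.

```python
scores = [1, 2, 3, 4, 5, 6]

total = sum(scores)          # 1+2+3+4+5+6 = 21
count = len(scores)          # 6 scores
mean_score = total / count   # 21 / 6 = 3.5

print(mean_score)
```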
The mean takes every number in a data set into account, which could be deemed a strength of the measure, but this also means that the final calculated figure can be distorted if the data include extreme values.
For example, in the data set 1, 2, 3, 4, 19 the mean would be 5.8 (i.e. [1+2+3+4+19] / 5), which could be argued to be unrepresentative, as most values in the data set are smaller than 5.8.
From this perspective, the median (i.e. 3) might be a better descriptive statistic to report, as it yields a value that is unaffected by any extreme values.
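As a rough illustration of this point, the sketch below compares the mean and median of the data set 1, 2, 3, 4, 19 with and without the extreme value. It assumes Python's standard `statistics` module, and the variable names are illustrative.

```python
from statistics import mean, median

with_extreme = [1, 2, 3, 4, 19]
without_extreme = [1, 2, 3, 4]

# The mean is pulled upwards by the single extreme score (19).
mean_with = mean(with_extreme)          # 29 / 5 = 5.8
mean_without = mean(without_extreme)    # 10 / 4 = 2.5

# The median barely moves, because it depends only on the middle position.
median_with = median(with_extreme)        # 3
median_without = median(without_extreme)  # 2.5

print(mean_with, mean_without, median_with, median_without)
```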
The mode can be useful by showing the most frequent value(s) in a data set, but it is of little use where the data set includes many different values of the same frequency, i.e. there are many modes.
For example, in the data set 1, 2, 3, 4, 19 all 5 values are modes, which does not summarise the data at all.
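A short sketch of this weakness, assuming Python 3.8 or later (where `statistics.multimode` is available): when every value occurs equally often, every value is returned as a mode.

```python
from statistics import multimode

# Every value occurs once, so each one ties for "most frequent".
all_modes = multimode([1, 2, 3, 4, 19])

# With one repeated value there is a single, informative mode.
single_mode = multimode([1, 2, 3, 4, 4, 5])

print(all_modes, single_mode)
```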
A bar chart is used to show frequency data for discrete (separate) variables. For example, bar charts are used to plot mean scores for conditions A & B separately. The bars should not be touching and the variables on each axis should be operationalised. The IV should go on the X-axis and the DV on the Y-axis. The title should include clear reference to both the IV and DV.
Histograms are a type of graph used for continuous data (e.g. age). There should be no space between the bars, because the data are continuous and grouped into adjoining class intervals (e.g. 0-9, 10-19, 20-29, etc.)
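Before a histogram is drawn, the raw scores are usually grouped into a frequency table of class intervals. The sketch below shows one way this could be done; the ages, the interval width of 10 and the variable names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical ages in whole years (not taken from any study).
ages = [3, 8, 12, 15, 19, 21, 24, 27, 35]
width = 10  # assumed interval width

# Map each age to the lower bound of its interval, then count.
counts = Counter((age // width) * width for age in ages)

# Build a frequency table with interval labels such as "0-9".
frequency_table = {
    f"{lower}-{lower + width - 1}": counts[lower]
    for lower in sorted(counts)
}

for interval, frequency in frequency_table.items():
    print(interval, frequency)
```

Each interval would then become one bar of the histogram, with the frequency as its height.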
The range is the difference between the highest and lowest values in a data set. It is simple to calculate; however, it does not take the central values of a data set into account, and so it can be distorted by extremely high or low values.
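A minimal sketch of the calculation, reusing the data set 1, 2, 3, 4, 19 to show how one extreme score inflates the range. The variable names are illustrative.

```python
scores = [1, 2, 3, 4, 19]

# Range = highest value minus lowest value.
range_with_extreme = max(scores) - min(scores)      # 19 - 1 = 18

# Dropping the extreme score shows how much it inflated the range.
typical_scores = [1, 2, 3, 4]
range_without_extreme = max(typical_scores) - min(typical_scores)  # 4 - 1 = 3

print(range_with_extreme, range_without_extreme)
```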
The standard deviation is a measure which shows to what extent the values in a data set deviate from the mean. It is calculated using all of the values, and so is arguably more representative than the range.
Standard deviation is calculated using the formula below:
standard deviation = √( Σ(x − x̄)² / N )
For each value in the data set (x), subtract the mean (x̄), and then square the result. Then find the sum (Σ) of all the resulting values. Next, divide this sum by the number of values in the data set (N), and finally take the square root of the resulting number.
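The steps described above can be sketched in Python as follows. The scores are hypothetical, chosen so the arithmetic is easy to follow by hand. The sketch divides by N, as in the description above; some courses and mark schemes divide by N − 1 instead (Python's `statistics.stdev`), so it is worth checking which version is expected.

```python
import math
from statistics import pstdev

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean_score = sum(scores) / n                       # 40 / 8 = 5.0

# Step 1: subtract the mean from each value and square the result.
squared_deviations = [(x - mean_score) ** 2 for x in scores]

# Step 2: sum the squared deviations.
sum_of_squares = sum(squared_deviations)           # 32.0

# Step 3: divide by N, then take the square root.
standard_deviation = math.sqrt(sum_of_squares / n)  # sqrt(4.0) = 2.0

# Cross-check against the library's divide-by-N version.
library_value = pstdev(scores)

print(standard_deviation, library_value)
```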
Unlike the range (which uses only the highest and lowest values), the standard deviation is calculated from the full data set, so more data is taken into account. It also allows relatively accurate analysis in relation to the mean: roughly 68% of the values in a normally distributed data set fall within one standard deviation either side of the mean, roughly 95% within two standard deviations, and roughly 99.7% within three. However, it is more complex to work out than the range, and is less helpful in understanding data that is not normally distributed.
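The 68%, 95% and 99.7% figures can be checked against a theoretical normal distribution. This is a sketch assuming Python 3.8 or later, where `statistics.NormalDist` is available; the helper `share_within` is a hypothetical name introduced here.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

percent_within_one = share_within(1) * 100    # about 68%
percent_within_two = share_within(2) * 100    # about 95%
percent_within_three = share_within(3) * 100  # about 99.7%

print(round(percent_within_one, 1),
      round(percent_within_two, 1),
      round(percent_within_three, 1))
```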
Percentages are a way of summarising nominal level data (frequencies in categories). A percentage is a portion of a whole expressed as a number between 0 and 100 (instead of as a fraction).
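A minimal sketch of a percentage calculation, rounded to two decimal places as exam questions often require. The figures (7 items recalled out of 15) are hypothetical.

```python
# Hypothetical figures: 7 items recalled out of a list of 15.
items_recalled = 7
items_total = 15

# Percentage = (part / whole) x 100, rounded to two decimal places.
percentage_recalled = round(items_recalled / items_total * 100, 2)

print(percentage_recalled)
```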
A ratio compares two quantities, and translates into a statement such as ‘for every 3 black there are 2 red’.
To simplify a ratio, divide all the numbers in the ratio by their highest common factor.
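The simplification rule can be sketched with Python's `math.gcd`, which returns the highest common factor of two whole numbers. The counts (12 black, 8 red) are hypothetical.

```python
from math import gcd

# Hypothetical counts: 12 black and 8 red.
black, red = 12, 8

# Divide both sides of the ratio by their highest common factor.
highest_common_factor = gcd(black, red)            # 4
simplified = (black // highest_common_factor,
              red // highest_common_factor)        # (3, 2)

print(f"{simplified[0]}:{simplified[1]}")
```

This gives the statement "for every 3 black there are 2 red".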
A fraction expresses a part of a whole. The top number of a fraction (the numerator) indicates the number of parts being counted. The bottom number of a fraction (the denominator) indicates the number of equal parts the whole is divided into.
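As an illustration, Python's standard `fractions.Fraction` type stores a numerator and denominator and reduces the fraction to its lowest form automatically. The sample figures (12 females in a sample of 40) are hypothetical.

```python
from fractions import Fraction

# Hypothetical sample: 12 females out of 40 participants.
females = 12
sample_size = 40

# Fraction reduces 12/40 to its lowest form.
fraction_female = Fraction(females, sample_size)

print(fraction_female.numerator, "/", fraction_female.denominator)
```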
A normal distribution is an arrangement of data that is symmetrical and forms a bell-shaped pattern, where the mean, median and mode all fall in the centre at the highest peak.
The shape formed by the bars in a histogram is known as the distribution of the data. A histogram shows how the data are distributed across the intervals. A distribution can be symmetrical (‘normal distribution’), or have a positive or negative skew.
A skewed distribution is one where frequency data is not spread symmetrically (i.e. it is not normally distributed); the data is clustered at one end. Data that is positively skewed has a long tail that extends to the right. Data that is negatively skewed has a long tail that extends to the left. As a general rule, when data is skewed to the right (positively skewed), the mean will be greater than the median, and when data is skewed to the left (negatively skewed), the median will typically be greater than the mean.
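The general rule about the mean and median can be illustrated with two small hypothetical data sets, one with a long tail to the right and one with a long tail to the left. This is a sketch using Python's standard `statistics` module, not data from any study.

```python
from statistics import mean, median

# Hypothetical positively skewed scores: clustered low, long tail to the right.
positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]

# Hypothetical negatively skewed scores: clustered high, long tail to the left.
negative_skew = [1, 17, 18, 18, 19, 19, 19, 20]

pos_mean, pos_median = mean(positive_skew), median(positive_skew)
neg_mean, neg_median = mean(negative_skew), median(negative_skew)

# Positive skew: the mean (4.75) is pulled above the median (3.0).
print(pos_mean, pos_median)

# Negative skew: the mean (16.375) is pulled below the median (18.5).
print(neg_mean, neg_median)
```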
Explain one weakness of the mean as a measure of central tendency. (2) June 2018 P2
Explain one weakness of Cherry using the mean as a measure of central tendency to analyse her results. (2) January 2020
Calculate mode and mean. (2) June 2019 P2
Calculate the median score for male participants. (1) October 2016
Explain why the median is an appropriate measure of central tendency for the data in this study. (2) October 2016
Define what is meant by the term ‘mode’ as a measure of central tendency. (1) June 2017
Calculate the median score for Group 1 using the data in Table 1. (1) October 2017
Calculate the mean score for males in Condition B using the data in Table 1. (1) January 2018
Explain one strength of the median as a measure of central tendency. (2) June 2018
Calculate the mode score for Condition B using the data in Table 2. (1) June 2018
Calculate the mode for coercive power using the data in Table 1. (1) January 2019
Calculate the mean consumption of fruit for the high school after Doctor Foster's talk. (1) June 2019
Calculate the mean scores for Condition A and Condition B and complete Table 1 with your answers. (2) October 2019
Arissa asked her teachers how many brothers, sisters and children they have. The results are shown in Table 1. Calculate the standard deviation by completing Table 1 below. You must give your answer to two decimal places. (4) June 2017
Calculate the standard deviation to two decimal places for the data gathered by Tobias by completing Table 1. You must show your calculations. (4) October 2019 P2
Calculate, using the information given in Table 1, the standard deviation for the number of nurses who followed the instructions. (2) January 2017
Explain why standard deviation may be an appropriate measure of dispersion that Manon could use to analyse her data. (2) January 2018
Calculate the standard deviation for Condition B using Table 2 below. (2) October 2018
Give Marco’s standard deviation result for legitimate power to three significant figures. (1) January 2019
Calculate the average number of aggressive acts carried out by 10-year-olds as a percentage of the total number of aggressive acts. You must give your answer to two decimal places. (1) January 2018 P2
Calculate range, percentage and ratio. (1 + 1 + 1) June 2019 P2
Calculate the correct percentages and complete Table 3 with your answers. (2) June 2016
Calculate the percentage of food items Mahmood could remember when he arrived at the shop. Express your answer to two decimal places. (1) January 2017
Calculate the percentage of Helen’s sample who were male. (1) January 2018
Calculate the percentage of participants who recalled the 10th word on the list using the data in Figure 1. (1) January 2019
Calculate the percentage of students in Condition A who did not attend the meeting. (1) January 2020
Calculate the fraction of the college student population Philippa used in her memory investigation. (1) October 2018
Calculate the fraction of Helen’s sample who were female. (1) January 2018
Calculate how many females took part if there were 56 males. (1) June 2017
Calculate the ratio of features recalled for the familiar to unfamiliar building. Express this ratio in its lowest form. (1) June 2017
Calculate the ratio of minutes slept for 1- to 5-year olds compared to minutes slept for 6- to 10-year olds. You must express this ratio in its lowest form. (1) June 2018 P2
Interpret the results of this experiment, using the data from Table 1. (3) June 2018 P2
Draw a suitable graph to represent the data shown in Table 2. (3) June 2018 P2
Draw a bar chart to show the range for Condition A and Condition B in this experiment. (3) June 2017
Draw an appropriate graph to show the median scores for the data shown in Table 1. (3) June 2018
State one conclusion that could be made from the graph shown in Figure 1. (2) June 2019
Draw an appropriate graph to represent the mean time for condition A and the mean time for condition B in Zulikhat's experiment. (3) June 2019
Describe what a skewed distribution means in relation to Tobias’ investigation. (2) October 2019 P2
Describe what is meant by the term ‘normal distribution’. (2) October 2018