In science, each of these terms has a distinct meaning and they are not interchangeable. Our experimental design should be carefully managed to ensure that accurate and precise measurements can be taken, leading to reliable and reproducible results that reflect the population as a whole.
Validity: A valid conclusion can only be drawn if all confounding variables have been controlled so that any measured effect is likely to be due to only the independent variable.
Reliability: Results are seen to be reliable if consistent values are obtained in all repeats and in an independent replicate.
Accuracy: Data, or the means of data sets, are close to the true value.
Precision: Measured values are close to each other.
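The difference between accuracy and precision can be sketched with a few lines of Python. This is only an illustration: the readings and the "true value" of 200 are hypothetical, and the helper name `describe` is made up for this example.

```python
from statistics import mean, stdev

def describe(measurements, true_value):
    """Return (distance of the mean from the true value, spread of the values)."""
    accuracy_error = abs(mean(measurements) - true_value)  # small = accurate
    spread = stdev(measurements)                           # small = precise
    return accuracy_error, spread

# Hypothetical repeated readings; the true value is assumed to be 200.
precise_not_accurate = [181, 180, 179, 180]   # close to each other, far from 200
accurate_not_precise = [170, 230, 215, 185]   # mean is 200, but widely scattered

err_a, spread_a = describe(precise_not_accurate, 200)
err_b, spread_b = describe(accurate_not_precise, 200)
print(err_a, round(spread_a, 2))
print(err_b, round(spread_b, 2))
```

The first set is precise but not accurate; the second is accurate (on average) but not precise.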
Jamie was attempting the investigation introduced in Task 2 (Investigating the effect of caffeine concentration on Daphnia heart rate). This investigation involves recording the number of times the Daphnia heart beats in 1 minute. The problem is that the resting heart rate of a typical Daphnia is about 200 beats/min (Ebert, 2005). Jamie did a bit of Googling and found that typical methodologies involve placing the Daphnia on a microscope slide and counting the number of heart beats in 20 s (by looking down the microscope) and then multiplying this number by 3.
He thought, "I surely can't make 200 dots in the shape of an S in 1 min - I will need to be more creative here" - Dr McRobbie smiled at his resolve to take this classic protocol and make suitable modifications to improve the accuracy of his measurements - "This kind of attitude will really help him access those tricky marks in the Experimental section of the Project", she mused.
Question: Do you agree with Jamie that it would be difficult to accurately record this rate of heart beats?
In addition, if your treatment (caffeine) potentially raises heart rate, can you be confident in the accuracy of your measurements?
Suggest how you could adapt this simple methodology to improve the accuracy of your measurements. Remember, accuracy is all about recording data that reflects the true mean.
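The scaling step in the typical method (count for 20 s, then multiply by 3) can be sketched in Python as below. The count of 67 is a hypothetical value chosen for illustration, and `beats_per_minute` is a made-up helper name.

```python
def beats_per_minute(count, interval_s):
    """Scale a count taken over interval_s seconds up to beats per minute."""
    return count * 60 / interval_s

# Hypothetical count: 67 beats seen in a 20 s window.
rate = beats_per_minute(67, 20)

# A single miscounted beat in the 20 s window is scaled up by the same factor.
miscount_effect = beats_per_minute(1, 20)

print(rate, miscount_effect)
```

Any counting error made during the short interval is multiplied along with the count, which is worth bearing in mind when thinking about accuracy.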
Reference:
Ebert, D. (2005), Ecology, Epidemiology, and Evolution of Parasitism in Daphnia, Bethesda (MD), National Center for Biotechnology Information (US), Chapter 2, ISBN-10: 1-932811-06-0. Available at https://www.ncbi.nlm.nih.gov/books/NBK2042/
A suggested answer is available here
Try the question below - the answer can be found here.
Reference: SQA (2019), CfE Advanced Higher Biology examination paper, section 1, question 16 (page 8), available at https://www.sqa.org.uk/pastpapers/findpastpaper.htm?subject=Biology&level=NAH [accessed on 04.04.20]
You should download R using the following instructions (Ruxton and Stafford, 2015):
Go to the R homepage: http://www.r-project.org
On the left hand panel, just below the title "Download", click on the word "CRAN" to see a page of countries.
Scroll down to the UK and click on one of the options, e.g. https://www.stats.bris.ac.uk/R/.
Click on the version of R appropriate to your computer's operating system (probably Mac or Windows).
Click on the "base" subdirectory.
Click on the link to the R setup program (e.g. "Download R-2.13.2v for windows").
When prompted, save the programme to your computer's hard drive.
Open the folder, click on the "setup" file, agree to everything, select default installation, and say "yes" to a shortcut icon on your desktop.
Click on the desktop icon (a big letter "R") to start using R.
Reference: Ruxton, G.D. and Stafford, J. (2015), Statistics for School Biology Experiments and Advanced Higher Projects, Dunfermline, SSERC.
An important consideration when planning an investigation is "How many times should I repeat this?". It is impractical to measure every individual in a population and so a representative sample of the population should be selected. Ultimately, the mean of your data should reflect the mean of the population as a whole. The extent of the natural variation in the whole population determines the appropriate sample size - more variable populations require a larger sample size.
But how should a sample be selected?
Reference for images: (Smith and Campbell, 2016).
In random sampling, members of the population have an equal chance of being selected.
This is a straightforward sampling process but it can result in poor representation of the overall population.
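A minimal Python sketch of random sampling is shown below. The population of 50 numbered individuals and the sample size of 10 are assumptions made purely for illustration.

```python
import random

def random_sample(population, sample_size, seed=None):
    """Pick sample_size members so that every member has an equal chance."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: individuals labelled 1 to 50.
population = list(range(1, 51))
chosen = random_sample(population, 10, seed=42)
print(sorted(chosen))
```

Nothing stops the 10 chosen individuals from clustering in one part of the population, which is why a random sample can sometimes be poorly representative.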
Reference for images: (Smith and Campbell, 2016).
In systematic sampling (as shown below), members of a population are selected at regular intervals. This might include a belt transect using quadrats to sample a particular ecosystem.
This type of sampling provides a more representative sample of the overall population but, because not all members of the population have an equal chance of being selected, it can be subject to selection bias.
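Systematic sampling can be sketched in Python as picking members at a fixed interval. The population of 50 numbered individuals and the interval of 5 are hypothetical values chosen for illustration.

```python
def systematic_sample(population, interval, start=0):
    """Select every interval-th member, beginning at index start."""
    return population[start::interval]

# Hypothetical population: individuals labelled 1 to 50, in order.
population = list(range(1, 51))
chosen = systematic_sample(population, interval=5)
print(chosen)
```

Members that fall between the sampling points can never be selected, which is the source of the selection bias described above.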
Reference for images: (Smith and Campbell, 2016).
In stratified sampling, the population is divided into categories that are then sampled proportionally.
This type of sampling is highly representative of the overall population. However, setting up this form of sampling requires the researcher to know the proportions of each group prior to beginning.
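Proportional (stratified) sampling can be sketched in Python as follows. The three groups, their sizes and the sample size of 10 are hypothetical, and `stratified_sample` is a made-up helper name.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Sample each category in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding can make the overall total differ
        # slightly from sample_size when the shares are not whole numbers.
        n = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population of 50, split into three categories (50%, 30%, 20%).
strata = {
    "group_a": [f"a{i}" for i in range(25)],
    "group_b": [f"b{i}" for i in range(15)],
    "group_c": [f"c{i}" for i in range(10)],
}
chosen = stratified_sample(strata, sample_size=10, seed=0)
print({name: len(members) for name, members in chosen.items()})
```

The function needs the size of every category up front, which mirrors the requirement that the researcher knows the proportions before sampling begins.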
Read the following environmental investigation examples and identify whether they used a random, systematic or stratified sampling methodology.
A habitat was to be sampled for vegetation using a quadrat. A 10m x 10m square was marked out on the ground using tape. A number generator was used to select a pair of numbers between 1 and 10. This pair of numbers formed an X and Y coordinate within this 10x10m square section, e.g. if numbers 5 and 7 were revealed on the generator, a quadrat would be placed 5m along and 7m up the marked out square. The vegetation found within the quadrat was recorded. This process was repeated 10 times within the area.
A member of the Woodland Trust visited a site and needed to take a proportionate number of observations from each part of the population. She divided the habitat into zones based on the knowledge that 60% of the area was heathland and 40% gorse. As a consequence, 60% of the samples were taken within the heathland and 40% of the samples were taken from the gorse area.
The number of barnacles found from the seashore inland was measured using a quadrat. A transect was used to sample every 10m along from the seashore to the sand dune system.
Answers can be found here.
A representative sample is a sample that would be expected to show a similar mean and degree of variation around the mean as the whole population. One of the key skills in designing your investigations is determining the size of your sample so that your final conclusions are reflective of the whole population.
An investigation was carried out by one of my AH Biology students, Rob: "To investigate the effect of age on memory". Rob was interested in comparing 2 different age groups (Age 10-18 and Age 60-70), with 10 participants from each age range. However, he was keen to maintain a gender balance and an even distribution of ages within each range, i.e. he wanted to avoid, for example, a first group made up of 10 boys who were all 11 years old.
To achieve this, Rob constructed an Excel spreadsheet with all available participants. He selected 5 males and 5 females from each group, selecting 2x 10 year olds, 2x 11 year olds, 2x 14 year olds, 2x 16 year olds and 2x 18 year olds. He followed a similar procedure in the older age group.
Please proceed to Task 11 and then continue reading about Rob's investigation.
Using the information in the box above, what type of sampling procedure did Rob apply? Why do you think he did this?
Answers can be found here.
Rob carried out a simple memory test involving a set of 20 cards with images on them. Ten individuals from each age group were involved in the original pilot study (as discussed above) and each individual had to perform the memory test 3 times (each time using a different set of image cards).
There are 2 questions Rob needs to ask himself at this point:
Is the equipment I am using, i.e. the 3 sets of image cards, sufficient and robust enough to produce reproducible data? For example, if individual subjects recall many more images from Set 2 compared to Set 1, perhaps the images chosen for Set 2 are more easily remembered than those in Set 1 (remember back to Higher Human Biology - perhaps some of the images naturally fall into categories to facilitate memory). Is the general methodology able to generate reliable data?
What is the level of variation across the whole sample of 10 individuals per age group? Is this a suitable sample size?
A boxplot is a really useful tool to visually represent the spread and variation of your data. A fantastic guide has been put together by Graeme Ruxton and Jim Stafford, working through SSERC, on "Statistics for School Biology Experiments and Advanced Higher Projects". This guide includes many of the basics for the coding language required for R. I have put the full reference for this on the main page. I have adapted some of their guidance to support your understanding here.
The box represents the Interquartile Range (IQR) of the data, i.e. the spread of the middle 50% of the data. The dark line found within the box represents the median - this is the middle number in an ordered list of values, e.g.
5, 6, 7, 8, 9, 10, 11 - for this set of values, the median would be 8 (this is the middle value when written in ascending order).
The top and bottom edges of the box are the third and first quartiles of the data, respectively. This means that, when written in ascending order, the 1st quartile divides the lower 50% of values in half, and the 3rd quartile divides the upper 50% of values in half - this is depicted in the diagram below. The interquartile range is simply the 1st quartile subtracted from the 3rd quartile (so, in this example below, the IQR = 3).
If you now look back at the boxplot example, you will see that the lower line of the green box aligns to 6.5, the median (dark line within the box) is 8 and the upper line of the green box lies at 9.5 on the y-axis.
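The quartiles for the example values 5 to 11 can be checked with a short Python sketch. It assumes the "inclusive" quartile method from Python's `statistics` module, chosen here because it reproduces the values quoted in the text; other quartile conventions can give slightly different answers.

```python
from statistics import quantiles

values = [5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters: returns [1st quartile, median, 3rd quartile].
q1, med, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = 3rd quartile minus 1st quartile

print(q1, med, q3, iqr)
```

This gives a 1st quartile of 6.5, a median of 8, a 3rd quartile of 9.5 and an IQR of 3.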
Finally, a wee note about "whiskers". The whiskers are used to illustrate variability outside the Interquartile Range (the middle 50% of your data). If the Interquartile Range describes the spread of the middle 50% of your values, then, as stated by Ruxton and Stafford (2015), the whiskers are "describing the 50% of values from the 2 extremes when you order your sample values".
The whisker at the top of the box extends to either:
the data-point furthest from the median that is still within 1.5x the IQR of the upper quartile, i.e. in the example we've been looking at, this limit would be 1.5x3 = 4.5 units above 9.5 = 14.
the maximum value of the dataset - in our example, this would be 11.
Whichever of these produces the shortest whisker is used - in our example, because 11 is less than 14, an upper whisker extending to 11 is used.
The same rule applies for the lower whisker. So, in our example, the lower limit is 1.5x3 = 4.5 units below 6.5 = 2, and the minimum value in the data set is 5. A whisker extending to 5 would be shorter so this is used.
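The whisker rule can be sketched in Python as below. This is an illustration rather than the exact routine R uses: it assumes the "inclusive" quartile method from the `statistics` module, and `whiskers` is a made-up helper name.

```python
from statistics import quantiles

def whiskers(values):
    """Return (lower whisker, upper whisker, outliers) using the 1.5 x IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points that lie within the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    # Anything beyond the limits is reported as an outlier.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return min(inside), max(inside), outliers

low, high, outliers = whiskers([5, 6, 7, 8, 9, 10, 11])
print(low, high, outliers)
```

For the example values, the whiskers extend to 5 and 11 and no points are flagged as outliers.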
Rob has now carried out his memory investigation and his data for the 10-18 year old age group is shown in the table below:
We can use R to look at the variation obtained within each individual to determine the reliability of the researcher's methodology.
This boxplot shows the variation between repeated measurements on each individual (in the 10-18 year old age group). The variation in values obtained from one individual will flag up any variation in our measurement method (i.e. systematic error).
The boxplot shows very little variation in memory tests from each participant and, therefore, we can conclude that the methodology used can produce precise results, i.e. the values obtained for each individual are very close to each other.
Because there is only a small degree of variation for each individual, three repeats per individual are sufficient. If, however, a large degree of variation had been observed for each individual, a greater number of tests per participant would have been needed.
For further guidance, see "Testing the reliability of our instruments" below.
Now, time to reflect on our second question: What is the level of variation across the whole sample of 10 individuals per age group? Is this a suitable sample size? For this, a second boxplot can be used to summarise the variation across the 10 participants (shown below):
In this boxplot, the variation between measurements from different individuals (replication) has been presented - it looks a little different from our original example. Let's talk through it:
The dark line within the green box shows our median (our middle value) - if we order our average values (as below), the middle value is 7.
The 1st quartile is 6, while the 3rd quartile is 7, giving an IQR of 1.
In this example, the 3rd quartile is the same as the median value. In this particular data set, 9 and 10 have been assigned as "outliers" and are depicted as open circles. This is because the upper whisker can extend no further than 1.5xIQR (1.5x1 = 1.5 units) above the 3rd quartile (7), i.e. to 8.5. Because this limit is lower than the maximum value of 10 in our dataset, the values that lie above it (9 and 10) are plotted as outliers.
The presence of these outliers within our data produces an "asymmetric" pattern of results and this level of variation perhaps suggests the mean we generated from our investigation may not reflect the true mean of the overall population. Should Rob now continue with the full investigation with more than 10 participants from each age group?
How do we determine the approximate number of repeats required? According to Ruxton and Stafford (2015), we can "determine the approximate number of replicates required to get close to the true value by starting with a small number of replicates, calculating the mean and then adding further replicates and recalculating the mean (a cumulative mean). Once the cumulative mean does not alter, then we probably have sufficient replicates to give a true value".
Rob added a further 2 participants into his study and the cumulative mean was measured. This resulted in a value of 7 over repeated measurements, suggesting this reflected the true value.
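The cumulative mean approach described by Ruxton and Stafford (2015) can be sketched in Python as follows. The scores are hypothetical values made up for illustration; they are not Rob's data.

```python
def cumulative_means(values):
    """Return the mean after 1 value, after 2 values, after 3 values, and so on."""
    total = 0
    means = []
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means

# Hypothetical memory scores, in the order the participants were tested.
scores = [6, 8, 7, 7, 9, 5, 7, 7]
means = cumulative_means(scores)
print(means)
```

Once adding further replicates no longer changes the cumulative mean (here it settles over the last few values), there are probably enough replicates.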
Your teacher might now issue you with Learner Check 4 to check your learning of the Topic 3 content.
Sylvie, a student in the AH Biology class, carried out an investigation into the effect of caffeine concentration on Daphnia heart rate (that old favourite back again!). She chose 5 bathing solutions (ranging from 0 to 4 units of caffeine). For each caffeine concentration, 5 different Daphnia were used. Because she was a particularly excellent student, she remembered that a Randomised Block Design would be important here.
Her data is shown below:
Produce a boxplot, using R, to show the variation across the 5 Daphnia Sylvie used in the absence of caffeine. Do you think her sample size was appropriate? Explain your answer.
I have included the R coding below and here is a link to my YouTube video to support you here.
Suggested answers can be found here.
Video tutorial on writing the R code for producing a box plot.
Experimental design must reflect the need to obtain reliable data. This involves assessing variation introduced from methodology and from the natural variation in the population being sampled (as discussed above). Repeated measurements must be carefully reviewed to ensure reliability. However, a further layer of rigour is achieved through Independent Replication.
This is carried out to produce independent data sets - overall results can only be considered reliable if they can be achieved consistently. This means that an investigator should carry out the entire investigation again with fresh materials/ new participants on a new day - this requirement of Advanced Higher Biology (which is not the case in AH Chem or Phys) should be considered when choosing your AH Project. These independent data sets should be compared to determine the reliability of the results.
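Comparing two independent data sets can be sketched in Python as below. The treatment names and readings are hypothetical, and `compare_runs` is a made-up helper name; the sketch simply places the mean from each run side by side with an overall mean.

```python
from statistics import mean

def compare_runs(run_1, run_2):
    """Return {treatment: (mean of run 1, mean of run 2, overall mean)}."""
    summary = {}
    for treatment in run_1:
        m1 = mean(run_1[treatment])
        m2 = mean(run_2[treatment])
        overall = mean(run_1[treatment] + run_2[treatment])
        summary[treatment] = (m1, m2, overall)
    return summary

# Hypothetical readings (arbitrary units), 3 repeats per treatment in each run.
run_1 = {"low": [10, 12, 11], "high": [40, 42, 41]}
run_2 = {"low": [11, 13, 12], "high": [41, 43, 45]}

summary = compare_runs(run_1, run_2)
print(summary)
```

If the two run means for each treatment are close, the results are being achieved consistently; a large gap between them would call the reliability of the results into question.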
Raymond has been investigating the effect of alcohol concentration on cell membrane permeability. He performed 3 repeats at each alcohol concentration and collected his data. He was so happy when he was finished! Dr McRobbie ruined his day when she reminded him of the "independent replication" rule. "What, I have to do the whole thing again?" - "Why, yes indeed you do, Raymond", she replied in her knowing way.
Raymond, downtrodden, went back to the fridge and collected his scabby old beetroot - again, Dr McRobbie approached him and asked what he was doing. He replied, "I'm doing my independent replicate Miss".
Question: What did Dr McRobbie say next?
Suggested replies are here.
Avaz arrived back in class, all fresh and ready to carry out his independent replicate - Raymond had warned him about Dr McRobbie and the independent replication rules so he was primed and ready to go. He was investigating the potential inhibitory effect of green tea on dopa oxidase activity by measuring the absorbance of the solution - this reaction would normally produce a slightly red product when the enzyme and substrate are incubated together. He used lead nitrate, a known inhibitor of dopa oxidase, as a positive control. He knows his stuff: his Risk Assessment revealed that this was a dangerous substance, so he sought support from our fabulous Science Technician.
The tables below show the data that Avaz collected from his first experimental run and from his independent replicate.
Avaz wasn't quite sure what he had to do next with his results so Good Old Dr McRobbie showed him this, from the SQA Marking Instructions for the AH Biology Project.
"So, Miss, do I need to produce another table to combine all my data and present an overall average?"
Dr McRobbie said (a little too smugly) "Absolutely, you are completely correct Avaz!".
First task - calculate the average values for the missing cells in Avaz's table.
Second task - generate a table that shows Avaz's "overall results calculated and presented".
Answers are available here.
As discussed above, variation in experimental results may be due to the reliability of measurement methods and/or inherent variation in the specimens. The reliability of measuring instruments or procedures can be determined by repeated measurements or readings of an individual datum point. The variation observed indicates the precision of the measurement instrument or procedure but not necessarily its accuracy.
SSERC provided a simple protocol for assessing the accuracy, reliability and precision of the Mystrica Colorimeter, of which there are at least 9 placed within each local authority.
Method:
Switch on the colorimeter and select the red diode (R) using the RGB button. Select A for absorbance using the A/T button.
Place an empty cuvette into the sample holder and press "CAL" to zero the colorimeter.
Cut 5 pieces of neutral density filter each with dimensions 3cm x 0.8cm.
Using tweezers, place one piece of neutral density filter into the cuvette and record the absorbance.
Add consecutive pieces of neutral density filter, recording the absorbance after each addition.
Repeat these steps using the blue diode.
Plot your results.
The data table and graph above show that as successive pieces of standard neutral density filter are added to the colorimeter, the absorbance value increases by a regular value. The linear plot shows that the instrument is:
precise, i.e. repeat measurements are very similar.
accurate, i.e. the mean value of measurements is very close to the true mean.
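Whether absorbance "increases by a regular value" as each filter is added can be checked with a short Python sketch. The readings and the tolerance of 0.05 are hypothetical values chosen for illustration; they are not the SSERC data.

```python
def increments(readings):
    """Return the change in absorbance from each reading to the next."""
    return [b - a for a, b in zip(readings, readings[1:])]

def is_regular(readings, tolerance):
    """True if all increments agree with each other to within tolerance."""
    steps = increments(readings)
    return max(steps) - min(steps) <= tolerance

# Hypothetical absorbance readings for 1 to 5 pieces of neutral density filter.
regular = [0.30, 0.61, 0.90, 1.21, 1.50]
irregular = [0.30, 0.61, 1.20, 1.25, 1.50]

print(is_regular(regular, tolerance=0.05))
print(is_regular(irregular, tolerance=0.05))
```

Near-equal increments correspond to the straight-line plot described above; uneven increments would show up as a plot that bends or jumps.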
Your teacher might now issue you with Learner Check 5 to check your learning of the Topic 3 content so far.