Key Area 3

Reporting and critical evaluation of biological research

As you read through this final section, remember that this is all highly relevant to your final AH Biology Project Report. You can find examples of previous candidate AH Biology Projects on the SQA Understanding Standards website and you can use the Marker Commentary to reflect on each candidate's performance. Use the information in this section to fuel your own understanding of what should be written in a final Project Report.

(a) Background information

Scientific reports should contain:

  • An explanatory title - should provide a succinct explanation of the study. Don't be tempted into being "fun" with this like I did in high school (little rebel McRobbie so I was). Your title should contain a clear input (i.e. information about your independent variable), an output (i.e. what are you measuring or trying to find out?) and, where possible, an organism or a context, e.g. Investigation into the effect of caffeine on Daphnia heart rate.

Actually, forget High School, I was still trying to be "fun" for my PhD thesis! Don't do it!

  • An abstract. This includes your aim(s) and a summary of your main findings. The aim must link your independent and dependent variables clearly. Note: In the example of a Title above - "caffeine" is NOT the independent variable - it would likely be something like "caffeine concentration" - you must be much clearer in your aim.

Below is an example of an abstract I have taken from one of the published projects on the SQA Understanding Standards website. The full project can be found here. It is impossible to assess this without reading the full project (note that the abstract should be one of the final things any researcher writes, since it must summarise the overall findings) - however, the abstract is found immediately after the contents page, includes an aim and accurately summarises the findings of the investigation.

  • An introduction - this should explain the purpose and context of the study and include the use of several sources, supporting statements, citations and references (written in a standard form - we will discuss more later).

      • The background information you include should be clear, relevant and unambiguous.

      • It should provide any information required to support choices of methods, results and discussion. This includes a discussion around decisions regarding organisms used.

      • An introduction should explain why the study has been carried out and place the study in the context of existing understanding - do your reading! Research the current state of play in the scientific community by engaging with research articles.

      • Key points should be summarised and supporting/contradicting information should be identified.

A summary of what SQA are looking for in this section is shown below:

Tired of reading? Let's review the key elements of Background Information.

Go to SCHOLAR "3.1 Evaluating background information" for some additional questions.

Task 15

Big Davie arrived back in Biology on Thursday, clutching his lab book full of data from his Pilot Study, investigating the effect of alcohol on Daphnia heart rate.

" Morning Miss, did you know alcohol really affected they wee mad Daphnia's heart rates - that can't be good!"

Dr McRobbie replied, in her knowing way, "Yes, Big Davie, you have made a very good observation there".

Today was an important day in the Biology classroom - they were all getting stuck into their background research for the introduction to their projects. Big Davie was a bit stuck with where to start so Dr McRobbie presented him with the summary of SQA's marking instructions (as above). Help Big Davie by writing a bullet-point list of information he will want to include in his "Introduction" section.

Suggested answer is available here.

A note about Citing and Referencing sources

Any body of writing produced by a researcher that reflects wider reading and information gained from others must be referenced fully. Referencing a source involves:

  1. Providing an in-text citation

  2. Providing a full reference list at the end of the document.

There are 2 Referencing systems that SQA have requested you use for your own projects - either the Harvard referencing system or the Vancouver referencing system. I have included some examples from the world of research below but have gone into much more detail about this in the following YouTube video.

Let's take a pit-stop to focus on the key elements of References and Citations at AH Biology level.

This research article has used the Vancouver referencing system.

This involves using numbers, which ascend as you progress through the document, for citations.

The referencing list is then ordered numerically at the end of the document.

By contrast, this research paper has referenced sources of information using the Harvard Referencing System.

For citations, author surnames (maximum of 2) and the year of publication are used.

For references, all sources are listed alphabetically according to the surname of the first author.

Referencing and Citations

(b) Reporting and evaluating experimental design

The Procedures section of an experimental report should reflect a lot of considered planning. This is a fundamental foundation to your Project (or any piece of research). I have written this section as numbered bullet points that reflect the SQA's marking instructions for the procedures section of your final Project (although you should keep in mind that this is how any reporting and evaluation of experimental design should be performed).

A Procedures section, reporting and evaluating experimental design should have the following features:

  1. An overall methodology that allows the aim and hypothesis to be tested, otherwise the design is invalid.

  2. A method section that contains sufficient information to allow another investigator to repeat the work. This should be written in past tense and impersonal tone, as you will have practised throughout high school. An important feature here is that EVERYTHING is described, e.g. how solutions were prepared (a technician should not be preparing AH Biology level solutions unless a Risk Assessment demands this support), what apparatus was used to make measurements, etc.

  3. A rigorous evaluation of the validity and reliability of the experimental design. This involves identifying suitable controls (negative and positive where appropriate - as discussed in a previous section, this is not always appropriate so please refer back if you have forgotten). This also relies on a thorough examination of any confounding variables. The validity of the experiment may be compromised when any factor other than the independent variable could influence the value of the dependent variable. Consider how these confounding variables will be controlled or, if this is not possible, how they will be monitored throughout the experiment. Reflect back on randomised block design if particular confounding variables cannot be controlled.

  4. A clear demonstration of representative sampling - this includes the effect of selection bias and sample size. Selection bias is the selection of a sample in a non-random way, so that the sample is not representative of the whole population. Sample size may not be sufficient to decide, without bias, whether the change to the independent variable has caused an effect in the dependent variable.

  5. An independent replicate - as discussed previously, results can only be deemed reliable if they can be independently replicated. In this section of a report, a candidate must describe how the independent replicate was carried out.

  6. Justification of how the pilot study informed the final methodology - remember, prior to setting up an investigation, there will be lots of initial questions. Your teacher will not provide you with a protocol for your investigation. You might not know how much substrate to add in an enzyme reaction, or what concentration of caffeine to incubate a Daphnia safely in, or for how long to record monkey behaviour in the zoo. These initial questions are investigated in a Pilot Study and used to inform your final procedure.

  7. Finally, your Procedure section should include a methodology that reflects the Advanced Higher level you are working at. Go big or go home! Do your research - you might have done a bog-standard basic protocol in S4 and want to relive it now - but don't! Read around and see if you can adapt your methodology to improve accuracy of results.

The summary of SQA Procedure Checklist is below:

Go to SCHOLAR "3.2 Evaluating experimental design" for additional questions in this area.

Your teacher might now issue you with Learner Check 6 to check your learning of the Topic 3 content so far.

Task 16

Kyle confidently approached his desk in the Biology room on Monday. He knew exactly what he had to do - his project was focusing on the effect of tissue type on catalase activity. Although much more creative approaches to his investigation exist, Kyle opted for an old classic: he added hydrogen peroxide and detergent to a series of test tubes before starting the reaction by adding a different tissue (source of catalase) to each tube. Kyle left the reactions for a few minutes and then measured the height of foam produced (generated as oxygen is produced during the reaction).

Kyle wrote up his procedure. How many marks (out of 9) would you award his efforts?

5 test tubes were collected and 20vol hydrogen peroxide was added to each. 2 drops of washing up liquid (Fairy) was added to each test tube. The test tubes were placed at room temperature. A small block of liver was added to test tube 1; a small block of potato to test tube 2; a small block of carrot to test tube 3; a small block of apple to test tube 4 ; test tube 5 had no source of catalase as this was the negative control (without treatment). Test tube 5 would show how much foam would be produced in the absence of catalase activity - any measured value here would have to be subtracted from all other measurements.

Confounding variables included the concentration of hydrogen peroxide (kept constant by using a bottle given to me by the technician) and the amount of detergent (2 drops added with a pipette).

The experiment was repeated 3 times, which is appropriate for a simple in vitro experiment. My pilot study was previously carried out and, across the 3 repeats per tissue type, there was a low level of variation.


Suggested score and comments are available here.

(c) Data analysis

Having carried out a procedure, a set of results will be generated. These must now be analysed, perhaps involving the calculation and presentation of the mean, median or mode. Let's have a quick look at the difference between these terms.

The Mean

The mean is the total of the numbers divided by how many numbers there are. You would tend to use the mean when your spread of data is considered "normal".

The Median

The median is the middle value. Order the numbers, lowest to highest, and see which one is in the middle of the list. You would tend to use the median value when the spread of data is "not normally distributed" - this might mean there are outliers that would skew your mean too much.

The Mode

The mode is the number that appears the most. The mode is a good measure of central tendency when the majority of your data centres on a particular value and a few extreme outliers would enormously misrepresent that value if you used the mean. The mode is also very useful for categorical or discrete data.
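
As a minimal sketch of all three measures in R, the example below uses made-up numbers (not data from this course) that include one extreme value. R has no built-in function for the statistical mode (its `mode()` function reports how a value is stored), so the sketch counts the values with `table()` instead. The name `modal_value` is just an illustrative choice.

```r
# Hypothetical measurements with one extreme outlier (40)
values <- c(3, 5, 5, 6, 7, 5, 40)

mean_value   <- mean(values)    # total divided by how many numbers there are
median_value <- median(values)  # middle value once the numbers are ordered

# Mode: count how often each value appears, then keep the most frequent one(s)
counts      <- table(values)
modal_value <- as.numeric(names(counts)[counts == max(counts)])

print(mean_value)
print(median_value)
print(modal_value)
```

Here the single outlier drags the mean well above the median and the mode, which both stay at 5.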

Some basic keywords

According to Ruxton and Stafford (2015) in their SSERC Statistical Guide for AH biology projects, "a good rule of thumb would be to use the mean to describe the typical value". If the data seems asymmetrical (many values lying far from the mean at the upper or lower end), then the median may be more appropriate. A quick way to check if your data is approximately symmetrical (thus making the mean the best option to describe your data) is to calculate the mean and median values: if they are similar, your data is approximately symmetrical.


You can use R to do this. Consider the data below, looking at catalase activity in the presence or absence of inhibitors - this is measured by the volume of oxygen produced:

The coding used in R to create summaries of this data set is shown below:

For the data I have called "noinhib", corresponding to No inhibitor present (top row in the table), the mean (33.33ml) and median (33ml) are nearly the same. The same is true for the lead nitrate and aspirin data. This would suggest that the spread of data is roughly symmetrical, with few outliers affecting the mean. Therefore, the mean, as a measure most people are familiar with, would be the recommended choice for describing the central tendency of the data.
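
The original table and R screenshot are not reproduced in this text, so here is a minimal sketch of the idea. The three replicate values per treatment are illustrative numbers chosen to be consistent with the means and medians quoted above. They are an assumption, not the original data. The names `leadnitrate` and `aspirin` are also assumed (only `noinhib` is named in the text).

```r
# Volume of oxygen produced (ml); illustrative values, NOT the original data set
noinhib     <- c(32, 33, 35)  # no inhibitor present
leadnitrate <- c(1, 3, 4)     # lead nitrate present
aspirin     <- c(27, 29, 32)  # aspirin present

# summary() reports the minimum, quartiles, median, mean and maximum
print(summary(noinhib))
print(summary(leadnitrate))
print(summary(aspirin))

# Quick symmetry check: are the mean and median similar?
print(c(mean = mean(noinhib), median = median(noinhib)))
```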

You can plot this mean data to visually present your results (see opposite).


But it would be nice to show the spread of the data around the mean. This brings us to Standard Deviation.

Standard deviation and Range

While the mean and median are common ways to describe the central tendency of the data, the spread of the data can be described using the range or the interquartile range (as discussed in KA2). The interquartile range describes the spread around the median value.

However, the standard deviation describes the spread around the mean in a roughly symmetrical data set. The lower the standard deviation value, the smaller the spread of data from the mean.

We can quickly calculate the standard deviation using R, using the command "sd", as shown opposite.


This additional level of data analysis tells us that:

  1. In the absence of an inhibitor, catalase activity produced 33.3 ml +/- 1.5 ml of oxygen, i.e. 31.8 ml to 34.8 ml.

  2. In the presence of lead nitrate (positive control since this is a well documented inhibitor of catalase), the volume of oxygen produced was 2.7 ml +/- 1.5 ml, i.e. 1.2 ml to 4.2 ml.

  3. In the presence of aspirin (the hypothesised inhibitor we are investigating), 29.3 ml +/- 2.5 ml of oxygen was produced, i.e. 26.8 ml to 31.8 ml.

An important observation to make at this point is that the spread of data in the absence of an inhibitor (31.8 ml to 34.8 ml) is not clearly separated from the spread of data in the presence of aspirin (26.8 ml to 31.8 ml): the top of the aspirin range reaches the bottom of the no-inhibitor range at 31.8 ml. Because of this overlap, the difference between the means calculated in the absence of inhibitor and in the presence of aspirin is unlikely to be statistically significant, although a statistical test (covered later in this section) is needed to confirm this.
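
The mean +/- standard deviation ranges can be worked out in R rather than by hand. This is a minimal sketch using illustrative replicate values that are consistent with the means and standard deviations quoted in this section. They are an assumption, not the original data, and the helper name `spread` is hypothetical.

```r
# Illustrative values (ml of oxygen), assumed for this sketch; NOT the original data
noinhib     <- c(32, 33, 35)
leadnitrate <- c(1, 3, 4)
aspirin     <- c(27, 29, 32)

# Helper (hypothetical name): mean, sd and the mean +/- 1 sd limits
spread <- function(x) {
  c(mean = mean(x), sd = sd(x),
    lower = mean(x) - sd(x), upper = mean(x) + sd(x))
}

results <- rbind(noinhib     = spread(noinhib),
                 leadnitrate = spread(leadnitrate),
                 aspirin     = spread(aspirin))
print(round(results, 1))

# Does the top of the aspirin range reach the bottom of the no-inhibitor range?
overlap <- results["aspirin", "upper"] >= results["noinhib", "lower"]
print(overlap)
```

With these illustrative values the overlap is very narrow, which is one reason a formal statistical test is a better judge than the ranges alone.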

This overlap can be visualised when the data is displayed in graphical format with error bars, as shown below. I tend to use Microsoft Excel for this as I prefer the visual output. I have included a YouTube video on how to do this.
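
If you would rather stay in R, a bar chart with error bars can be sketched with base graphics. This is one possible approach under stated assumptions, not the method used in the video: the replicate values are illustrative (not the original data), and the error bars show the mean +/- 1 standard deviation.

```r
# Illustrative values (ml of oxygen), assumed for this sketch; NOT the original data
groups <- list("No inhibitor" = c(32, 33, 35),
               "Lead nitrate" = c(1, 3, 4),
               "Aspirin"      = c(27, 29, 32))

means <- sapply(groups, mean)
sds   <- sapply(groups, sd)

# barplot() returns the x-position of the middle of each bar
mids <- barplot(means, ylim = c(0, 40),
                ylab = "Volume of oxygen produced (ml)")

# Draw error bars of mean +/- 1 standard deviation as flat-capped arrows
arrows(x0 = mids, y0 = means - sds, x1 = mids, y1 = means + sds,
       angle = 90, code = 3, length = 0.1)
```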

Go to SCHOLAR to learn more in "3.3 Evaluating data analysis" and "Dog foods experiment".

Task 17

Holly carried out an investigation into the effect of temperature on respiration in yeast, using the displacement of water protocol (shown below).

Holly repeated the in vitro investigation 3 times at each temperature and presented her raw data in the table below.

  1. Using R, calculate the mean and standard deviation for this small data set. Present your calculated values in an appropriate table.

  2. Using R or Excel, plot an appropriate graph to represent Holly's data.

Answers can be found here with an accompanying YouTube video to support with R coding.

Assessing Statistical Differences between Samples

Statistical tests are used to determine whether the differences between the means are likely or unlikely to have occurred by chance. A statistically significant result is one that is unlikely to be due to chance alone.

We have already looked at the use of error bars to indicate the variability of data around a mean.

If the treatment mean differs from the control mean sufficiently for their error bars not to overlap, this indicates that the difference may be significant.

We observed a lack of significance when looking at the effect of aspirin on catalase activity. As discussed above, the error bars overlapped when comparing the spread of data around the mean for the control and aspirin reactions - indicating that aspirin was unlikely to have a statistically significant effect on catalase activity.

By contrast, the graph you should have produced during Task 17 reveals error bars that do not overlap - this means that temperature may have a statistically significant effect on respiration in yeast, i.e. it is unlikely that the observed differences in mean values at different temperatures occurred by chance alone.

A statistical test for a difference between two samples

A t-test can be used to compare the difference between the means of two samples relative to the spread of their values, assuming the distribution approximates to "normal" (as discussed above). If the spreads of values in the two samples overlap, then the difference between the two means is less likely to be significant (and more likely to be due to chance).

Let's look back at the data on the effect of aspirin and lead nitrate on catalase activity. The null hypothesis for this investigation would be that the tested inhibitors have no effect on catalase activity - the t-test compares the means and the p-value generated relates to this null hypothesis.

If the p-value is >0.05, we fail to reject (often loosely described as "accept") the null hypothesis: there is a greater than 5% probability of seeing a difference between the means at least this large by chance alone.

If the p-value is <0.05, we reject the null hypothesis and can conclude that we have statistically significant evidence that the inhibitor affects catalase activity.

Using R, we can perform a t-test using the code shown opposite.

The first three lines of code are telling R about our 3 samples: No Inhibitor, Lead Nitrate and Aspirin.

The 4th line of code is instructing R to perform a t-test to compare the No inhibitor data to the Lead Nitrate data. A p-value of 0.000016 has been calculated. Since this is <0.05, we can conclude that there is statistically significant evidence that lead nitrate affects catalase activity.


However, the 5th line of code instructs R to perform a t-test between the negative control ("noinhib") and the aspirin data. A p-value of 0.09 has been reported. Since this is >0.05, we cannot reject the null hypothesis: there is no statistically significant evidence that aspirin affects catalase activity.
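
The code screenshot is not reproduced in this text, so the sketch below follows the same five-line layout described above. The replicate values are illustrative numbers consistent with the summary statistics quoted earlier (an assumption, not the original data), and the names `leadnitrate`, `aspirin`, `lead_test` and `aspirin_test` are hypothetical. Note that R's `t.test` uses Welch's version of the test by default, which does not assume equal variances.

```r
noinhib     <- c(32, 33, 35)  # no inhibitor; illustrative values, NOT the original data
leadnitrate <- c(1, 3, 4)     # lead nitrate; illustrative values
aspirin     <- c(27, 29, 32)  # aspirin; illustrative values
lead_test    <- t.test(noinhib, leadnitrate)  # no inhibitor vs lead nitrate
aspirin_test <- t.test(noinhib, aspirin)      # no inhibitor vs aspirin

# The p-value is stored in the result and can be pulled out directly
print(lead_test$p.value)
print(aspirin_test$p.value)
```

Printing `lead_test` or `aspirin_test` on its own shows the full report, including the t statistic, degrees of freedom and the two sample means.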

Task 18

Corinna performed an investigation into the effect of tea-tree oil on the growth of bacteria. She incubated one culture of bacteria in an appropriate culture medium and a second in culture medium containing tea-tree oil. After 1 day, Corinna monitored the turbidity of each culture by measuring the transmission of light through the culture using a colorimeter. She repeated her investigation 5 times. Her data is shown below.

Using R, perform a t-test to test the null hypothesis, i.e. that tea-tree oil has no effect on bacterial growth.

Answer is available here.

A wealth of statistical support is available via the following SSERC booklet, written by Graeme Ruxton and Jim Stafford. It is downloadable via this link.

(d) Evaluating results and conclusions

Following the presentation of results, it is time to bring everything together and form a Conclusion. The conclusion reported must refer to the aim and hypothesis and be supported by the results documented throughout the report. Be careful here to refer to the genuine dependent variable, e.g. when investigating catalase activity, you are probably measuring the volume of oxygen produced rather than "catalase activity" directly. Your conclusion must refer to catalase activity and not to the volume of oxygen produced.

In this particular example, the conclusion is NOT:

"Lead nitrate reduced the volume of oxygen produced, compared to the control, whereas aspirin does not affect the volume of oxygen produced".

This is more a "statement of trends".

The conclusion for this experiment should be more like:

"Lead nitrate inhibits catalase activity; however, aspirin does not inhibit catalase activity".

The validity of any conclusion drawn relies heavily on how rigorously the experimental section of a project was planned. A valid conclusion can only be drawn if:

  • Confounding variables have been satisfactorily controlled.

  • A suitable sample size was used for the investigation.

  • The whole investigation was independently replicated.

Validity and Reliability of Experimental Design

The validity and reliability of the experimental design should be taken into account. Consideration should be given as to whether the results can be attributed to correlation or causation. Have a look at Task 19 for some more practice.

Task 19

We have looked at correlation and causation before in Key Area 2 and in Task 6. But let's now take it a little further. Consider an investigation into the effect of age on reaction time. A student uses an online reaction time test, such as this one hyperlinked here. The student asks 4 people of each age tested to perform the test, under the same conditions. The data is shown in the table below.

  1. Using Google Sheets or Excel, construct a scatterplot to show the effect of age on reaction time.

  2. What does your scatterplot show? Does your data show causation or correlation? If correlation, is it a positive or negative correlation? Can you make a comment on the strength of the correlation?


Suggested answers are available here, along with a YouTube tutorial on producing scatterplots using Google Sheets.

Calculating strength of a correlation

In a study such as the one above, we can also use R to gain a statistical insight into the correlation between two variables. We can use Pearson's product moment correlation coefficient to measure the strength of association between our two traits.

As outlined by Ruxton and Stafford (2015), "it is very easy to implement this in R through the cor.test function".

Doing this in R involved inputting all the reaction times at each age (as shown opposite).

The second stage involved calculating the mean reaction times at each age using the summary command (a small section of the code is shown below).

Finally, the cor.test command was used to look at the association between age and mean reaction time. The following output was reported:

There are two useful outputs to note:

  1. The p-value - this value is associated with the null hypothesis, which in this case is that there is no association between age and reaction time. Since the p-value is <0.05, we can reject the null hypothesis and conclude that the two variables are related. But what is the strength of this relationship?

  2. The last value on the output report is important here - this is the Pearson's product moment correlation coefficient (or Pearson's r for short). Pearson's r can range from -1 to 1. A positive value (0-1) indicates a positive correlation while a negative value indicates a negative correlation. The closer the value is to 1 or -1, the closer the points on the scatterplot would be to a straight line of best fit - i.e. the stronger the correlation between the two variables. Our value of 0.93 suggests a strong, positive correlation between age and reaction time.
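
The three stages above can be sketched in R as follows. The ages and reaction times are hypothetical numbers made up for illustration (they are not the Task 19 data and will not reproduce the 0.93 quoted above), and `tapply()` is used here to calculate the mean at each age in place of the summary command.

```r
# Hypothetical reaction times (ms) for 4 people at each age; NOT the Task 19 data
age      <- rep(c(20, 30, 40, 50, 60), each = 4)
reaction <- c(250, 262, 255, 249,
              268, 259, 271, 266,
              280, 291, 277, 284,
              301, 295, 310, 298,
              322, 315, 330, 317)

# Mean reaction time at each age (tapply() used here in place of summary())
mean_reaction <- tapply(reaction, age, mean)
ages          <- as.numeric(names(mean_reaction))

# Pearson's product moment correlation between age and mean reaction time
result <- cor.test(ages, as.numeric(mean_reaction), method = "pearson")
print(result$p.value)   # relates to the null hypothesis of no association
print(result$estimate)  # Pearson's r
```

Printing `result` on its own gives the full output report, with Pearson's r as the last value.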

Go to SCHOLAR to try "3.4 Evaluating conclusions" for an interactive activity on positive and negative correlations.

Evaluation of Results

Evaluation of conclusions should also refer to existing knowledge and the results of other investigations. Meaningful scientific discussion would include consideration of findings in the context of existing knowledge and the results of other investigations. Scientific writing should reveal an awareness of the contribution of scientific research to increasing scientific knowledge, and to the social, economic and industrial life of the community.

Use the Padlet below to reflect on what you feel are the key points of Topic 3. Are there particular things you still have questions about?

Go to SCHOLAR to review "3.5 Learning Points" and test your understanding using "3.6 End of topic test".

Finally, go to SCHOLAR to test your full understanding of this Topic in "4. Investigative biology test".