Reliability

Reliability refers to whether an assessment instrument gives the same results each time it is used in the same setting, with the same type of respondents, and with the same marking criteria. In short, a reliable assessment produces consistent, dependable results.

For example, an English essay needs a rubric or marking scheme and a set of expectations that ensure student achievement is reported consistently, no matter which text is studied and no matter which teacher in the faculty marks the work.

Threats to Reliability include the following:

  • measurement of skills or understanding that are irrelevant to the course (construct-irrelevant variance)

  • assessment instruments that allow similar (or the same) evidence to be interpreted differently

      • rubrics that lessen the consistency of judgements

      • marking practices that lessen the consistency of judgements

  • inconsistent assessment conditions or inconsistent application of special provisions

How can we increase the reliability of assessment?

  • use marking guides for test questions, especially for heavily weighted questions

  • share test marking so that one person marks each section

  • swap papers so you aren’t always marking your own class

  • mark blind, without knowing whose work is being marked

  • benchmark and calibrate marking

  • apply clear and consistent special provisions policies
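
One way to check whether strategies such as swapping papers or calibrating are working is to have two teachers mark the same sample of scripts and compare the grades they award. The sketch below is illustrative only: the grades are made-up data, the function names are hypothetical, and Cohen's kappa is a standard agreement statistic chosen here for illustration rather than a measure prescribed by this workshop.

```python
from collections import Counter


def percent_agreement(marker_a, marker_b):
    """Share of scripts on which both markers awarded the same grade."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("Both markers must grade the same non-empty set of scripts.")
    matches = sum(a == b for a, b in zip(marker_a, marker_b))
    return matches / len(marker_a)


def cohens_kappa(marker_a, marker_b):
    """Agreement between two markers, corrected for agreement expected by chance."""
    n = len(marker_a)
    observed = percent_agreement(marker_a, marker_b)
    count_a = Counter(marker_a)
    count_b = Counter(marker_b)
    # Chance agreement: probability both markers pick the same grade independently.
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1.0:
        # Both markers gave every script the same single grade.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical grades from two teachers who double-marked the same eight essays.
teacher_1 = ["A", "B", "B", "C", "A", "B", "C", "C"]
teacher_2 = ["A", "B", "C", "C", "A", "B", "B", "C"]

print("Exact agreement:", percent_agreement(teacher_1, teacher_2))
print("Cohen's kappa:", round(cohens_kappa(teacher_1, teacher_2), 2))
```

A low agreement figure does not show which marker is "right"; it only signals that the rubric or marking practice may need discussion before marks are reported.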


Evaluating the Reliability of an Assessment Instrument:

Are the assessment tasks reliable?

Do they have a clear marking scheme or rubric?

1. Outstanding Reliability

o Assessment tasks and conditions are strategically designed to remove all sources of non-relevant variation in measurements.

o Clear, consistent, and enforced assessment conditions are treated as a high priority.

o In units with multiple classes and teachers, consistency around messaging and assistance is ensured through embedded practice.

o Marking schemes and rubrics are clear and unambiguous to ensure consistency in student and marker interpretation.

o Consistency of marking is ensured through a range of moderation processes, such as a single marker for each task or sub-task, double marking or sample double marking, use of sample scripts/responses for all grade levels, or comprehensive in-school moderation or marking calibration activities.

o Instructions and questions are clear and cannot be interpreted in more than one way by students.

2. High Reliability

o Assessment tasks and conditions are thoughtfully designed to remove sources of non-relevant variation in measurements.

o Assessment conditions are clear and do not advantage or disadvantage individual students.

o In units with multiple classes and teachers, consistency around messaging and assistance is considered.

o Marking schemes and rubrics are clear and aim to reduce marker variation.

o Consistency of marking is considered through processes such as a single marker for each task or sub-task, double marking or sample double marking, access to sample scripts/responses, or in-school moderation or marking calibration activities.

3. Satisfactory Reliability

o Assessment tasks and conditions are designed with some consideration of reducing sources of non-relevant variation in measurements.

o The assessment and assessment conditions are discussed in units with multiple classes and teachers.

o There is a marking scheme developed for the task and applied in marking.

o Different markers discuss any marking that raises concerns to ensure consistency, and have access to an answer key or sample answers.

4. Minimal Reliability

o Assessment tasks are designed with minimal consideration of reducing sources of non-relevant variation in measurements.

o Assessment conditions could be interpreted differently by different students.

o There is minimal discussion about the assessment between teachers of the same unit.

o The marking scheme is underdeveloped and requires interpretation.

o Different markers barely discuss the marking, or the answer key or sample answers are underdeveloped or incomplete.

5. No Reliability

o Performance in the assessment tasks is largely determined by sources of non-relevant variation.

o Assessment conditions are not clearly stipulated to students and could be interpreted very differently by different students.

o There is minimal discussion about the assessment between teachers of the same unit. There is no clear marking scheme.

o There is no answer key or similar provided. Different markers do not discuss marking.


To receive credit for completing the Reliability portion of the workshop, please complete and submit THIS FORM.

Thank you.