Literature Review

The Quality Assessment Guidelines are underpinned by contemporary research.

What is assessment?

The OECD defines assessment as “judgements on individual student progress and achievement of learning goals. It covers classroom-based assessments as well as large-scale, external assessments and examinations” (OECD, 2013).

This encompasses a very broad range of purposes and practices, so to clarify what is being discussed, assessment is normally split into two categories, formative and summative, based on the primary aim of the assessment (for change or for grading).

Formative assessment

Formative assessment is a key component of the teaching and learning process and is generally used to describe the methods used by teachers to monitor student learning at any point in time. Its purpose is to provide feedback to the student on how to improve their performance in the future, and to assist the teacher in making decisions and changes for teaching and learning going forward (Black & Wiliam, 1996).

Although the word assessment is used, it shouldn’t be considered synonymous with formal assessment, as the vast majority of formative assessment conducted in classrooms lies outside of formal assessment regimes. Wiliam, one of the pioneers of much of the work related to formative assessment, has expressed regret at using the word ‘assessment’ at all when describing this concept, as it has led to some confusion, and has said he wishes he had called it ‘Responsive Teaching’ instead (Wiliam, 2013).

Summative assessment

The purpose of summative assessment is to judge the extent of student learning for grading, certification, or evaluation. The reason for doing this is to communicate to others what a student knows and can do within, and often beyond, the classroom (Black & Wiliam, 1996).

A key requirement for this is a shared meaning across different stakeholders such as students, teachers, parents, and employers. To have any real use, summative judgements need to have meaning outside of their immediate context, and there needs to be confidence that a student receiving a particular summative judgement in one school would have received the same judgement in a different school (Christodoulou, 2016).

Formal assessment

These guidelines have been designed for the development and evaluation of formal assessment items in the ACT’s senior secondary context.

In the ACT senior secondary system, formal assessment outcomes are communicated through grades and, in T courses, scores. Grades are defined by Achievement Standards and, when used with the Specific Unit Goals and Content Descriptions, describe the level of understanding the student demonstrated and what the student can do in the unit. Scores communicate the attainment of a student in comparison with others in their scaling group. Ensuring the validity of unit scores in scaling groups that consist of different courses (known as meshing) is largely beyond the scope of this project but is explored in other BSSS training.

Although formal assessment’s primary purpose is to make summative judgements, the ACT’s continuous assessment model encourages formal assessment to be used formatively as well: by providing feedback to students on areas for improvement, and by assisting classroom teachers in evaluating their teaching programs.

The 2019 BSSS Review of Assessment and Moderation specified that formal assessment should be determined through three to five assessment items for a standard unit, or two to three assessment items for a half unit. This is to create consistency between the assessment and workload expectations of different subjects, ensure reliability, and balance the stress and anxiety concerns that can be raised by both too few and too many assessment items.

What are we assessing?

Assessment criteria vary across subject areas based on their context, but are usually categorised as knowledge and understandings, and skills. The Achievement Standards contain the assessment criteria.

Models are continually being refined, but current cognitive science suggests that knowledge and understanding are the result of the interaction of a limited working memory and a seemingly endless long-term memory, with Kirschner et al. (2006) describing learning as “a change in long-term memory”. This long-term memory is seen as the single dominant structure of human cognition and is called upon in everything we interact with. The brain builds connections between facts and ideas and develops interwoven schemata of concepts so that working memory can quickly call upon and apply that knowledge to different situations (Kirschner et al., 2006; Nuthall, 2007). These connections and schemata are strengthened through use, with expertise developing over time. Experts can make ‘snap judgements’ very quickly and apply them to different situations as a result of these well-developed connections in long-term memory (Gladwell, 2005). For assessment to truly reflect the knowledge and understanding of the student, it is important to encourage this consolidation of concepts into long-term memory.

Skills, the ability to do something, are often categorised as assessable through direct observation and evidence. Even here, teachers need to bear in mind the working memory to long-term memory relationship, so as not to confuse performance while learning in one context with the generalised ability to apply the skill to different contexts (Wiliam, 2014).

Human knowledge is often divided into what are called domains, or in the school context, subjects. Wiliam (2014) makes the point that “the ‘traditional’ school subjects are not arbitrary divisions but are rather distinct ways of thinking about the world”. In addition, both Christodoulou (2016) and Willingham (2019) argue that skills such as problem solving and critical thinking do not exist in isolation at all but are dependent on large bodies of domain-specific knowledge held by the student. Understanding can be defined as the processing of all interactive elements simultaneously (Chen et al., 2017) or the ability to transfer knowledge to new and different situations (Wiggins & McTighe, 2011). The guidelines are flexible enough to accommodate interpretation in all learning areas.

Criterion 1: Coverage of BSSS Accredited Courses

Wiliam (2014) outlines two threats to validity: assessment which is ‘too small’ (construct under-representation) and fails to assess what it should, and assessment which is ‘too big’ (construct-irrelevant variance) and assesses things which it should not. An example of both issues might be a video presentation assignment in a History class on a narrow historical topic. Some teachers may look at the assignment and argue that it is ‘too small’, only assessing a small part of the unit, while others may argue that it is ‘too big’, assessing things it should not, such as students’ video editing and presentation skills. This is not to say that this assessment should not take place. The assignment could provide a fantastic opportunity for students, but the teacher should try to address these concerns across the entirety of the unit assessment.

The domain of a subject’s knowledge, skills and understandings is often impossibly large to assess in its entirety. Even at the unit level, there can be goals or descriptions that could be interpreted and assessed in a virtually infinite number of ways. Because of this, assessment is almost always a construct under-representation but is then used to make inferences about the students’ performance in the construct as a whole. For these inferences to be valid, teachers should ensure that appropriate breadth and depth is assessed (Christodoulou, 2016).

A New Zealand literature review of the effects of curricula and assessment on pedagogical approaches (Carr et al., 2005) shows that high-stakes assessment can limit the classroom curriculum for students, particularly lower achievers and minority students. It is easy for teachers to fall into the trap of assessing what is easy to assess and ignoring the assessment of more difficult skills or content. Wiliam (2014) uses the example of assessing practicals in science. It had been shown previously that the skills demonstrated in science practicals were highly correlated with scores in science tests; however, when practical assessment was removed from the formal assessment program this correlation no longer held. It is important that assessment type and scope are not allowed to distort curriculum delivery (Carr, McGee, Jones, McKinley, Bell, Barr & Simpson, 2005).

Criterion 2: Reliability

To make valid inferences about student knowledge, skills and understandings in the domain, assessment measurements need to minimise the influence of non-relevant factors in the measurement. This is called reliability.

To understand what reliability means, we need to recognise that all assessment measurements (observed scores) contain an error, such that:

Observed Score = True Score + Error

The True Score in the above equation does not imply that we think a student’s ability is predetermined or fixed; rather, it represents what that student would score on average if the task were completed repeatedly with appropriate ‘mind wiping’ in between, or if the student were given multiple parallel assessments of exactly the same difficulty on the same material (Bramley & Dhawan, 2010). Note that it is not possible to completely remove this error: improving the reliability of assessment means aiming to minimise the error and improve the stability of results, but there will always be some variation (Dirksen, 2013). Increased reliability increases our certainty that, for example, a student who receives an 80 in an assessment has demonstrated higher achievement than a student who receives a 70.
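To make this model concrete, the short Python sketch below (purely illustrative; the true score, the error spread and the number of attempts are invented for this example) simulates one student repeatedly sitting parallel assessments and shows how the average of the observed scores settles near the true score even though individual attempts vary.

import random

random.seed(1)

true_score = 70   # the student's hypothetical true score
error_sd = 5      # spread of the measurement error

# Simulate the same student sitting many parallel assessments of identical
# difficulty, with appropriate 'mind wiping' between attempts.
observed = [true_score + random.gauss(0, error_sd) for _ in range(10000)]

average = sum(observed) / len(observed)
print(f"Average observed score: {average:.1f}")      # converges towards 70
print(f"Lowest single attempt: {min(observed):.1f}")
print(f"Highest single attempt: {max(observed):.1f}")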

Reliability can be thought of in terms of consistency:

· across time (would students receive the same result from the task on a different occasion or under different conditions?)

· across tasks (would students receive the same result from different tasks assessing this material?)

· and across markers (would students receive the same result from different markers?) (Christodoulou, 2016; Darr, 2005b).

Within an assessment item such as a test, reliability can also be thought of as the consistency of a question compared with all the other questions in the task assessing the same material (Dirksen, 2013).

Reliability can really only be determined through the examination of assessment results, but the factors that decrease error are well known. These include: standardising assessment conditions; designing questions of suitable difficulty for the students involved; having questions that lead to a spread of scores; and having quality rubrics and marking schemes that lead to consistent marking and moderation (Darr, 2005b; Masters, 2013).
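As one illustration of examining reliability from results, the sketch below computes Cronbach’s alpha, a widely used index of the internal consistency described above, from a small made-up table of question scores. The data and the choice of statistic are assumptions for this example, not a prescribed BSSS method.

# rows = students, columns = questions on the same test (made-up scores)
scores = [
    [4, 3, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

k = len(scores[0])                                     # number of questions
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])     # variance of total scores

# Cronbach's alpha: closer to 1 means the questions behave more consistently.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")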

Criterion 3: Bias Awareness

Bias in assessment occurs when an assessment favours a student or students over others based on factors other than the key knowledge, skills and understandings of the unit. Bias plays a role in how inferences are drawn, and so to make assessment more principled teachers need to recognise “that our characterisations of students are inferences and that, by their very nature, inferences are uncertain and also subject to unintentional biases” (Bennett, 2011, p. 18). Bias can be evident in the construction of assessment tasks, which means that teachers need to design assessment with, for example, gender, socio-economic and cultural considerations in mind in order to make valid inferences from the data.

The most common way classroom teachers introduce bias into assessment is through assumptions about background knowledge or the privileging of certain types of background knowledge (OECD, 2013). An individual assessment task may require a level of background knowledge to fully engage with; teachers should be aware of this and provide easy access to this information to lessen the impact of advantage or disadvantage, and to avoid compounding this advantage or disadvantage in other assessment items. The Illinois Guiding Principles of Assessment (2015) highlight the importance of classroom assessment practices being responsive to and respectful of the cultural and linguistic diversity of students, and mention unnecessary linguistic complexity as an example of bias. The NSW Centre for Education Statistics & Evaluation (2015) refers to assessment that does not “tacitly or explicitly privilege students from high socio-economic backgrounds” (p.6).

Under the Disability Standards for Education 2005, teachers are required to make reasonable adjustments to assessment for students with a disability. Reasonable adjustments are ones that maintain the assessment of a student against the Achievement Standards, unit goals and unit content descriptions while mitigating the effect of a disability on the assessment. Identifying the key knowledge, skills and understandings is essential to ensuring that the validity of the assessment is maintained.

Formal assessment in senior secondary should assess the student’s objective performance and not incorporate judgements of character, effort, behaviour or potential (Hanover Research, 2011). This can be difficult for some teachers. Teachers can, however, take steps to ensure these unconscious biases do not cloud their objective judgement, such as using transparent and explicit marking schemes and marking processes, de-identifying student assessment, or having teachers who do not teach the unit mark the assessment (Stevens, Ructtinger, Liyanage & Crawford, 2017; Masters & Forster, 1996).

The extent of bias in an assessment can really only be determined through the analysis of assessment results.
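A minimal sketch of such an analysis, using invented scores and arbitrary group labels, is a simple comparison of results between two groups of students on the same task. A gap on its own does not prove bias, but it can flag tasks worth closer review.

# Illustrative scores only; 'group_a' and 'group_b' stand for any two groups
# of interest (for example, by gender or language background).
group_a = [72, 65, 80, 74, 68, 77]
group_b = [61, 58, 70, 66, 55, 63]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print(f"Group A mean: {mean_a:.1f}")
print(f"Group B mean: {mean_b:.1f}")
print(f"Difference:   {mean_a - mean_b:.1f} marks")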

Criterion 4: Levels of Thinking

There are a number of proposed theories for how students learn and how their thinking about concepts progresses. The most widely known general theoretical frameworks are Bloom’s Taxonomy (1956), Anderson and Krathwohl’s Taxonomy (Bloom’s revised taxonomy) (2001) and the SOLO Taxonomy (Biggs & Collis, 1982). These generally aim to describe phases of understanding and application, and the interconnectedness with other concepts or ideas.

Individual concepts from a domain can be mapped out to describe the sequence in which ideas and practices develop. These are generally called ‘learning progressions’ (Furtak, Morrison, and Kroog, 2014). The best developed learning progressions aim to be ‘top-down’, involving the views of content experts, and ‘bottom-up’, seeking to understand how student learning intersects with the content (Stevens, Ructtinger, Liyanage & Crawford, 2017). Ideally, they are linear, so that students cannot achieve higher elements without satisfying earlier elements. For this reason, learning progressions work best when focused on an appropriately small concept and locally adapted to the students (Wiliam, 2014).

Providing assessment that assesses a range of thinking levels gives students access to the assessment task as well as the opportunity to develop and extend their thinking. Teachers are faced with increasing diversity in classrooms (Moon, 2005), and using assessment tasks that have a range of thinking levels, from low to high, will therefore allow for a spread of results. In addition, having a range of assessment tasks will allow students to demonstrate different thinking levels, skills and abilities, and different assessment tools such as group work, oral tests or debates can help to improve their learning (Murillo & Hidalgo, 2017).

All assessment tasks in the ACT are based on the Achievement Standards, which cater for the needs of diverse learners. Rubrics developed for each task are specific and should use the verbs from the theoretical framework to define levels of achievement (Griffin, 2018).

Criterion 5: Student Engagement

Students who are unmotivated to complete an assessment will not produce reliable or valid assessment results (Nuthall, 2007). Student engagement is therefore an important aspect of quality assessment.

Transparent and clear assessment instructions which describe what success looks like allow students to participate fairly in the assessment process and increase reliability (Wiliam, 2014). Students need to feel equipped to complete the task with the knowledge, understanding and skills gained from the classroom.

In addition, designing assessments that are embedded in contemporary issues and relevant to students also improves engagement. Authentic tasks promote realistic problem-solving (Masters, 2014; Bae & Kokka, 2016) and allow students to think as an expert would in a discipline area. Bae and Kokka also outline how student autonomy can improve engagement by giving students decision-making opportunities in regard to their assessment. Collaborative opportunities are also often popular with students.

A student’s engagement with assessment is not affected only by these factors. Indeed, family, peer and internal pressures can have a greater impact on a student’s motivation than the formal assessment requirements (Nuthall, 2007). Schools, leaders and classroom teachers need to promote positive student wellbeing, ensuring that students feel supported.

Criterion 6: Academic Integrity

Academic integrity is the assurance that student work is the genuine product of the student being assessed. Academic integrity is of the utmost importance for ensuring that results allow valid inferences to be made about student achievement.

Using ‘test conditions’ that prevent communication between students is a common approach for appropriate tasks. The test conditions should be clearly communicated to students to remove the possibility of ambiguity or confusion. Maintaining test security and ensuring tasks are not reused will further support academic integrity.

Teachers can build academic integrity into their assessments through: designing a wide range of assessment types; changing tasks regularly; using a recent or local context rather than a general context; incorporating classroom experiences that outside agents would not be privy to; including personal reflection or opinion; using interdependent tasks; and requiring drafting or evidence of planning, checkpoints and clear tracking (Charles Sturt University, 2020; University of Waterloo, n.d.; University of Tasmania, 2018).

References

Anderson, L. W. and Krathwohl, D. R., et al (Eds.) (2001) A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. Allyn & Bacon.

Bae, S., & Kokka, K. (2016). Student Engagement in Assessments: What Students and Teachers Find Engaging. Stanford, CA. Stanford Centre for Opportunity and Policy in Education and Stanford Centre for Assessment, Learning and Equity. Retrieved from: https://www.newcastle.edu.au/__data/assets/pdf_file/0004/318550/student-engagement-assessments.pdf

Bennett, R. (2011). Formative assessment: a critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5-25.

Biggs, J. B., & Collis, K.F. (1982). Evaluating the quality of learning. The SOLO taxonomy (Structure of the Observed Learning Outcome). New York: Academic Press.

Black, P. & Wiliam, D. (1996). Meanings and consequences: a basis for distinguishing formative and summative functions of assessment? British Educational Research Journal, 22(5).

Bloom, B. S. (1956). Taxonomy of educational objectives. Handbook 1: Cognitive Domain. NY: McKay.

Bramley, T. & Dhawan, V. (2010). Estimates of Reliability of Qualifications. Cambridge Assessment.

Carr, M., McGee, C., Jones, A., McKinley, E., Bell, B., Barr, H., and Simpson, T. (2005). Strategic Research Initiative Literature Review; The Effects of Curricula and Assessment on Pedagogical Approaches and on Educational Outcomes; Report to the Ministry of Education. Retrieved from: https://www.educationcounts.govt.nz/__data/assets/pdf_file/0003/9273/The-Effects-of-Curricula-and-Assessment.pdf

Centre for Education Statistics and Evaluation (CESE). (2017). Cognitive load theory: Research that teachers really need to understand, NSW Department of Education

Charles Sturt University. (2020). Designing For Academic Integrity. Retrieved from: https://www.csu.edu.au/division/learning-and-teaching/home/assessment-and-moderation/assessment-resources-and-information/designing-for-academic-integrity

Chen, O., Kalyuga, S. & Sweller, J. (2017). The Expertise Reversal Effect is a Variant of the More General Element Interactivity Effect. Educ Psychol Rev, 29. 393–405.

Christodoulou, D. (2016). Making good Progress? The Future of Assessment for Learning. Oxford University Press

Cluskey Jr., G.R., Ehlen, C., & Raiborn, M. (2011). Thwarting online exam cheating without proctor supervision. Journal of Academic and Business Ethics, 4. Retrieved from: http://www.aabri.com/manuscripts/11775.pdf

Darling-Hammond, L., Herman, J., Pellegrino, J., et al. (2013). Criteria for high-quality assessment. Stanford, CA: Stanford Center for Opportunity Policy in Education.

Darr, C. (2005a). A hitchhiker's guide to validity, Set: Research Information for Teachers, 2. 55-56. NZCER.

Darr, C. (2005b). A hitchhiker's guide to reliability. Set: Research Information for Teachers, 2. 59-60. NZCER.

Dirksen, D. (2013). Formative assessment: Reliability & validity. In D. Dirksen, Student assessment: Fast, frequent and formative (Ch. 3, pp.17-28). Lanham, MD: Rowman & Littlefield Education.

Evidence for Policy and Practice Information and Coordinating Centre (EPPI-Centre). (2004). A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes. Retrieved from EPPI-Centre website: http://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%20summaries/ass_rv3.pdf?ver=2006-03-02-124720-170

Furtak, E. M., Morrison, D. E. B., and Kroog, H. (2014). Investigating the Link Between Learning Progressions and Classroom Assessment. Science Education, 98(4)

Gladwell, M. (2005). Blink: The power of thinking without thinking. New York: Little, Brown and Co.

Griffin, P. (Ed.). (2018). Assessment for Teaching (2nd ed.). Cambridge, United Kingdom: Cambridge University Press.

Hanover Research (2011). Effective Grading Practices in the Middle School and High School Environments.

Illinois Education Department (2015). Guiding Principles For Classroom Assessment. Illinois: Illinois State Board of Education.

Kirschner, P., Sweller, J. & Clark, R. (2006). ‘Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential and inquiry-based teaching’, Educational Psychologist, vol. 41, no. 2, pp. 75-86

Masters, G., & Forster, M. (1996). Common ‘errors’. In E. Recht (Ed.), Developmental assessment: Assessment resource kit (pp. 36-39). Camberwell, Australia: Australian Council for Educational Research.

Masters, G. (2013). Reforming Educational Assessment, Imperatives, Principles and Challenges. Australian Education Review No 57. ACER.

Masters, G. N. (2014). Assessment: Getting to the essence. Retrieved from Australian Council for Educational Research website: https://www.acer.edu.au/cari/articles/assessment-getting-to-the-essence

Moon, T. R. (2005). The role of assessment in differentiation. Theory into Practice, 44(3), 226-233.

Murillo, F.J., & Hidalgo, N. (2017). Students’ Conceptions about a Fair Assessment of their Learning. Studies in Educational Evaluation, 53.

NSW Centre for Education Statistics & Evaluation (2015). Re-assessing assessment. Retrieved from: https://www.cese.nsw.gov.au//images/stories/PDF/Re-assessing_Assessment_v6.pdf

Nuthall, G. (2007). The Hidden Lives of Learners. NZCER Press: Wellington.

OECD. (2013). Synergies for Better Learning: An International Perspective on Evaluation and Assessment. Paris: OECD Publishing.

Popham, W. J. (2018). Classroom Assessment – What teachers need to know (8th Edition). Allyn & Bacon.

Stevens, R., Ructinger, L., Liyanage, S., and Crawford, C. (2017). Review of Contemporary Research in Assessment and Reporting. NSW Department of Education

University of Tasmania (2018). Minimising Plagiarism and Cheating. Retrieved from: https://www.teaching-learning.utas.edu.au/assessment/choosing-and-designing-assessment-tasks/minimising-plagiarism-and-cheating

University of Waterloo (n.d.). Encouraging Academic Integrity Online. Retrieved from: https://uwaterloo.ca/centre-for-teaching-excellence/teaching-resources/teaching-tips/planning-courses/course-design/encouraging-academic-integrity-online

Wiggins, G. P., & McTighe, J. (2011). The understanding by design guide to creating high-quality units. Alexandria, Va: ASCD.

Wiliam, D. (2013) Example of a really big mistake: Calling formative assessment formative assessment and not something like “responsive teaching”. Twitter available at: https://twitter.com/dylanwiliam/status/393045049337847808 (accessed 6 July 2020)

Wiliam, D. (2014). Principled Assessment Design. SSAT: Redesigning Schooling – 8. Retrieved from: http://www.tauntonteachingalliance.co.uk/wp-content/uploads/2016/09/Dylan-Wiliam-Principled-assessment-design.pdf

Willingham, D. (2009). Why Don’t Students Like School? Jossey-Bass, John Wiley & Sons.

Willingham, D. (2019). How to Teach Critical Thinking. Education: Future Frontiers. Occasional Paper Series