Assessment in Modern Classrooms

Transcending the multiple choice test

In a paradigm-shifting article in Educational Researcher, Lorrie A. Shepard (2000) called upon the education community across all disciplines to embrace change in the way it viewed assessment. She asserted that the content and format of tests must change in order to "better represent important thinking and problem solving skills." She went on to advocate a fundamental reinvention of how assessment is used by teachers and students, and she cautioned against positioning interim assessments or progress monitoring as true formative assessment. Her work guides an ongoing shift toward the use of challenging and engaging formative assessment in everyday instruction and, on a larger scale, a movement away from the conceptualization of education as a tool of social efficiency. Shepard advocated a new vision of curriculum in which social constructivism guides accessible education for all students.

Although we are eighteen years removed from Shepard's work, we have yet to fully embrace it. The rise of NCLB, high-stakes testing, and the subsequent focus on test results arguably set us back several years in realizing her vision for assessment in the classroom. (Proefriedt, 2008, pp. 156-158) In our current climate, we use MAPS tests, Aimsweb, NSCAS practice tests, and other instruments to gauge where students are performing in relation to a particular standard. When these instruments are not used as part of a feedback loop to improve student performance, educators miss a valuable opportunity to make real gains in student progress. (Heritage, 2010)

Assessments can support student learning in three valuable ways. First, they can be a barometer of where student knowledge lies in relation to that of peers at the same age and stage of development. Second, they can be used as a resource for informing both pupils and teachers how knowledge acquisition is progressing and which skills or understandings students lack. Third, they can be an engine that builds on the learning that has already occurred and drives understanding to a deeper level. (Parsi & Darling-Hammond, 2015) This last type of assessment is where three-dimensional performance assessments are situated. Their purpose is to engage students in constructing solutions to problems, explaining phenomena, creating a product, or performing a scientific exercise.

Performance Assessment in Three-Dimensional Learning

Assessment of inquiry and three-dimensional learning must stand apart from traditional multiple-choice tests by virtue of the nature of the standards themselves. It is nearly impossible to evaluate a student's ability to construct explanations using such a limited tool. While a multiple-choice format may be molded to accommodate some of the three-dimensional skills, it cannot measure all of them. Furtak (2017) discusses the challenges inherent in assessing the performance indicators in the Next Generation Science Standards. She asserts that current assessments are not well equipped to deal with the three-dimensional model of science learning. She gives relevant examples of the types of lessons that the NGSS call for, along with the evidence statements that accompany those lessons. She argues that these “assessments” are really meant to be embedded in instruction, rather than to stand at the end as a declaration. Recognizing how unfamiliar these standards are, she encourages teachers to reconsider their beliefs about assessment in the classroom and about education in general. (Furtak, 2017)

Darling-Hammond et al. (2013) delineated five indicators of high-quality, high-level assessment. The indicators provide valuable guidelines for designing assessments for both state programs and local feedback loops. The indicators are as follows:

  1. Assessments must measure higher order thinking skills at least two-thirds of the time.
  2. Critical abilities such as oral and written communication, experimentation, evaluation, technology usage, problem solving, and more must be evaluated.
  3. Skills must be measured against internationally benchmarked standards.
  4. Assessment items should give valuable feedback that can guide instruction and be instructionally sensitive.
  5. Assessments should be evaluated for validity, reliability, and fairness.

The widespread adoption of the Next Generation Science Standards reflects an acceptance of the above guiding principles, however reluctant or unconscious the acceptance may be. The standards themselves require the first two indicators above, and hopefully states and other organizations will continue to work to bring the third, fourth, and fifth indicators to the forefront of large-scale assessment.

Assessment in three dimensions has also become a topic at the post-secondary level. One research group identified a protocol for evaluating the ability of assessment items to measure each of the three dimensions. (Laverty et al., 2016) This team notes that at the post-secondary level it is of the utmost importance for students to come away from science courses with the ability to apply specific knowledge sets to new situations. Their use of NGSS-style standards as a guideline for curriculum and pedagogy, like similar efforts at other institutions, reflects a commitment to that end.

Creating Assessments in Three Dimensions

Creating quality assessments in individual classrooms may come naturally to some, but for many teachers it requires a paradigm shift away from the test items they are accustomed to crafting. Pre-made assessment tools are not widely available and may be difficult to incorporate into customized units. Having only just begun the adoption process for NGSS-like standards, Nebraska science educators are widely engaged both in learning how to teach in three dimensions and in figuring out what assessment looks like and how to design it. One general strategy is summarized by the worksheet at the right. An assessment or an item should have components of the three dimensions: one or more disciplinary core ideas, crosscutting concepts, and science and engineering practices. (Bell, Van Horne, Penuel, Neill, & Shaw, 2016)

The Research + Practice Collaboratory has also constructed a thorough description of assessment item architectures that incorporate the science and engineering practices. These serve as outstanding models for how phenomena can serve as the foundation for performance assessment items that meet the standards for high-quality assessment. (Van Horne, Penuel, & Bell, 2016)


Claim, Evidence, Reasoning

Asking students to make claims based on evidence and provide reasoning to support their claims has become a common construct for assessment tasks in NGSS classrooms. It speaks to the heart of an investigation and encourages critical thinking. The example at the right shows how a phenomenon is paired with data to create an assessment task. (“Puppy growth,” n.d.)

There are a variety of ways that rich science assessments can be delivered to students. One strategy that has recently gained popularity is Argument Driven Inquiry, in which students investigate a common question and use evidence to support claims during a structured argumentation session with other peer groups. (Çetin & Eymur, 2017) This model gives students the opportunity to defend their thinking and to examine closely why they came to their experimental conclusions. Difficulties do arise during these argumentation sessions. Students are often unwilling to question or push back on the conclusions of their peers. Such questioning requires a high level of understanding of the content, and I’ve observed in many students an unwillingness to take a social risk by critiquing or questioning the work of other students. When students are reluctant to question each other, they lose a valuable opportunity to enhance their own understanding, as well as that of the group they are engaging.

Purpose of Assessment