by Joshua A. Taton, Ph.D. | September 12, 2023 | 8 min read
Assessing students' understanding in mathematics, in order to promote their learning and engagement, isn't easy. One key, along with a number of related practical strategies, involves looking at, and thinking critically about, Standard for Mathematical Practice 3 (SMP3) in the Common Core State Standards.
TL;DR (Abstract)
If we accept the premise that Standard for Mathematical Practice 3 (SMP3) is the heart of modern, sense-making, and student-centered mathematics instruction, then we need to use new and different methods for evaluating students' work (and for understanding their thinking). Because there are, necessarily, a variety of ways of justifying or explaining mathematical reasoning—even with seemingly simple mathematical ideas—then we need an evaluation approach that incorporates both proficiency toward meeting expectations of the standards and an awareness of the specific strategies utilized by students. Such strategies aren't one-size-fits-all, either. Good instruction helps move students from concrete strategies toward more sophisticated and abstract ones. Finally, a variety of strategies and ways of promoting new mindsets can be employed—incrementally and over time—to achieve system-wide or building-wide change. Doing so allows for a richer appreciation of students and their work and thinking.
Introduction
Standard for Mathematical Practice 3 (SMP3) in the Common Core State Standards expects that students should have opportunities, consistently, to "Construct viable arguments and critique the reasoning of others."
SMP3, aside from being my favorite Standard for Mathematical Practice—and, I think, the sentiment that lies at the heart of the Common Core State Standards, themselves—also seems to represent the key to unlocking assessment and instruction for the modern era: practices that promote rigorous thinking and engagement.
Phew. That was a lot. But I mean it.
Let's break this down further.
What does SMP3 imply about mathematics?
SMP3 implies, when we consider it deeply, that mathematics instruction should throw out the traditional notion of "right" and "wrong." Not only do I believe this, personally and deeply in my soul, but avoiding this flawed perspective on mathematics learning is also supported by empirical research.
Why?
Well, for one, "right" and "wrong" are absolutes that, when you look deeper, actually rest on a house of cards. They assume that mathematics is a science in which results are infallible and untainted by human culture, context, and reasoning.
Nothing could be further from the truth! There's no such thing as a definite "order of operations," for instance. There is ambiguity and disagreement, even among professional mathematicians, in how to communicate and perform some forms of multi-step arithmetic*.
I also love the TEDx Talk by mathematician and educator Dan Finkel. In it, he demonstrates ideas in mathematics that many people falsely believe have only one so-called "correct" answer.
For example, related to Finkel's talk, consider the expression 11+2. This expression can be evaluated in ways that yield multiple possible results. And the reasoning that demonstrates this fact is fully accessible to young children (say, as early as Grade 3 or 4, and perhaps even earlier).
In fact, the different ways to simplify 11+2 depend on different underlying, and valid, assumptions about the system in which the operation is performed. This is not some strange and abstract mathematical claim that only experts would understand! Following Finkel's reasoning, let me make it concrete.
From a young age, we are all familiar with multiple ways of looking at 11+2, such as:
Method 1: 11+2=13 (using typical arithmetic and, say, counting on a number line);
Method 2: 11am + 2hrs = 1pm (so, counting on a circle—instead of on a number line—it is rather simple to show that 11+2=1).
This isn't magic. It's not some sort of "slick" argument. It's mathematically valid and real. Mathematicians even have a name for this type of arithmetic; Method 2 is called modular arithmetic. Without Method 2, or modular arithmetic, your computer wouldn't function and buying goods in stores (via UPC codes) would be nearly impossible.
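To make Method 2 concrete in another notation: clock arithmetic is arithmetic modulo 12, with the wrinkle that a clock face reads "12" where the mathematics says 0. A minimal sketch in Python (the helper name `clock_add` is my own, purely illustrative):

```python
# Method 1: ordinary arithmetic, as on a number line.
assert 11 + 2 == 13

# Method 2: clock ("modular") arithmetic. Hours wrap around every 12,
# so we reduce the sum modulo 12, then map 0 back to 12 so the result
# reads like a clock face.
def clock_add(hour, hours_later, modulus=12):
    result = (hour + hours_later) % modulus
    return result if result != 0 else modulus

assert clock_add(11, 2) == 1    # 11am + 2hrs = 1pm
assert clock_add(9, 5) == 2     # 9 o'clock + 5hrs = 2 o'clock
```

Both assertions hold at once: the "answer" to 11+2 genuinely depends on which valid system you assume.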
So, in sum, SMP3 shows us a truth about mathematics: that there are, necessarily, multiple ways of looking at, solving, and communicating about problems. Even problems that, on the surface, seem as if they should be quite simple.
What does SMP3 imply about instruction?
Well, if mathematics itself doesn't—and shouldn't—have only "correct" and "incorrect" (or "right" and "wrong") answers, then SMP3 shows us how to engage with the actual nature of mathematics in the classroom. Specifically, students should have consistent opportunities to reason, justify, and explain their thinking.
That's because they might offer unexpected ideas that make sense to them. And these ideas, in turn, might help deepen their understanding and that of their classmates.
A classic case of this unexpectedly productive opportunity is found in the research of Deborah Ball. In this transcript and this paper, Ball describes an instance during classroom instruction when a student, Sean, offers the idea that "some numbers are both even and odd."
Rather than immediately dismissing the idea as "incorrect," the skilled instructor simply responded, "Tell me what you mean." Sean then explained that some numbers, like 6, are made of an odd number of groups of two (i.e., 3x2=6).
This episode turned into a highly productive, multi-day classroom exploration of numbers that split into an odd number of groups of two, numbers that became known as Sean Numbers. And it allowed the students to develop a deep, conceptual appreciation for composite numbers and prime factorization.
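For readers who want to play with the idea themselves, here is a minimal sketch, assuming Sean's characterization (a number that splits into an odd number of groups of two, like 6 = 3 x 2); the function name `is_sean_number` is my own label, not terminology from Ball's papers:

```python
def is_sean_number(n):
    """Illustrative definition: n splits into an odd number of groups
    of two (e.g., 6 = 3 x 2). Equivalently: n is even and n // 2 is odd."""
    return n % 2 == 0 and (n // 2) % 2 == 1

# The first few such numbers: every fourth number, starting at 2.
print([n for n in range(1, 20) if is_sean_number(n)])  # [2, 6, 10, 14, 18]
```

Noticing that these numbers land at a regular interval (2, 6, 10, 14, ...) is exactly the kind of pattern a multi-day classroom exploration can surface.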
I have no doubt that Sean and his classmates understand prime and composite numbers, and factoring composite numbers, at a much deeper level than students who were simply asked to memorize and apply the standard definitions.
Other students, notably, those asked simply to memorize and apply the standard definitions, have probably faced a slew of multiple-choice, fill-in-the-blank, and similar sorts of homework and assessment questions. Most likely, these are simply marked with the old red "x" or "c."
What does SMP3 imply about assessment?
If mathematics—by its very nature—isn't simply a discipline, as many falsely believe, that involves only "right" and "wrong," and if SMP3 means that students must be able to explain their reasoning, then assessment seems to get much more complicated.
Indeed, I do believe that the most reliable, rich assessments, those that focus on understanding students' thinking, should have vastly more open-ended (or "open-middle") questions than traditional assessments do. By open-ended (or "open-middle") questions, for simplicity's sake, I mean questions that allow students to justify their responses.
I don't want to spend much time on assessment design, however. That is a complex topic unto itself.
But I do want to explore assessment practices—or grading, evaluating, making sense of student work. Let's follow a clear chain of reasoning:
If mathematics is much more than "right" and "wrong," then grading student work must consist of something entirely different than marking red "x's" and "c's."
Further, as evidenced by the case of Sean, students' thinking may be complex, and unexpected.
Therefore, our grading strategy must evolve to something more qualitative, and it must be robust enough to incorporate this complexity.
Sure, this seems like a difficult challenge, but I have well over a decade of experience in supporting teachers, practically, in doing something different that addresses these challenges. And I know, first-hand and from rigorous research, that these strategies work.
More modern, robust strategies of assessment
When working with teachers and school leaders, to promote newfound engagement with the idea that student thinking is inadequately captured by the labels "correct" and "incorrect," I begin by encouraging them to think about three categories:
"Correct" (or, I prefer, "valid");
"Incorrect" (or, I prefer, "unexpected," but not necessarily "wrong"); and
"Sorta, kinda correct" (i.e., the "middle ground," that means students provided something that was expected and something that may not have been).
This is a different mentality, I contend, than simply thinking about or granting "partial credit." In this framing, I encourage teachers (and school leaders) to shift from "taking 'points' away" toward appreciating the strengths and assets and forms of understanding that the student is trying to convey.
Quickly sorting students' work. When looking at samples of student work, such as formative assessments (or "exit tickets"), I encourage teachers to sort their students' work, quickly, into three piles corresponding to these categories. That way, they can check the pulse of the classroom as a unit, which, I contend, helps them better understand the range of responses they can make as instructors to deepen students' understanding.
With practice, teachers who sort student work in this way (and the school leaders who coach them to do so) can evolve toward even more complex approaches; these approaches, I want to stress, are not necessarily more difficult or time-consuming. Note that I wouldn't push further than sorting into three piles until this practice becomes nearly automatic for the teacher(s) and they are using asset-based lenses to plan responsive instruction from this data.
Shifting to a rubric approach. After becoming proficient in sorting work into three categories, I subsequently ask teachers to consider sorting work into as many as five categories. This new classification scheme can morph and map onto a traditional rubric approach to grading, such as—
Level 5: Student demonstrates valid or expected reasoning and provides a so-called correct answer;
Level 4: Student demonstrates valid or expected reasoning, but provides an answer that would be regarded as mostly (but not fully) correct; OR student provides a so-called correct answer, but offers reasoning that contains minor logical inconsistencies or that lacks full clarity;
Level 3: Student provides an answer that would be regarded as mostly (but not fully) correct AND demonstrates reasoning that contains minor logical inconsistencies or that lacks full clarity;
Level 2: Student provides an answer that would be commonly regarded as significantly far away from the so-called correct answer, but provides mostly-valid reasoning; OR student provides an answer that is not-too-far away from the so-called correct answer, but provides mostly-invalid (or very little) reasoning;
Level 1: Student provides no reasoning, but provides an answer (whether considered correct or not); and
Level 0: Student doesn't attempt the problem and leaves the paper blank.
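The levels above can be written down as a small two-input lookup, which some teachers find handy for a recording sheet or spreadsheet. This is just one possible encoding, and the category labels ('correct', 'mostly', 'far', 'blank'; 'valid', 'minor_flaws', 'weak', 'none') are my own illustrative shorthand, not standard terminology:

```python
def rubric_level(answer, reasoning):
    """Map (answer quality, reasoning quality) onto the 0-5 rubric.
    answer:    'correct' | 'mostly' | 'far' | 'blank'
    reasoning: 'valid' | 'minor_flaws' | 'weak' | 'none'
    Labels are illustrative placeholders, not standard terminology."""
    if answer == 'blank' and reasoning == 'none':
        return 0                     # no attempt at all
    if reasoning == 'none':
        return 1                     # an answer, but no reasoning shown
    if answer == 'correct' and reasoning == 'valid':
        return 5                     # valid reasoning and correct answer
    if (answer == 'mostly' and reasoning == 'valid') or \
       (answer == 'correct' and reasoning == 'minor_flaws'):
        return 4                     # one dimension slips slightly
    if answer == 'mostly' and reasoning == 'minor_flaws':
        return 3                     # both dimensions slip slightly
    return 2                         # far-but-reasoned, or close-but-weak

assert rubric_level('correct', 'valid') == 5
assert rubric_level('far', 'valid') == 2
```

Notice that the two inputs are independent: the encoding makes explicit that correctness and justification are being judged separately.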
This rubric, I believe, captures the important possibilities in separating correctness from justification. It provides a richer analysis of students' work, because it accommodates what SMP3 implies: opportunities for students to justify their reasoning, and for teachers to critique not just students' results but also their reasoning. (In truth, SMP3 implies that students should be responding to each other's thinking, too; but for simplicity, I'm focusing here on teacher assessment rather than student-to-student assessment or pedagogical practices that allow for student critiques.)
There is a ton of research on how this sort of rubric, particularly in the context of formative assessment (a definition and topic for another day), is impactful for student learning. This approach also accommodates standards-based (what some call "mastery-based") grading, rather than the more nebulous and inconsistent "letter-grade" or "percentage-grade" approach. I direct you to one summary of the research here.
But there are still problems. After supporting teachers (and school leaders) with shifting to this sort of a rubric-style approach, I tend to push them toward a new set of questions and conclusions:
How does this sort of a rubric help teachers, particularly when looking at classroom-level results, make decisions about how to support student learning?
How does this sort of a rubric provide information on what students know, what they need to know, and what the next-best instructional steps are to deepen their understanding?
The answer to both questions, I'll state simply, is: it doesn't.
Therefore, we need an approach that includes some refinement.
What's missing, I argue, is a classification of the nature of the strategies—the types of justifications or reasons—that students provide along with their work and answers. Why?
Because some types of justifications, quite simply, are more mathematically abstract and sophisticated than others. And we want to push students toward those that are more abstract and sophisticated, if not within a given unit, over the course of a year and across multiple years. This notion is, of course, closely connected to the idea of curricular coherence. (Another topic and definition that I will reserve for a later day.)
Pushing toward abstraction. Here's a case in point: Young students can—and should be encouraged to—use their fingers to justify simple addition and subtraction problems. For instance, 4+5=9 because "4...then 5, 6, 7, 8, 9." (Picture this by counting on your fingers.) Students may need and depend on this type of reasoning for quite some time.
As students move forward in, say, Grade 1, we want to encourage them to think about this problem in other ways. "What are some other ways we could solve this problem? For instance, are there any closely-related numbers we could use to think about it?" a skilled teacher might say.
And a student might respond (with support):
"Well, I know 5+5=10." (OK! How does that relate to 4+5?)
"Four is one fewer than five." (Where? Oh, when you look at one of the addends, i.e., one of the numbers in the sum?)
"So that would mean that 4+5 should be one fewer than 5+5." (OK! And what's one fewer?)
"If 5+5=10, then 4+5 must be nine, because 10-1=9." (OK! Can we verify that?)
"4...5, 6, 7, 8, 9 (counting upward) and 10, 9 (counting downward) (Makes sense! Great job using your reasoning and working hard on this problem!)
This is a classic example of how a teacher could gently nudge students toward a deeper and more flexible way of thinking, one that goes beyond counting up with fingers to justify addition.
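The compensation reasoning in this exchange can be collapsed into a single chain of equalities. A trivial check, just to make the algebra behind the dialogue explicit:

```python
# Compensation: replace 4 with (5 - 1), regroup to use the known
# "double" fact 5 + 5 = 10, then subtract the 1 back off.
# 4 + 5 = (5 - 1) + 5 = (5 + 5) - 1 = 10 - 1 = 9
assert 4 + 5 == (5 - 1) + 5 == (5 + 5) - 1 == 10 - 1 == 9
print("4 + 5 =", (5 + 5) - 1)
```

Each link in the chain corresponds to one line of the student's (supported) reasoning above.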
Strategies plus proficiency. If the five-part rubric above could be considered a proficiency-grading approach, then I hope my line of reasoning demonstrates why it needs to be supplemented with a classification of students' strategies. In the case of 4+5, here, I would say that it's important for Grade 1 teachers to know, say:
Which students, and how many, rely on counting on their fingers (or an object/picture);
Which students, and how many, rely on "counting up" from the first addend (whether using fingers or sub-vocalizing);
Which students, and how many, rely on "counting up" from the largest addend (i.e., 5...then 6, 7, 8, 9), which shows an understanding of commutativity and efficiency (again, whether using fingers or sub-vocalizing);
Which students, and how many, can reason using 10-1=9 (whether using fingers or sub-vocalizing);
Which students, and how many, can recognize or visualize 4+5=9, perhaps by what's known as conceptual subitizing—or simply automatic recognition—without needing to sub-vocalize.
These represent the main strategies that one might use in simplifying or evaluating the expression 4+5.
It's amazing, to me, that there are so many! And note that these strategies progress, top to bottom of the bulleted list, from concrete to more abstract.
Having a deep (and, dare I say, documented) awareness of which students, and how many, are using these various strategies allows teachers to encourage students to share their thinking in small groups, during whole-class discussions, and so on. That way, the classroom of students can make progress toward deeper understanding, but, importantly, while starting at the cognitive level that is initially comfortable for them.
In short, looking at student work, I argue, should incorporate both a measure of proficiency and a measure of the types of strategies students are using. In other words, students' work should be evaluated with not one percentage-score or letter-grade, but two, generally qualitative, codes. (And you can think of a 2x2 table, in which each student's work receives an evaluation that ranges in proficiency (lower to higher) and in sophistication/abstraction (lower to higher).)
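As one concrete sketch of the two-code idea: each work sample gets a proficiency code and a strategy-sophistication code, and the teacher tallies the classroom into a grid. The two scales below ('lower'/'higher' and 'concrete'/'abstract') are illustrative placeholders of my own, not a prescribed scheme:

```python
from collections import Counter

# Each work sample receives two qualitative codes:
#   proficiency:    'lower' or 'higher'    (toward meeting the standard)
#   sophistication: 'concrete' or 'abstract'  (nature of the strategy)
samples = [
    ('higher', 'concrete'),  # e.g., correct answer via finger counting
    ('higher', 'abstract'),  # e.g., correct answer via the 5+5-1 reasoning
    ('lower',  'concrete'),
    ('higher', 'concrete'),
]

# A 2x2 tally of the classroom as a unit -- a planning tool, not a grade book.
grid = Counter(samples)
for (prof, soph), count in sorted(grid.items()):
    print(f"{prof:6s} proficiency / {soph:8s} strategy: {count} student(s)")
```

A glance at such a grid tells a teacher, at the classroom level, who might share a concrete strategy first and who can be nudged toward abstraction next.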
But how? Addressing the practical challenges and the pushback
Undeniably, what I've outlined above is complex. And it's understandable that those unfamiliar with it or hesitant about new and different approaches will find it cumbersome.
That's why I encourage using a staged approach for change—or an adult-focused trajectory of learning—that may take many years to develop in a given school community. School leaders (or district leaders) simply cannot expect change to happen overnight, as this multi-layered approach takes time to understand, appreciate, practice with, make mistakes with, and incrementally incorporate.
Also, school leaders (or district leaders) should not try encouraging an approach such as this in a fully top-down fashion and then abandon it when—inevitably—obstacles arise. Using such an approach and making organizational change requires vision, persistence, and optimism.
So how can practical challenges and pushback be addressed? Here are a few ideas that (again, I've seen it with my own eyes) absolutely can work.
This approach seems nice, but isn't it too time-consuming? We know that teachers' time is valuable, and that more and more is being expected of them in less and less time. So, doesn't this approach take much more time than a traditional approach?
Quite simply: no.
I would argue that a traditional approach to grading student work—going question by question and marking each response "right" or "wrong" and providing corrective feedback—is already very time-consuming.
One point that I've failed to mention, thus far, is that I would recommend using the modern approach, outlined here, to look at students' work holistically. In other words, rather than worrying about grading each problem on a homework assignment or assessment, individually, use the rubric and strategy classification across the entire sample of student work. (This assumes that the questions on the given assignment or assessment are in the same general topic area or domain. If not, then an evaluation of students' thinking would, necessarily, pertain to only those thematically-aligned questions.)
Second, if you're using the modern approach on, say, a formative assessment, you aren't looking to evaluate students' work for accountability or grading purposes. You're looking to get a relatively quick snapshot of what the classroom, and groups of students, know and can do—so that you can make quick, reasonable adjustments to the next day's lesson. For the purpose of building deeper understanding within the classroom, again, as a whole.
There's no need to provide corrective feedback on every student's sample of work, if we are looking at formative assessments. Quite frankly, research shows that marking an individual student's work as "correct" or "incorrect," then showing them the so-called "correct" response and providing a grade, is a less effective approach. (See here, as well.) Students don't learn much from your effort to correct the answers for them.
Aside: Here's a practical option, offered by a teacher, on how to shift toward whole-class grading and away from individual grading, of course, only for certain types of assignments.
School leaders can ease the transition to a rubric-plus-strategy approach by offering clear, simple templates or recording sheets and by developing supporting systems. Don't go "all or nothing"; encourage incremental change over weeks and months, perhaps over several years.
A common concern from school leaders. When demonstrating or describing this approach to school (or district) leaders, I am often met with a peculiar type of skepticism that goes something like this: "My teachers don't have enough content knowledge to understand the different strategies that students could use, and the relative level of abstraction of these strategies."
My response often goes something like this—with increasing levels of terseness, depending on the circumstances, my relationship with school leaders, and the time I've spent building relationships with them:
Whether true or not, how do you expect them to learn, if you don't provide them opportunities to develop this type of awareness?
How else are you addressing the need to raise the bar with content understanding? (In other words, this seems like a larger and different issue, aside from evaluating student understanding, that—I hope—you're working on as a district or network or school in a strategic and thoughtful way.)
Why can't you both support teachers in adjusting grading practices, over time, to a more modern and robust approach that involves thinking deeply about students' understanding, and also develop an action plan and supports that address the need to improve content knowledge?
What's the real barrier, here? Because this seems like you're putting up a wall on something that will definitely improve student outcomes and teacher engagement but keeping some other concern in reserve.
Why do you have such a low opinion of your teaching colleagues? Would you want your teachers to exhibit such a low opinion, analogously, of their students? Are you aware of the research on growth mindset and asset-oriented beliefs, which are as important in organizations with adults as they are with students?
Another concern: Isn't this approach less accurate? When looking holistically at students' work, when thinking about the progress in understanding and skill of the classroom (rather than individual students), doesn't this mean that such an approach will be "less accurate" than traditional approaches?
Further, if you use this approach in a formative assessment—and don't, as the research on formative assessment suggests, evaluate students' work for grades—then how will students be "motivated" to do the work? Don't they need the "fear of failure" to inspire them to show their work and to try their best?
Both of these points sadden me. And I think they are just manufactured distractions away from practices that, quite simply, we know will benefit students (and teachers). We need to take a step back:
What's the purpose of mathematics education?
What's the purpose of education?
What's the purpose of evaluating or grading?
And, I would argue, if you think about these questions—if you really think about them—then these concerns can, and should, fade far into the background.
I could write an entire book on this point, but I don't have the time, or space, or energy to do so here: we need to realize that current trends in education reinforce a compliance mindset, rather than a learning mindset.
And accuracy in "grading" simply isn't a thing; it's a false construct, culturally construed. If nothing else, I hope that the anecdote about Sean demonstrates that what's considered "right" and "wrong" is highly dependent on how students think; limited notions of accuracy, therefore, are simply false ideas that hold us all back.
I would argue, in fact, that having a richer and more nuanced understanding of students' thinking—reflected in the two-dimensional approach to evaluation described here—is actually much more accurate in evaluating the complexity of mathematical work than typical approaches.
Finally, change—especially if it's important enough—takes time and investment. Teachers and school (or district) leaders worry about the length of time and investment it might take to see changes in their buildings and classrooms.
Well, isn't anything worth doing also worth taking the time to get right?
And...if it doesn't challenge you, it won't change you, right? (Paraphrasing sentiments attributed to those such as Fred DeVito, Socrates, Deepak Chopra, and others.)
I know you don't have any reason to trust me, necessarily, but all I can say in response to the legitimate concern (about the time horizon and effort to shift to a newer way of doing things) is that I've seen the transition happen successfully, even in large and complex school districts, with optimism and persistence.
The fact remains that anything new, like starting a new routine around nutrition or fitness or getting a new pet, takes time and effort for adjustment. And a corollary to this fact is that humans are remarkably adaptable, if we don't give up. The concerns and the effort diminish, as time goes on, if we build new habits that replace old ones.
We discover efficiencies.
We realize that our objections or concerns weren't as well-founded, or as large, as we may have thought.
As usual, please write to me, because I welcome your thoughts.
Need Help?
I am ready, willing, and able to help your school or system adopt these research-based changes to your assessment practices. If you want to see deep and meaningful change, including renewed excitement about mathematics instruction, in your building(s), please contact me for a consultation.
*See here, and here, and here. The last link is one of my favorite examples in mathematics, describing an operation known as tetration or superexponentiation or serial exponentiation. Interestingly, there is choice in the direction of evaluation; see "serial exponentiation," here, for more information.