Benchmark Meeting

by Joshua A. Taton, Ph.D. | July 24, 2023 | 3 min read

I was stunned.

“What does the data tell us? What questions come up for you?” I had asked. In retrospect, I should have been prepared for what came next.

I had just walked through a roadmap of how to review and interpret our district's quarterly benchmark-test data. There were several dozen hard-working, dedicated school principals in the room.

My team and I had just spent several months of around-the-clock work updating the district's mathematics benchmarks. I wasn't convinced by the superintendent's charge that such tests were needed, but, if they were, they should at least be useful for teachers (and students).

We rewrote them, for all non-elective mathematics classes from Kindergarten to Grade 12 and for all four curriculum programs in the district, to ensure they were: a) shorter and more accessible to students with varying learning needs, b) closely aligned with the content expected to be taught, and c) built around the most commonly experienced classroom misconceptions, so that teachers could help refine students' thinking.

Now, as we were rolling them out, I was orienting principals to the changes and, specifically, how they could support teachers in understanding and using the new reporting system with the new tests.

After the introduction—discussing the changes, the purposes of the tests, and how they could be meaningfully interpreted—I shared a sample set of anonymized school-level data with the attendees of the meeting. Then I asked the questions that turned out to be unexpected bombshells.

One principal, consistently an active participant in district meetings, jumped right in. Pointing to a chart on the screen, he said, "I want to know why this third-grade teacher is performing so much worse than the other teachers. I would plan to spend a lot of time in that classroom." He emphasized the words "lot of time" in such a way that the other principals nearby gave knowing glances to each other. Half-smirks.

Now, I respect and admire principals. It's a tough, tough job. And many of them do it well. Yet many are simply unprepared for the daily onslaught of problems they face working with the district office, community members, teachers, building engineers, and so on.

I should have known the conversation might go in this direction, though, as a few principals (a rare few) default to a somewhat antagonistic stance toward teachers.

Truth be told, I understood where those comments were coming from. Though misplaced, they were, I believe, born of an earnest desire to help teachers refine their craft.

This was something I had been grappling with, too, and for quite some time. Even more, I began wondering, feverishly, in that very moment, why we were even showing and looking at school-level (or cross-classroom) data.

My palms grew sweaty. And I'm certain my brow furrowed, try as I did to prevent it.

I knew I had to respond, and I knew that I had to stay true to my own expertise and beliefs, to the guiding framework our team had worked hard to establish, and to the teachers whom I pledged to support.

I reminded the group of several points that we had already discussed:

Never did I say, nor did I mean to imply, that the test results could or should be used to compare teachers.

And certainly not for punitive, accountability-oriented reasons.

But that was where a few attendees in the room wanted to go, despite the fact that they also knew, quite well, that the makeup of classrooms differed (some had more students with IEPs or English-language learning needs, for instance) and even that some classrooms were located on the "hot" side of the building.

In other words, an apples-to-apples comparison—for any number of reasons—could never be reliably made.

Statisticians or psychometricians would say that such conditions prevent an "experimental design," such as a randomized treatment-control study, and so undermine any comparison one might try to make.
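To make the apples-to-apples problem concrete, here is a minimal, purely hypothetical sketch. The classrooms, composition rates, and effect sizes below are invented for illustration; they are not drawn from our district's data. It shows how student composition alone can open a visible gap between two classrooms taught equally well.

```python
import random

random.seed(7)

def simulate_classroom(n_students, frac_high_need, teacher_effect):
    """Simulate one classroom's benchmark scores.

    Every classroom here has the SAME underlying teacher effect;
    only the student composition differs. All numbers are invented
    for illustration, not drawn from any real test.
    """
    scores = []
    for _ in range(n_students):
        baseline = random.gauss(70, 10)        # typical student score
        if random.random() < frac_high_need:   # e.g., IEP or ELL support needs
            baseline -= 12                     # composition effect, not teaching
        scores.append(baseline + teacher_effect)
    return scores

# Two classrooms with identical teaching quality (teacher_effect=5 for both)
# but very different compositions, like the rooms discussed in the meeting.
room_a = simulate_classroom(25, frac_high_need=0.10, teacher_effect=5)
room_b = simulate_classroom(25, frac_high_need=0.50, teacher_effect=5)

print(f"Room A mean: {sum(room_a) / len(room_a):.1f}")
print(f"Room B mean: {sum(room_b) / len(room_b):.1f}")
# Room B's mean lands several points lower even though the teacher
# effect is identical; the chart alone cannot tell us why.
```

In the sketch, Room B's lower mean reflects who is in the room, not how it is taught, which is exactly the inference the chart invited the principals to make.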

Many of these principals, I knew, had also been teachers who themselves had been troubled by unfair comparisons made—and accountability pressures imposed—by their school and their district leaders.

I'm not sure that I navigated the ensuing discussion well. I'm also not sure, looking back, there was much I could have done to prevent the conversation from going in that direction.

I know that I wanted, very much, to discuss the charts as entry points for asking questions: for principals to look more deeply at the nature of the content on the benchmark tests, in the curriculum, and in the classroom, and to ask how they could support teachers in reinforcing any big ideas that still lingered.

I'm writing about this experience, here and now, to call attention to the need for us all to do better. To think more carefully and critically about how we use data—the intended purposes of tests and results and the unintended consequences of their misuse.

Why is it that—even when we all know better—we can't resist the pull of unhelpful practices, even those that we know have the potential to cause harm? Why are there cultural norms for data analysis within the practice of education—norms that would never be, and should never be, endorsed by statisticians or other measurement professionals?

I welcome your thoughts.