Three Types of Assessment Bias 

by Joshua A. Taton, Ph.D. | September 7, 2023 | 4 min read

This post was featured by EdLight.

The following episode occurred at a meeting of a state committee on which I served along with about eight other educators. The committee's charge was to review questions from the state standardized test that had been field-tested with a random sample of students in the state. We had been looking at the field-test statistics and had encountered a question on which students of color performed significantly worse than their peers, obtaining a much lower percentage of deemed-correct responses.

One of the teachers on the committee was perplexed.

"Can't you just tell me how much this question is biased? What is the number that shows how biased it is?" she asked the statistician whom the state had hired to oversee test development and administration.

We knew that the question performed differently for different groups: students of color obtained a much lower percentage of deemed-correct answers than students from other demographic groups. But the groups were composed of different numbers of students, and a raw gap alone doesn't establish bias, so we couldn't make a simple, definite determination that the question was, in fact, biased.

We were also a mostly white group of educators, and we couldn't easily identify a potential source of bias within the question.

The teacher's confusion was understandable. She wanted a metric, some quantitative result, that would indicate whether, and to what degree, this question was biased.

The statistician wasn't able to provide a clear-cut answer, of course, so I tried to help, drawing on my modest understanding of psychometrics (the statistics of educational assessment). The episode made me realize that we use the term "bias" in unclear, slipshod ways, and that we rarely talk with other educators about which type of bias we mean.

There are several types of bias. In this post, I unpack three of them and explain why knowing the difference between them matters.

Statistical Bias

Statistical bias, also known as measurement bias, refers to a situation in which an assessment systematically overestimates or underestimates a person's abilities or characteristics. This type of bias can arise from various factors, such as flawed test design or ambiguously worded questions. Note that statistical bias can affect anyone, regardless of cultural background, because it stems from flaws in the assessment itself.

Imagine a multiple-choice test question that uses complex language or contains several double negatives, making it difficult for test-takers to understand. Such a question may introduce statistical bias because it does not effectively measure what it's intended to assess, regardless of the test-takers' cultural background.

We had observed a possible sign of statistical bias, namely that two groups appeared to respond differently to the question, but we didn't know more.

Cultural Bias in Written Questions

Cultural bias in written questions, by contrast, is an issue somewhat distinct from statistical bias. (Where cultural bias is present, however, it will most likely produce measurable statistical bias.)

Cultural bias arises when an assessment question contains elements that favor one specific cultural group over others. These elements may include language, references, or context that are more familiar to individuals from a particular cultural background. As a result, individuals from other cultural backgrounds may find it harder to answer the question correctly, not because of their abilities, but because the question assumes certain cultural knowledge.

Consider, for example, a history question that asks about a specific event in American history without providing any context or explanation. Such a question might disadvantage international students who are unfamiliar with U.S. history, introducing cultural bias into the assessment.

Identifying Statistical Bias and Its Relationship to Cultural Bias

To determine whether an assessment has statistical bias, regardless of whether it exhibits cultural bias, statisticians compare the performance of different groups of test-takers who are at the "same level," meaning they are expected to have similar abilities on what the test measures.

Why can't we simply compare the performance of two groups, such as students of color and white students, without first matching them by level? Because factors other than the assessment items themselves might contribute to the difference in performance.

I discuss one such factor, systemic bias, later in this post.

By comparing how students from different groups who perform at approximately the same level (i.e., obtain the same or similar raw scores on the test) answer a given question, statisticians can identify consistent differences that don't seem related to students' knowledge or skill.

If such differences exist, the test might not be fair to everyone.
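A minimal Python sketch of the matched comparison described above may help. Everything in it is hypothetical and invented for illustration: the response records, the group labels "A" and "B", and the function name `percent_correct_by_level`. Each record holds a student's group, total raw score on the test, and whether that student answered the question under review correctly. Grouping by total score puts students "at the same level" before the groups are compared.

```python
from collections import defaultdict

# Hypothetical records invented for illustration:
# (group, total raw score on the test, answered this question correctly?)
responses = [
    ("A", 10, True), ("A", 10, True), ("A", 10, False), ("A", 10, True),
    ("B", 10, True), ("B", 10, False), ("B", 10, False), ("B", 10, True),
    ("A", 20, True), ("A", 20, True), ("A", 20, True), ("A", 20, False),
    ("B", 20, True), ("B", 20, True), ("B", 20, False), ("B", 20, False),
]

def percent_correct_by_level(records):
    """Return {total_score: {group: percent correct on this question}}."""
    # counts[score][group] holds [number correct, number of students]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, score, correct in records:
        cell = counts[score][group]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        score: {group: 100 * right / n for group, (right, n) in groups.items()}
        for score, groups in sorted(counts.items())
    }

table = percent_correct_by_level(responses)
for score, row in table.items():
    print(score, row, "gap:", row["A"] - row["B"])
# 10 {'A': 75.0, 'B': 50.0} gap: 25.0
# 20 {'A': 75.0, 'B': 50.0} gap: 25.0
```

In this made-up data, Group B trails Group A by the same 25 points at both score levels. A consistent gap among students with matching scores is the pattern that warrants a closer look at the question; real analyses would use far more students at each level.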

And here's where cultural bias comes into play: it's one possible source of statistical bias. When the performance gap between culturally identified groups can't be explained by differences in their knowledge or skill, but seems linked to cultural factors, it counts as statistical bias with a cultural source. This is why it's essential to examine assessments carefully to make sure they don't unintentionally favor or disadvantage people based on their cultural backgrounds.

And this is why the statistician couldn't give the teacher one single measure of statistical bias. In this case, detecting statistical bias involved a set of numbers: the differences in outcomes between white students and students of color, computed separately among students whose overall performance was the same. The result, if one could have been easily provided, would have been a table of values.
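Psychometricians do have standard ways to condense such a table, but the result still doesn't answer "how biased is it?" One widely used technique, named here plainly and not necessarily the one the state's statistician used, is the Mantel-Haenszel common odds ratio, often reported on the ETS "delta" scale as -2.35 ln(α). The sketch below assumes hypothetical 2x2 counts at each matched score level, with Group A as the "reference" group and Group B as the "focal" group; the counts and the function name are invented for illustration. A value far from 1 flags the question as functioning differently for matched students, which then calls for expert content review; the number alone does not show that the question is biased, or why.

```python
import math

# Hypothetical counts, invented for illustration. Each matched score level
# (students with the same total raw score) maps to a 2x2 table:
# (reference_correct, reference_incorrect, focal_correct, focal_incorrect)
strata = {
    10: (3, 1, 2, 2),
    20: (3, 1, 2, 2),
}

def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score levels.

    Values above 1 mean reference-group students had higher odds of
    answering correctly than focal-group students at the same level.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator if denominator > 0 else math.inf

alpha = mantel_haenszel_odds_ratio(strata)
mh_d_dif = -2.35 * math.log(alpha)  # ETS delta scale; negative favors the reference group
print(round(alpha, 2), round(mh_d_dif, 2))  # 3.0 -2.58
```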

Systemic Bias

It's important to compare test-takers at the same level when assessing statistical bias because this approach helps separate potential unfairness in the assessment itself from external factors such as systemic bias. Systemic bias refers to unequal access to educational resources or opportunities, which can affect performance on assessments.

For instance, students of color may underperform on standardized tests in general because of less access to the resources and opportunities that support an education of comparable quality. However, this doesn't necessarily mean that the test questions themselves are statistically biased.

When we look at differences in performance across groups, we can't simply conclude that a given assessment question is statistically biased. The variation in performance may be due to systemic inequalities rather than the assessment's design.

By comparing test-takers at the same level, we isolate the assessment's impact and can identify questions that introduce bias independent of external factors. This way, we can work toward making assessments fairer and more equitable for everyone, regardless of background or circumstances.
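A small, deterministic Python illustration of this point follows. Every number and name in it is hypothetical. The chance of answering the question correctly depends only on a student's overall skill level and is identical for both groups, so the question is unbiased by construction. Group B is shifted toward lower levels to stand in for unequal access to resources. The raw, unmatched gap is large, yet the gap among students at the same level is zero.

```python
# Hypothetical illustration: the chance of answering this question correctly
# depends ONLY on a student's overall skill level (1-4) and is the same for
# both groups, so the question is unbiased by construction.
P_CORRECT_PCT = {1: 20, 2: 40, 3: 60, 4: 80}

# Hypothetical number of students at each level. Group B is shifted toward
# lower levels, standing in for unequal access to resources (systemic bias).
students = {
    "A": {1: 10, 2: 20, 3: 30, 4: 40},
    "B": {1: 40, 2: 30, 3: 20, 4: 10},
}

def expected_correct(group, level):
    """Expected number of correct answers among a group's students at a level."""
    return students[group][level] * P_CORRECT_PCT[level] / 100

def raw_percent(group):
    """Percent correct for the whole group, ignoring level (the unmatched view)."""
    total = sum(students[group].values())
    correct = sum(expected_correct(group, level) for level in students[group])
    return 100 * correct / total

def percent_at_level(group, level):
    """Percent correct among the group's students at one level (the matched view)."""
    return 100 * expected_correct(group, level) / students[group][level]

print("raw gap:", raw_percent("A") - raw_percent("B"))  # raw gap: 20.0
for level in sorted(P_CORRECT_PCT):
    # Within each level, the two groups perform identically (gap: 0.0).
    print("level", level, "gap:", percent_at_level("A", level) - percent_at_level("B", level))
```

The 20-point raw gap here comes entirely from how the groups are distributed across levels, not from the question, which is why an unmatched comparison alone cannot show that a question is biased.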

TL;DR

Systemic bias, or unequal access to resources and opportunities, can result in cultural bias on poorly designed assessments. (Otherwise, equitable resources and experiences would result in equitable outcomes.) In turn, cultural bias can produce statistical bias that is identifiable on assessments. Statistical bias can also occur for reasons other than cultural bias.

I welcome your thoughts.