The city of Hayward is located in the San Francisco Bay Area and is home to Cal State East Bay and more than 159,000 people. Following a conversation with the leaders of the Hayward Promise Neighborhood grant, this work analyzes the standardized test scores in Hayward Unified School District (HUSD). We find districts comparable to HUSD and investigate the differences in test score outcomes. We also attempt to understand which characteristics of a school district most closely correlate with test score outcomes. To conclude, we take a step back and ask whether test scores are the best tool for evaluating a student’s ability to perform well in school.
In this study we use data from the California Department of Education on test score outcomes for the Smarter Balanced Summative Assessments (SBSA). According to the Department of Education, the Smarter Balanced Summative Assessments in mathematics are aligned with the Common Core State Standards (CCSS) and measure progress toward college and career readiness. The tests capitalize on the strengths of computer adaptive testing—efficient and precise measurement across the full range of achievement and the timely turnaround of results. [1]
Data on these test scores shows how students in each district, and each school in a district, perform on SBSA math exams compared to a preset standard or goal. Data is also disaggregated by race, English learner status, socioeconomically disadvantaged status, and disability status.
Our first aim was to find school districts that are comparable to HUSD, so that comparisons of test scores would be fair. We used the following four metrics to determine likeness to HUSD:
Spending per average day of attendance
Population size
Diversity profile
Median household income
Using a Jupyter Notebook, we compared school districts in California to HUSD and determined which districts are within the following bounds on the metrics above:
±$100 for spending per average day of attendance
±800 students for population size
±5% per ethnicity in diversity profile
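The filtering step can be sketched in a few lines of Python. This is only an illustration of the idea, not the notebook used in the study: the district records, field names, and the `is_comparable` helper are hypothetical, and we assume that the diversity bound means a difference of at most 5 percentage points for each group.

```python
# Hypothetical district records. The names, numbers, and field names are
# illustrative placeholders, not values from the study's data.
districts = [
    {"name": "Reference", "spending_per_ada": 12000, "students": 20000,
     "diversity": {"AA": 9.0, "AS": 8.0, "HI": 63.0, "WH": 5.0}},
    {"name": "District A", "spending_per_ada": 12050, "students": 20500,
     "diversity": {"AA": 11.0, "AS": 6.0, "HI": 60.0, "WH": 8.0}},
    {"name": "District B", "spending_per_ada": 12500, "students": 19900,
     "diversity": {"AA": 9.0, "AS": 8.0, "HI": 63.0, "WH": 5.0}},
    {"name": "District C", "spending_per_ada": 11950, "students": 20100,
     "diversity": {"AA": 2.0, "AS": 8.0, "HI": 63.0, "WH": 5.0}},
]


def is_comparable(candidate, reference,
                  spending_tol=100, population_tol=800, diversity_tol=5.0):
    """Return True if the candidate is within every bound of the reference."""
    spending_gap = candidate["spending_per_ada"] - reference["spending_per_ada"]
    if abs(spending_gap) > spending_tol:
        return False
    if abs(candidate["students"] - reference["students"]) > population_tol:
        return False
    # Assumption: the diversity bound is in percentage points per group.
    for group, share in reference["diversity"].items():
        if abs(candidate["diversity"].get(group, 0.0) - share) > diversity_tol:
            return False
    return True


reference = districts[0]
comparable = [d["name"] for d in districts[1:] if is_comparable(d, reference)]
print(comparable)
```

In this made-up data, District B is excluded because of its spending and District C because of one demographic share, so only District A remains.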
Comparisons along these 4 metrics yield the following comparable school districts; see the map to the right:
Fresno Unified School District*
West Contra Costa Unified School District
Banning Unified School District
San Lorenzo Unified School District
*We note that the Fresno Unified School District meets the bounds on population size, diversity profile, and spending but not on median household income.
Based on our metrics, we examine schools similar to those in HUSD. While looking at these scores, we also wanted to acknowledge and understand the opportunity gaps among demographic groups, which are reflected in our charts. We chose to look closely at five demographic groups: African-American, Asian-American, Hispanic/Latinx, Students with Disabilities, and White.
The boxplots below indicate how far student scores are from the standard, or goal, score. On the vertical axis, 0 represents a distance of 0 from the goal score. If the median line (orange) is below 0 (negative), at least half of the scores fall short of the goal, whereas if the median line is above 0 (positive), at least half of the scores surpass the goal.
Brief explanation of Box plots
We now include a brief explanation of how to read a boxplot. Boxplots are depicted as in the image below.
Below we describe the components labeled on the box plot above [2].
Minimum Score -- the lowest score, excluding outliers (shown at the end of the left whisker).
Lower Quartile -- twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).
Median -- the median marks the midpoint of the data and is shown by the line that divides the box into two parts (sometimes known as the second quartile). Half the scores are greater than or equal to this value and half are less.
Upper Quartile -- seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile). Thus, 25% of data are above this value.
Maximum Score -- the highest score, excluding outliers (shown at the end of the right whisker).
Whiskers -- the upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores).
Interquartile Range (or IQR) -- the box itself, which shows the middle 50% of scores (i.e., the range between the 25th and 75th percentiles).
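To make these components concrete, the sketch below computes them for a small list of scores. It rests on two assumptions that the description above does not spell out: quartiles are found by linear interpolation between data points, and a score counts as an outlier when it lies more than 1.5 times the IQR beyond the box, which is a common convention. The scores and the `box_plot_summary` helper are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(scores):
    """Compute the values a box plot draws for a list of scores."""
    data = sorted(scores)
    # method="inclusive" interpolates linearly between data points, treating
    # the smallest and largest scores as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: scores beyond 1.5 * IQR from the box count as outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "minimum": inside[0],        # end of the lower whisker
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "maximum": inside[-1],       # end of the upper whisker
        "iqr": iqr,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical distances from the goal score (negative means below the goal).
example_scores = [-80, -60, -55, -50, -45, -40, -30, -20, 90]
summary = box_plot_summary(example_scores)
for part, value in summary.items():
    print(f"{part}: {value}")
```

In this made-up example the single score of 90 sits far above the rest, so it is reported as an outlier and the upper whisker ends at -20 instead.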
Using data from the California Department of Education and a Python Jupyter Notebook, we created the following box plots for Hayward Unified as well as the comparable school districts listed above.
The charts throughout this article use abbreviations for various demographic groups. These are the abbreviations used by the California Department of Education, and we have adopted them here.
AA -- Black or African American
AS -- Asian or Pacific Islander
HI -- Hispanic/Latinx
SWD -- Students with disabilities
WH -- White
Some observations based on the box plots above:
The test scores of African-American students, Hispanic students, and students with disabilities fall below the scores of White and Asian-American students in each of these districts.
In all but one district, the median score for every demographic group falls below the goal score.
Hayward Unified School District scores match or exceed those of these comparable school districts.
Some anomalies in the box plots occur due to a low sample size or the spread of the data values. For example, when creating the box plot for African-American students in Banning Unified School District, we are unable to see the full 4 quartiles of data. As a simplified example, to the right are the box plots generated for the numbers 2.8, 2.9, 3, 6, 7 (Example 1) and for the numbers 1, 2, 3, 6, 7 (Example 2). One can see how the spread of the data values can affect the shape of the box plots.
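The effect of spread in the two simplified examples can be checked numerically. The sketch below measures the length of each of the four segments of the box plot (lower whisker, lower half of the box, upper half of the box, upper whisker). It assumes quartiles are computed by linear interpolation between data points; other software may use a slightly different rule. Neither example contains an outlier under the usual 1.5 times IQR rule, so the whiskers end at the smallest and largest values.

```python
from statistics import quantiles

examples = {
    "Example 1": [2.8, 2.9, 3, 6, 7],
    "Example 2": [1, 2, 3, 6, 7],
}

box_parts = {}
for label, values in examples.items():
    # Assumption: linear interpolation between data points ("inclusive").
    q1, median, q3 = quantiles(sorted(values), n=4, method="inclusive")
    box_parts[label] = {
        "lower_whisker": round(q1 - min(values), 2),
        "lower_box": round(median - q1, 2),
        "upper_box": round(q3 - median, 2),
        "upper_whisker": round(max(values) - q3, 2),
    }
    print(label, box_parts[label])
```

Under these assumptions, the lower whisker and the lower half of the box in Example 1 each span only about 0.1, so they nearly collapse into a single line, whereas in Example 2 each spans 1 and all four parts are visible.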
We note that students with disabilities consistently score lower than other demographic groups. This points to a need for increased services and resources for students with disabilities. It also raises the question of whether standardized testing is the best way to measure the learning and growth of students with disabilities.
In West Contra Costa Unified, the most outliers are found among students with disabilities, Hispanic students, and Asian students.
After examining the box plots for scores in Hayward and comparable districts, we delved deeper to try to identify which factors correlate with higher test scores. We chose to look at
median household income,
population size of a school district, and
funding per average day of attendance (per ada).
When looking at the median household income by racial demographic group, we see similar incomes for families within these school districts; see the chart below. We note that the median income in Fresno is significantly lower than that of the Bay Area districts. Our emphasis here is on the incomes of racial groups within a district. The distribution of test scores follows the distribution of median income: groups with higher scores also have higher median household incomes. The exception is Fresno Unified, which shows test score distributions similar to the other districts but a different income distribution among racial groups: the median income for Asian/Pacific Islander households is lower than that of both Hispanic/Latinx and White households.
When filtering school districts according to student population size only, we find the districts shown in the table above. The table shows the student population in thousands as well as the average math score for the district. It seems that there is very little correlation between population size and mean performance. Finally, below we see a chart with the demographic ratios and spending per average day of attendance (per ada) in thousands for Hayward Unified and the 4 comparable school districts.
Median household income by county, from least to greatest [5]:
Solano - $76,609
Sonoma - $76,753
Napa - $84,753
Alameda - $92,574
Contra Costa - $93,712
San Francisco - $104,552
Marin - $110,217
San Mateo - $113,776
Santa Clara - $116,178
We now look at each of the nine Bay Area counties a bit more closely. Santa Clara leads with a median income of $116,178, followed by San Mateo at $113,776, and in third, Marin with $110,217.
Knowing the median household income for each county, how is it reflected in test performance by county? To get a closer look at the opportunity gaps among test scores, we generated box plots for different demographic groups (African-American, Hispanic, Asian-American, Students with Disabilities, and White students) within each county. We include here box plots not only for standardized math tests but also for standardized English tests.
In statistics, correlation is a way to describe the relationship between two variables. This relationship is expressed as being ‘positive’ or ‘negative’. When variables increase together, they are said to be ‘positively correlated’. When one variable increases while the other decreases, they are said to be ‘negatively correlated’. For example, as a child grows in height, so does their shirt size. Their shirt size and height here would be positively correlated.
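As a small illustration, the sketch below computes a correlation coefficient for the height and shirt-size example, together with a second pair of variables that move in opposite directions. It assumes the Pearson coefficient, a common choice that ranges from -1 to 1; the measurements and the `pearson_r` helper are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables vary together, and how much each varies alone.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical measurements for one child at five different ages.
heights_cm = [100, 110, 120, 130, 140]
shirt_sizes = [4, 5, 6, 8, 10]          # grows with height
nap_hours = [3.0, 2.5, 2.0, 1.0, 0.5]   # shrinks as the child grows

r_shirt = pearson_r(heights_cm, shirt_sizes)
r_nap = pearson_r(heights_cm, nap_hours)
print(f"height vs shirt size: {r_shirt:+.2f}")
print(f"height vs nap hours:  {r_nap:+.2f}")
```

The first value comes out positive and the second negative, matching the two cases described above.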
Below we give the correlation between average test scores at HUSD and household income, spending per ada, population size, and diversity profiles.
Correlations
Household income: about 0.97
Spending: about 0.54
Population Size: about 0.47
Diversity: about -0.12
Using the correlation values between the metrics and test scores, as well as the charts produced by metric, we came to the conclusion that:
Diversity has a ‘weak’ correlation with test scores.
Population size and school spending each have a ‘moderate’ correlation with test scores.
The correlation between median household income and test scores is strong.
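The labels above can be reproduced with a simple rule of thumb. The cutoffs in this sketch (below 0.3 is weak, 0.7 and above is strong) are an assumption chosen for illustration, not thresholds stated in this study; the `describe_strength` helper is hypothetical, and only the four correlation values come from the list above.

```python
# Correlation values reported above.
reported = {
    "Household income": 0.97,
    "Spending": 0.54,
    "Population size": 0.47,
    "Diversity": -0.12,
}


def describe_strength(r, weak_below=0.3, strong_from=0.7):
    """Label a correlation by its size, ignoring its sign.

    Assumption: the default cutoffs are a rule of thumb, not a standard.
    """
    size = abs(r)
    if size < weak_below:
        return "weak"
    if size < strong_from:
        return "moderate"
    return "strong"


labels = {name: describe_strength(r) for name, r in reported.items()}
for name, label in labels.items():
    print(f"{name}: {reported[name]:+.2f} ({label})")
```

The sign of the value only indicates direction, so the negative diversity correlation is judged by its size alone.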
After carefully analyzing the test scores for HUSD and Bay Area counties in general, we examined why we do standardized testing in the first place.
Standardized tests weren’t always used to measure student learning. Following a large increase in population in the US from 1820-1860, standardized tests became more popular, as school systems became more complex. As population increased rapidly, the US shifted from schooling via private tutors and private schools, to free universal primary education. This was seen as an attempt to have fairness and equal opportunity for those in the US. Standardized tests were created to reach some sort of fairness level, despite the fact that each student grasps concepts/ideas differently, and processes/obtains information in different ways. Factors such as income, location, race, etc. can play a big role in these results; see [7].
We conclude that there are no other ways of measuring or collecting data on student learning at large scales. Testing is the most well-established scalable option, but we should consider whether it is the best option. Between kindergarten and grade 12, students take an average of 112 standardized tests. For each of these tests, teachers put their curriculum aside for test preparation, which sometimes takes weeks. Additionally, standardized tests set a standard that may be unrealistic for some students. Because every student learns and grasps information differently, a universal standard can cause problems. Each student is expected to reach the same standard, but not everyone has access to the same tools to reach that level in the established period of time. Tests don’t really teach anything; teachers do.
[1] CA Department of Education; Smarter Balanced Summative Assessments (2021) https://www.cde.ca.gov/ta/tg/sa/sbacsummative.asp
[2] Simply Psychology; Box Plots (2019) https://www.simplypsychology.org/boxplots.html
[3] National Education Association (2020) https://www.nea.org/resource-library/essa-and-testing
[4] Whitby School https://www.whitbyschool.org/passionforlearning/the-pros-and-cons-of-standardizEd-testing
[5] U.S. Census Bureau (2010) https://www.census.gov/quickfacts/fact/table/US/PST045219
[6] Python Data Science. https://www.pythonfordatascience.org/variance-covariance-correlation/
[7] Lessons From the Past: A History of Educational Testing in the United States. https://www.princeton.edu/~ota/disk1/1992/9236/923606.PDF