Statewide District Analysis for Equity and Need
Last Updated - June 2024
PURPOSE AND KEY QUESTIONS FOR UNDERSTANDING EQUITY
Purpose and Overview
DATA OVERVIEW: The statewide analysis highlights key equity and need metrics that governmental agencies already use for TK-12 schools and environmental justice. These metrics are included in this data initiative to help education leaders and advocacy organizations take an equity-driven approach to scaling environmental and climate action.
Key Questions
CORE QUESTIONS:
Which school districts are experiencing the highest inequities according to traditional equity indicators for TK-12 schools (e.g., socioeconomic status, race, special education, English language learners)?
Which school districts are experiencing the highest inequities related to environmental pollution?
ADDITIONAL QUESTION: Which school districts are experiencing the highest inequities and the least access to environmental and climate action?
Data Methodology
Resources and Data Collection: CDE codes, school district listings, physical addresses, geographic coordinates, district classifications, expenses per unit of average daily attendance (ADA), rates of unduplicated student populations, and the proportion of students eligible for free or reduced-price lunch (FRL) were sourced from the California Department of Education (CDE) Public Schools and Districts data files and from the Ed-Data website. The calculations were automated using for-loops and similar strategies, producing an array or table of calculated values for subsequent reporting and analysis.
Data Cleaning: The analysis combined geospatial techniques in QGIS with data manipulation in Python, mainly the Pandas package, in Google Colab, working from downloaded Excel files. It started with collecting shapefiles, pollution data from CalEnviroScreen, school locations from HIFLD, and census data. In QGIS, California schools were selected based on their location, with manual adjustments for schools near water bodies, and then spatially joined with the pollution data. In Google Colab, the school data was further refined by adding district names, cleaning the data, and grouping it by district. Finally, average pollution scores and percentiles for each district were calculated using NumPy's np.mean function.
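The grouping and averaging step described above can be sketched roughly as follows. This is a minimal illustration, not the interns' actual notebook: the table, the column names (district, pollution_score), and the values are hypothetical, and ranking the district averages with rank(pct=True) is only one assumed way to turn averages into percentiles.

```python
import numpy as np
import pandas as pd

# Hypothetical school-level table as it might look after the spatial join.
# Column names and values are invented for illustration.
schools = pd.DataFrame({
    "district": ["A Unified", "A Unified", "B Elementary", "B Elementary", "C Union"],
    "pollution_score": [6.0, 8.0, 3.0, 5.0, 9.0],
})

# Loop over the districts and average the school-level scores with np.mean.
rows = []
for district, group in schools.groupby("district"):
    rows.append({
        "district": district,
        "avg_pollution": float(np.mean(group["pollution_score"])),
    })
district_avg = pd.DataFrame(rows)

# Assumed percentile method: rank each district's average among all districts.
district_avg["pollution_percentile"] = district_avg["avg_pollution"].rank(pct=True) * 100

print(district_avg)
```

Averaging per school means every school counts equally; weighting by enrollment would be a different design choice that the source does not describe.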
Data Team: This data set was originally compiled by a team of University of California at Berkeley (UCB) Data Discovery Interns.
DATA VISUALS AND METRICS EXPLANATIONS
Visuals Overview
This video will show how to use the interactive features in the following visualizations (map and graph).
The map below includes data for school districts across the state of California, and pulls from a number of data sources including Ed-Data and CalEnviroScreen. The data in focus for this section includes LCFF unduplicated pupil counts as well as CalEnviroScreen pollution and population characteristic data. To learn more about these metrics see explanations below the map.
DATA FILTERING AND COLOR KEY:
Filtering: Use the filter drop-downs and check boxes to explore different aspects of the data and different geographical regions.
Hovering for More Data: Hover over a district to see other data, such as district enrollment and expense per ADA (the amount the district spends per pupil).
Color Key: In general, lower scores are indicated with darker green, and higher scores are indicated with red.
Metrics Explanations
Already Existing Metrics and Data:
LCFF Unduplicated Percentile: A metric ranging from 0 to 100 that captures student needs in a local school district. This is the district's Local Control Funding Formula (LCFF) unduplicated percentage: the percentage of students who fall into at least one of these categories: a) low-income, b) foster youth, or c) English learners. A student who falls into more than one category is counted only once. Learn more about LCFF and unduplicated pupil counts here.
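To illustrate what "unduplicated" means, the sketch below counts each student once no matter how many categories apply. The student records and field names are hypothetical; official counts come from CDE, not from code like this.

```python
# Hypothetical student records: each flag marks membership in one LCFF category.
students = [
    {"id": 1, "low_income": True,  "foster_youth": False, "english_learner": True},
    {"id": 2, "low_income": False, "foster_youth": False, "english_learner": False},
    {"id": 3, "low_income": True,  "foster_youth": True,  "english_learner": False},
    {"id": 4, "low_income": False, "foster_youth": False, "english_learner": True},
    {"id": 5, "low_income": False, "foster_youth": False, "english_learner": False},
]

CATEGORIES = ("low_income", "foster_youth", "english_learner")


def unduplicated_percentage(records):
    """Percent of students in at least one category, each student counted once."""
    if not records:
        return 0.0
    unduplicated = sum(1 for r in records if any(r[c] for c in CATEGORIES))
    return 100.0 * unduplicated / len(records)


pct = unduplicated_percentage(students)

# For contrast: summing category memberships double-counts students 1 and 3.
duplicated_total = sum(r[c] for r in students for c in CATEGORIES)

print(pct, duplicated_total)
```

Here three of the five students qualify, even though the three categories together contain five memberships.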
CalEnviroScreen (CES) Data: The CalEnviroScreen data provided in this data set was calculated through a geospatial join of CalEnviroScreen's census tract data with district boundary data.
For a given census tract, scores for the Pollution Burden and Population Characteristics are calculated as described below. These descriptions are taken directly from the CalEnviroScreen 4.0 Report.
Pollution Burden Score: The percentiles for all the individual indicators in a component are averaged. This becomes the score for that component (see image below for indicators). When combining the Exposures and Environmental Effects components, the Environmental Effects score was weighted half as much as the Exposures score. This was done because the contribution to possible pollutant burden from the Environmental Effects component was considered to be less than those from sources in the Exposures component.
Population Characteristics Score: The Population Characteristics score is the average of the Sensitive Population score and Socioeconomic Factors score. (See image below for the indicators)
CES 4.0 Score: The Pollution Burden and Population Characteristics scores are then scaled so that they have a maximum value of 10 and a possible range of 0 to 10. A value of zero typically implies that monitoring or reporting was conducted, but no impacts were present. Each average was divided by the maximum value observed in the state and then multiplied by 10. The scaling ensures that the pollution component and population component contribute equally to the overall CalEnviroScreen score. The overall CalEnviroScreen score is calculated by multiplying the Pollution Burden and Population Characteristics scores. Since each group has a maximum score of 10, the maximum CalEnviroScreen Score is 100.
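The three score descriptions above can be traced with a small worked example. This is a sketch under stated assumptions, not CalEnviroScreen's own code: the indicator percentiles and statewide maximums are invented, the function names are hypothetical, and the half weighting is implemented here as a weighted average with weights 1 and 0.5.

```python
from statistics import mean


def pollution_burden(exposure_pcts, effect_pcts):
    """Average each component, weighting Environmental Effects half as much."""
    return (mean(exposure_pcts) + 0.5 * mean(effect_pcts)) / 1.5


def population_characteristics(sensitive_pcts, socioeconomic_pcts):
    """Average of the Sensitive Population and Socioeconomic Factors scores."""
    return (mean(sensitive_pcts) + mean(socioeconomic_pcts)) / 2


def scale_to_ten(value, state_max):
    """Divide by the statewide maximum and multiply by 10."""
    return 10 * value / state_max


# Invented indicator percentiles for a single census tract.
pb = pollution_burden([70, 50], [40, 20])
pc = population_characteristics([80, 60], [60, 40])

# Invented statewide maximums for each component average.
pb_scaled = scale_to_ten(pb, state_max=80)
pc_scaled = scale_to_ten(pc, state_max=75)

# The overall score is the product of the two scaled scores (at most 10 * 10).
ces_score = pb_scaled * pc_scaled

print(pb, pc, pb_scaled, pc_scaled, ces_score)
```

Because the two scaled scores are multiplied, a tract scores high overall only when both pollution burden and population characteristics are high; a tract at the statewide maximum on both would reach 100.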
Learn more about the CalEnviroScreen data in the CalEnviroScreen 4.0 Report.
Learn more about the indicators included in this initiative in the Glossary of Indicators.
The graph below shows the average CalEnviroScreen percentile scores compared to the average percent of unduplicated students in each county. In the graph, pollution burden is on the y-axis and population characteristics are on the x-axis. Each circle represents a county: its size represents the percent of unduplicated students and its color represents the CES percentile.
The counties in the upper right quadrant are above average in each of the different listed equity factors, indicating a greater overall equity impact than other areas in California.
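A rough way to pick out the upper right quadrant programmatically is sketched below. The county names and values are placeholders, and drawing the quadrant lines at the mean of the county values on each axis is an assumption; the source does not say exactly where the graph's dividing lines fall.

```python
from statistics import mean

# Placeholder county-level averages (not real data).
counties = {
    "County A": {"population_characteristics": 80.0, "pollution_burden": 75.0},
    "County B": {"population_characteristics": 30.0, "pollution_burden": 70.0},
    "County C": {"population_characteristics": 65.0, "pollution_burden": 20.0},
    "County D": {"population_characteristics": 25.0, "pollution_burden": 15.0},
}

# Assumed quadrant boundaries: the average across counties on each axis.
x_avg = mean(c["population_characteristics"] for c in counties.values())
y_avg = mean(c["pollution_burden"] for c in counties.values())

# A county is in the upper right quadrant when it is above average on both axes.
upper_right = sorted(
    name
    for name, c in counties.items()
    if c["population_characteristics"] > x_avg and c["pollution_burden"] > y_avg
)

print(upper_right)
```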
Key Takeaways
While each district has its own equity factors to consider, there are some trends that show areas of higher need throughout our state.
California has a wide distribution of unduplicated students across all regions in the state.
Pollution burden tends to be higher in the Central Valley and greater Los Angeles areas.
Sensitive populations and socioeconomic issues are spread more widely throughout the state, with the highest scores running from the Central Valley down through Southern California. Unlike pollution burden, population characteristics scores are also high in the northern and far-northern regions of California.
When looking at all of the factors together, there are a few counties that have high percentages of each equity indicator (such as Tulare, Kings, Fresno, and Merced). These counties could potentially use additional funding and support to address pollution burden and equity gaps.