Data

Explore the uses, shortcomings, and critique of the data used in this project.

Overview of the Dataset

We selected the Food Access Research Atlas, which combines 2010 census boundaries and population numbers with data from 2019 collected by the U.S. Department of Agriculture’s Economic Research Service. This dataset provides an overview of food access indicators for low-income and other census tracts using various measures of supermarket accessibility. These indicators include the distance to the nearest store and the number of stores in an area, individual-level resources like family income and vehicle availability, and neighborhood-level resources such as the average income of the neighborhood and the availability of public transportation. We use this data to visualize the state of food insecurity in California and analyze disparities across racial and age groups.

Uses of the Data

The data is recorded at the census tract level: each row of this national dataset represents one census tract, identified by its county and state. There are 148 fields for each row, notably including census tract, state, population, poverty rate, whether the tract is considered urban or not, and low food access for combinations of distance, race, age, vehicle access, and SNAP/EBT participation. The dataset includes a detailed variable lookup that provides longer names and descriptions for each field. This gave us a more comprehensive understanding of the data, so we did not have to make assumptions about what we were filtering and grouping by during the analysis. The data also let us analyze and summarize results at the statewide, county, and tract levels. In our project, we extracted the California average of people who have access to a grocery store within half a mile, looked at poverty and food access rates at the county level, and analyzed tracts in Los Angeles specifically after identifying it as a metropolitan area with major disparities across race.

Shortcomings of the Data

Beginning with the state of the data, one obvious shortcoming is the age of the dataset, which was first released in April 2021 but uses census data from 2010 and food access data from 2019, making the content of the data at least five years old. Since then, the pandemic has altered what food access looks like, and we are unfortunately not able to show that in our data presentation. To compensate for this shortcoming, we used other visual elements, specifically our photo essay, which highlights pandemic-induced food insecurity around California. In our findings on how various racial groups are affected by food accessibility, the data shows that different distances impact food accessibility similarly across racial groups. The primary difference is that the number of people decreases as the distance to food sources increases. Because the data is recorded only at the tract level, we had to make our own decisions about how to get results at the city, county, or state level, and we chose to use averages. This approach also means that we cannot pinpoint which specific areas or neighborhoods suffer the most from low food accessibility, and thus we often had to present the data at a larger scale. For many audiences, a general overview may be less informative than detailed, actionable insights for particular locations. Missing and null values in many fields for some census tracts also made averaging and summing values up to the county or state level more difficult, and we had to adopt our own approaches, as documented in the next section. Moreover, because grocery stores are not the only way of getting food, those who live beyond the radius presented in the data are categorized as low access even if they grow adequate amounts of food for themselves or obtain it through more informal methods of exchange (e.g. seasonal farmers markets).

Data Methodologies

For the exploratory data analysis, we used the pivot table, filtering, and grouping functions in Excel and Google Sheets, as we found them the most intuitive and the easiest way to share work between team members. All of the data fields (excluding state and county) are numerical, so we did not see a need to use a tool like Jupyter Notebook and the pandas library for textual analysis. For analysis that required averaging, we grouped by county or state, ignoring rows with missing values for the field we were investigating. Our justification is that there were usually only a few instances of missing values, and we did not want that to deter us from finding aggregate, interpretable information about the data. We recognize that this is a form of data cleaning, and we make this process transparent on our site under the visualizations we present.
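The averaging rule described above (group by county, skip rows where the field under investigation is missing) can be sketched in a few lines of Python. This is only an illustration of the logic, not the workflow we used, which was done in spreadsheets. The rows, the field names `county` and `poverty_rate`, and the helper `county_means` are all hypothetical and do not reflect the atlas's actual column names or values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tract rows; field names and numbers are made up for illustration.
tracts = [
    {"county": "Los Angeles", "poverty_rate": 18.0},
    {"county": "Los Angeles", "poverty_rate": None},  # missing value
    {"county": "Los Angeles", "poverty_rate": 22.0},
    {"county": "Alameda", "poverty_rate": 10.0},
    {"county": "Alameda", "poverty_rate": 14.0},
    {"county": "Alpine", "poverty_rate": None},  # only row is missing
]


def county_means(rows, field):
    """Average `field` per county, ignoring rows where it is missing.

    Returns (means, skipped): the per-county averages, and a count of
    ignored rows per county so the cleaning step stays visible.
    """
    values_by_county = defaultdict(list)
    skipped = defaultdict(int)
    for row in rows:
        value = row.get(field)
        if value is None:
            skipped[row["county"]] += 1
            continue
        values_by_county[row["county"]].append(value)
    means = {county: mean(values) for county, values in values_by_county.items()}
    return means, dict(skipped)


means, skipped = county_means(tracts, "poverty_rate")
print(means)
print(skipped)
```

Returning the skipped counts alongside the averages mirrors the transparency goal: a county whose tracts are all missing simply has no average, rather than a misleading zero.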

Ontology & Guiding Frameworks

The decisions that went into the ontological creation of the data and the separation of individuals (represented by population numbers) were dictated by the 2010 census tract boundaries. Unfortunately, as a 2022 Brookings Institution piece finds, the census has historically undercounted racial and ethnic minorities, particularly Black, Latino, and American Indian/Alaskan Native populations (Sanchez). This is a symptom of the systemic power imbalance at the level of the government bodies that fund, collect, and produce this survey of the American population, which fundamentally overlooks the nuances of underserved communities that are both missed and inadequately captured.

Using critical race theory and intersectionality as our guiding frameworks, we aimed to represent various racial groups and analyze their experiences with food insecurity in California. We tracked different racial groups and aimed to limit bias in our analysis by using averages and visualizing the share of California's population that each group represents. This approach provides context, helping to explain why certain racial groups might have higher rates of low access to food compared to others. In the interactive graph, we presented food access at the county level and used the hover and tooltip features to show the poverty rate and the number of housing units that receive SNAP benefits alongside the food access level, drawing on what we learned about Marxism and issues of wealth, class, and power to frame our data narrative.

How did we summarize results from data?

On a statewide level, we found that only a little over half of Californians have access to a grocery store within a half mile of their home. In counties with low food access (fewer than 20 percent of the population within a half mile of a grocery store), we noticed higher rates of poverty. For the aggregate data, on a strictly numerical basis, the larger the population of a particular racial group, the greater the low food access numbers within that group. However, in many places, comparing these numbers to each racial group's share of California's population revealed disparities along racial and class lines, which are explored more deeply on the Visualizations page of the site. Since the main visual features are the interactive map and visualizations, we chose to prioritize accessible and interpretable design in these components, so that users can draw their own conclusions specific to their reason for visiting the site rather than relying on static descriptions.
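To make the two summary figures above concrete, here is a minimal sketch of one way to compute a statewide share of people with half-mile access and to flag counties under the 20 percent threshold. It assumes each county row carries a total population and a count of people beyond a half mile from a grocery store; the field names `population` and `low_access_half_mile`, the county labels, and all numbers are hypothetical. It also weights by population, which is one possible choice and may differ from the averaging we did in our spreadsheets.

```python
# Hypothetical county rows; names and numbers are made up for illustration.
counties = [
    {"county": "A", "population": 1000, "low_access_half_mile": 400},
    {"county": "B", "population": 500, "low_access_half_mile": 450},
    {"county": "C", "population": 1500, "low_access_half_mile": 600},
]

LOW_ACCESS_THRESHOLD = 0.20  # under 20 percent within a half mile


def access_share(population, low_access):
    """Share of people who live within a half mile of a grocery store."""
    return 1 - low_access / population


# Statewide share: sum the counts first, then divide, so that large
# counties weigh more than small ones.
total_population = sum(c["population"] for c in counties)
total_low_access = sum(c["low_access_half_mile"] for c in counties)
statewide_share = access_share(total_population, total_low_access)

# Counties where fewer than 20 percent of residents have half-mile access.
low_access_counties = [
    c["county"]
    for c in counties
    if access_share(c["population"], c["low_access_half_mile"]) < LOW_ACCESS_THRESHOLD
]

print(f"Statewide share with half-mile access: {statewide_share:.1%}")
print("Counties under the threshold:", low_access_counties)
```

Summing before dividing gives a population-weighted figure; averaging each county's share instead would treat a small county the same as a large one, which is worth stating whichever method is used.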
