Working with Race/Ethnicity Data
Working with race/ethnicity data is an exercise in conscious, intentional design. As with most design decisions, there are few hard-and-fast rules to follow. Instead, you should constantly consider the experience of your audience and the messages your visualizations convey, both explicitly and implicitly. This exercise covers a few of the considerations to weigh when analyzing and visualizing this data.
Prompt
You have been asked to look at race/ethnicity enrollment trends over time for emergency shelter enrollments (original data here). The data appears as below, but first attempts at telling a story are proving difficult.
Establish the Dimensional Framework of Race and Ethnicity
Using race and ethnicity data to drive decision-making requires that your audience clearly understand the structure of the data and the analytical lenses you can and can't apply because of that structure. HUD's universal data elements break race and ethnicity into two dimensions. Before you dive into an analysis, it is important to establish for your audience what your client population looks like across those two dimensions.
To understand our client base here, we will use groups, highlight tables and table calculations.
To start, create a simple table with Clients Race on rows, Clients Ethnicity on columns, and count-distinct of Clients Unique Identifier as text.
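For readers who want to check the Tableau table against the raw data, the same count-distinct crosstab can be sketched in a few lines of Python. This is only an illustration: the rows, client ids, and category labels below are made up, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical enrollment rows: (client_id, race, ethnicity).
# A client can appear more than once because of repeat enrollments.
enrollments = [
    ("c1", "White", "Non-Hispanic"),
    ("c1", "White", "Non-Hispanic"),  # repeat enrollment, same client
    ("c2", "White", "Hispanic"),
    ("c3", "Black", "Non-Hispanic"),
    ("c4", "Black", "Non-Hispanic"),
    ("c5", "Data Not Collected", "Non-Hispanic"),
]

def distinct_client_crosstab(rows):
    """Return {(race, ethnicity): number of distinct client ids}."""
    ids_by_cell = defaultdict(set)
    for client_id, race, ethnicity in rows:
        # A set keeps each client id once, like count-distinct in Tableau.
        ids_by_cell[(race, ethnicity)].add(client_id)
    return {cell: len(ids) for cell, ids in ids_by_cell.items()}

crosstab = distinct_client_crosstab(enrollments)
for (race, ethnicity), n in sorted(crosstab.items()):
    print(f"{race:20} {ethnicity:14} {n}")
```

Counting distinct ids rather than rows matters here because one client with several shelter stays should still count as one person.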
To make the table a little easier to read, let's group the categories where there is no usable data - "Client Doesn't Know", "Client Refused", and "Data Not Collected". Starting with the Client Race dimension, control-click those categories to select them, right-click your selection, and select Group. You'll see that a new dimension with a paperclip is generated and those three categories are grouped together.
Right-click the new dimension and select Edit Group. Rename the grouped category to "No Data Available" to simplify the label.
Repeat this process for the Client Ethnicity dimension. We now have "No Data Available" as a category for both Client Race and Client Ethnicity.
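Outside of Tableau, the grouping step amounts to a simple relabeling. A minimal sketch, assuming the three category labels appear in the data exactly as written above (the function and constant names are hypothetical):

```python
# The three categories that carry no usable information, as named in the text.
NO_DATA_LABELS = {"Client Doesn't Know", "Client Refused", "Data Not Collected"}

def group_no_data(value, label="No Data Available"):
    """Collapse the no-data categories into one label; pass others through."""
    return label if value in NO_DATA_LABELS else value

# Made-up example values for a race column.
races = ["White", "Client Refused", "Black", "Data Not Collected"]
grouped = [group_no_data(r) for r in races]
print(grouped)
```

The same function can be applied to the ethnicity column, mirroring the repeated Group step in Tableau.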
At this point, you could make a strategic decision to include or exclude the No Data Available categories. In this data, excluding those categories would remove 28 clients from the total population (4,601 -> 4,573). For this exercise, we will exclude these clients, since they represent less than one percent of the total population, and add a caption noting the exclusion.
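The size of the excluded group can be checked with quick arithmetic. The sketch below uses only the two totals quoted above; the caption wording is just one possible phrasing, not a required format.

```python
total_clients = 4601      # all clients, from the table above
clients_with_data = 4573  # clients remaining after the exclusion

excluded = total_clients - clients_with_data
excluded_share = excluded / total_clients

# One possible caption; adjust the wording to suit your audience.
caption = (
    f"Excludes {excluded} clients ({excluded_share:.2%} of {total_clients:,}) "
    "with no race or ethnicity data available."
)
print(caption)
```

Stating both the count and the share in the caption lets the audience judge for themselves whether the exclusion is material.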
To create a caption, go to Worksheet - Show Caption, and edit the text box under the table.
We now have a table that's easier to read on its face. That said, it's difficult for humans to really make sense of all of these numbers. To place those numbers in context with each other, we'll use highlight tables and table calculations.
It will be helpful to add row and column grand totals to this table to see how the counts change as we go. In the menu bar, click Analysis - Totals - Show Row Grand Totals and then the same for Show Column Grand Totals.
To easily create a highlight table, use the third option in Show Me. You may have to flip the columns and rows and re-add the grand totals. Already, the table is easier to make sense of.
Let's use table calculations to understand this data from different angles. First, make a duplicate sheet so you don't lose the original table.
On the new table, right-click CNTD(Clients Unique Identifier) with the text icon and select Quick Table Calculation - Percent of Total. Note the delta symbol on the text pill indicating a table calculation.
Then, either do the same for the CNTD(Clients Unique Identifier) with the color icon or control-click and drag the CNTD(Clients Unique Identifier) pill from text onto the color pill, replacing the original pill with the table calculation.
What do these numbers represent? What sticks out to you in these values?
Duplicate this sheet and, on the new sheet, right-click the text pill again. This time, select Compute Using - Table (down). Again, make sure to repeat the process for, or copy the pill over to, the color pill. What changed? What do these numbers represent? What sticks out?
Duplicate this tab one more time and right-click the text pill again. This last time, select Compute Using - Table. Repeat the process for the other pill. What do these numbers represent?
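If you want to check your answers to the questions above, the three percent-of-total views can be sketched in Python. The counts below are made up, and the sketch assumes race is on rows and ethnicity is on columns, so that computing across divides by the row total, computing down divides by the column total, and computing over the table divides by the grand total.

```python
from collections import defaultdict

# Made-up counts keyed by (race on rows, ethnicity on columns).
counts = {
    ("Black", "Hispanic"): 10,
    ("Black", "Non-Hispanic"): 40,
    ("White", "Hispanic"): 30,
    ("White", "Non-Hispanic"): 20,
}

def percent_of_total(counts, scope):
    """scope: 'across' (row total), 'down' (column total), 'table' (grand total)."""
    if scope not in ("across", "down", "table"):
        raise ValueError(f"unknown scope: {scope}")

    def denominator_key(cell):
        race, ethnicity = cell
        if scope == "across":
            return race       # share of each race's row
        if scope == "down":
            return ethnicity  # share of each ethnicity's column
        return None           # a single grand total

    totals = defaultdict(int)
    for cell, n in counts.items():
        totals[denominator_key(cell)] += n
    return {cell: n / totals[denominator_key(cell)] for cell, n in counts.items()}

across = percent_of_total(counts, "across")
down = percent_of_total(counts, "down")
table = percent_of_total(counts, "table")
print(across[("Black", "Hispanic")], down[("Black", "Hispanic")], table[("Black", "Hispanic")])
```

The same cell reads very differently depending on the denominator, which is why the row and column grand totals are worth keeping visible on each sheet.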
To finish up, let's put all of these tables on a single dashboard.
Each of these tables provides a different angle on the same data, and together they help the end user gain a more nuanced understanding of the client population. Beyond describing client populations, this simple display across the race and ethnicity dimensions helps your audience understand the intersections and limitations of the two-question approach that HUD uses. From this point, you can start to build more complex visualizations.
Leveraging Interactivity to Highlight Populations
Now that the dimensions of race and ethnicity have been introduced and established, you can open up analyses of this data through interaction. Below is an example of a quick dashboard that allows the audience to view trends in enrollment over time.
This dashboard uses action filters and dynamic dimensions controlled by a parameter. To learn how to create those features, view the How Design Influences Insight exercise.
The combination of the dynamic dimension and action filter allows the end user to focus on only the populations relevant to the conversation at hand, including specific intersections of racial/ethnic identity. For example, this dashboard can just as easily look at Hispanic vs. Non-Hispanic populations in one view and just Non-Hispanic White vs. Non-Hispanic Black clients in another.
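The logic behind that combination can be sketched outside of Tableau. This is not how the dashboard itself is built; it is a rough Python analogy with made-up rows, hypothetical function names, and assumed parameter values ("Race", "Ethnicity", "Race and Ethnicity"). It counts enrollment rows per year rather than distinct clients.

```python
from collections import Counter

# Made-up enrollment rows.
rows = [
    {"year": 2019, "race": "White", "ethnicity": "Hispanic"},
    {"year": 2019, "race": "White", "ethnicity": "Non-Hispanic"},
    {"year": 2019, "race": "Black", "ethnicity": "Non-Hispanic"},
    {"year": 2020, "race": "White", "ethnicity": "Non-Hispanic"},
    {"year": 2020, "race": "Black", "ethnicity": "Non-Hispanic"},
    {"year": 2020, "race": "Black", "ethnicity": "Non-Hispanic"},
]

def dimension_value(row, parameter):
    """Pick the label to group by, like a parameter-driven dimension."""
    if parameter == "Race":
        return row["race"]
    if parameter == "Ethnicity":
        return row["ethnicity"]
    if parameter == "Race and Ethnicity":
        return f'{row["ethnicity"]} {row["race"]}'
    raise ValueError(f"unknown parameter: {parameter}")

def yearly_counts(rows, parameter, selected=None):
    """Count enrollments per (year, label); `selected` acts like a filter."""
    counts = Counter()
    for row in rows:
        label = dimension_value(row, parameter)
        if selected is None or label in selected:
            counts[(row["year"], label)] += 1
    return dict(counts)

by_ethnicity = yearly_counts(rows, "Ethnicity")
intersection = yearly_counts(
    rows, "Race and Ethnicity",
    selected={"Non-Hispanic White", "Non-Hispanic Black"},
)
print(by_ethnicity)
print(intersection)
```

The first call corresponds to the Hispanic vs. Non-Hispanic view, and the second to the view limited to Non-Hispanic White and Non-Hispanic Black clients.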
Other Considerations
Visualizing race/ethnicity data requires careful attention to the narratives those visuals are constructing. Beyond categorization and disentangling line charts, consider some of these other decisions:
Be conscious of the colors you use to represent populations. Aim to avoid stereotypical gendered (blue for boys, pink for girls) or racialized colors. If your organization has a standard color palette, use those colors and make sure to use a legend.
Be conscious of the order of populations within a bar chart. There should be a conscious design decision about which populations go first, second, last, etc. Sorting by most populous can make sense if that is an important aspect of the narrative but be aware that the larger populations are now at the top and will garner more attention.
Find ways to humanize your data. One way to humanize aggregated client data is to create an action filter that pulls up the individuals who are represented by the mark. View the interactivity exercise to learn more about action filters.