As this project was about using Python, it only makes sense to walk through the code in a little detail. (Note: code screenshots are from a slightly earlier version, but the structure of the code is the same.) All final code, source csvs and results can be found on GitHub.
With over 70,000 Census tracts across the country, the Public Database csv is massive.
The CDBG csv isn't much smaller, though it is more focused.
To create discrete plots for each state (plus Puerto Rico and DC), a list of states is established and a loop counter is set to step through it.
This same loop counter is then used throughout the script to name files, label the plots and post status reports to the terminal.
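For readers without the screenshots, here is a minimal sketch of that list-and-counter pattern. The shortened list, the file-name scheme and the plot title are all assumptions made for illustration, not the names used in the actual script.

```python
# Hypothetical, shortened list; the real script covers 50 states plus DC and Puerto Rico.
STATES = ["Alabama", "Alaska", "District of Columbia", "Puerto Rico"]


def output_names(state):
    """Build per-state file names and a plot title (the naming scheme is an assumption)."""
    slug = state.lower().replace(" ", "_")
    return {
        "csv": f"{slug}_tracts.csv",
        "pdf": f"{slug}_scatter.pdf",
        "png": f"{slug}_scatter.png",
        "title": f"Census Tracts: {state}",
    }


def run_all(states):
    """Step through the states, reusing the counter for naming and status reports."""
    messages = []
    for counter, state in enumerate(states):
        names = output_names(state)
        # ... the per-state csv comparison and plotting would go here ...
        message = f"[{counter + 1}/{len(states)}] {state} done -> {names['csv']}"
        print(message)  # status report posted to the terminal
        messages.append(message)
    return messages


if __name__ == "__main__":
    run_all(STATES)
```

Deriving every name from the same list entry keeps the file names, plot labels and terminal messages in step with one another.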
But first a couple of functions are defined for future use.
A series of for loops and if statements compares the two massive csv files to find corresponding tracts and begin writing a new csv for each state — one with only the necessary columns.
Fifty-two more manageable csvs are generated through this process.
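The matching step can be sketched roughly as follows. This is not the script's own nested-loop code: it swaps in a dictionary lookup to keep the example short, and the column names and sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical column names; the real Public Database and CDBG csvs use their own headers.
KEY = "tract_id"
KEEP = ["tract_id", "state", "median_income", "cdbg_amount"]


def combine_for_state(public_rows, cdbg_rows, state):
    """Match tracts found in both sources for one state, keeping only the KEEP columns."""
    # Index the CDBG rows by tract id so each public row needs a single lookup.
    cdbg_by_tract = {row[KEY]: row for row in cdbg_rows}
    combined = []
    for row in public_rows:
        if row["state"] != state:
            continue  # tract belongs to a different state
        match = cdbg_by_tract.get(row[KEY])
        if match is None:
            continue  # tract has no CDBG record
        merged = {**row, **match}
        combined.append({column: merged[column] for column in KEEP})
    return combined


def write_state_csv(rows, handle):
    """Write the pared-down rows to an open file handle."""
    writer = csv.DictWriter(handle, fieldnames=KEEP)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample rows standing in for the two large source files.
    public = list(csv.DictReader(io.StringIO(
        "tract_id,state,median_income,population\n"
        "A1,Alabama,52000,1900\n"
        "B1,Alaska,61000,3300\n"
    )))
    cdbg = list(csv.DictReader(io.StringIO(
        "tract_id,cdbg_amount\n"
        "A1,125000\n"
    )))
    out = io.StringIO()
    write_state_csv(combine_for_state(public, cdbg, "Alabama"), out)
    print(out.getvalue())
```

Calling `combine_for_state` once per state and writing each result to its own file would produce the per-state csvs described above.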
With the new, combined csv generated, the Matplotlib and Seaborn libraries are used to generate and stylize the plot, defining everything from the plot size (line 69) to the marker transparency (line 77).
To allow for multiple uses, each figure is then saved out as both a pdf and a png (lines 87 and 88).
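A pared-down sketch of the plot-and-save step is below. To keep it short it uses Matplotlib alone and leaves out the Seaborn styling; the figure size, transparency value, axis labels and file names are assumptions, not the values from the script.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # write files without opening a window
import matplotlib.pyplot as plt


def save_scatter(x_values, y_values, state, out_dir):
    """Draw one state's scatter plot and save it as both a pdf and a png."""
    fig, ax = plt.subplots(figsize=(10, 8))    # plot size (assumed value)
    ax.scatter(x_values, y_values, alpha=0.4)  # marker transparency (assumed value)
    ax.set_title(state)
    ax.set_xlabel("x value (hypothetical column)")
    ax.set_ylabel("y value (hypothetical column)")
    slug = state.lower().replace(" ", "_")
    paths = [os.path.join(out_dir, f"{slug}.{ext}") for ext in ("pdf", "png")]
    for path in paths:
        fig.savefig(path)  # the format is inferred from the file extension
    plt.close(fig)  # release the figure before the next state is drawn
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(save_scatter([1, 2, 3], [3, 1, 2], "Puerto Rico", folder))
```

Closing each figure matters in a loop over fifty-two states, since open figures otherwise accumulate in memory.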
While the script runs (and thanks to line 90), the terminal notes when a state is done. It also notes which lines of the csv were skipped due to data glitches. (Because of order, the same lines get listed as each state is processed, but they are only 22 out of 73,000+.)
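One way to skip and report bad lines is sketched here. It assumes the glitches are values that cannot be read as numbers, which may not match the actual cause, and the column names and sample data are made up.

```python
import csv
import io


def read_clean_rows(handle, numeric_columns):
    """Return (rows, skipped), where skipped lists the csv line numbers that failed to parse."""
    rows, skipped = [], []
    reader = csv.DictReader(handle)
    # Line 1 is the header, so data starts at line 2.
    # Assumes one record per line and no blank lines in the file.
    for line_number, row in enumerate(reader, start=2):
        try:
            for column in numeric_columns:
                row[column] = float(row[column])
        except (TypeError, ValueError):
            skipped.append(line_number)  # remember the line, then move on
            continue
        rows.append(row)
    return rows, skipped


if __name__ == "__main__":
    # Made-up sample with one unreadable value on line 3.
    sample = io.StringIO("tract_id,income\nA1,52000\nA2,n/a\nA3,61000\n")
    rows, skipped = read_clean_rows(sample, ["income"])
    print(f"Kept {len(rows)} rows; skipped lines: {skipped}")
```

Collecting the skipped line numbers instead of stopping lets the run finish while still leaving a record of what was dropped.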
As seen on the "Each State" page, the scatter plot for each state is distinctly labeled.
To compare Census tracts for All States, the script is similar, but of course doesn't require the loop counter.
The same comparisons and field definitions are done.
A lengthy csv is created, but still pared down to relevant columns.
Similar parameters are established through Matplotlib and Seaborn.
And the result is similar, if larger, to allow for all Census tracts.
Because comparing the csv files takes so long (approximately six hours each for the Each State and All States processes), I wrote additional code to refine the visualizations or try other options once the csvs had already been generated. (Each of these takes under five minutes, since it creates only the visualizations.)
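The idea of separating the slow comparison from the quick plotting can be sketched as below. The function names and the stand-in build step are hypothetical; the point is only that the combined csv is built once and then reused.

```python
import csv
import os
import tempfile


def load_or_build(path, build):
    """Return the csv rows, running the slow build step only if the file is missing."""
    if not os.path.exists(path):
        build(path)  # the hours-long comparison would run here, once
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def fake_build(path):
    """Stand-in for the slow comparison; writes two made-up rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerows([["tract_id", "value"], ["A1", "1"], ["A2", "2"]])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "all_states.csv")
        first = load_or_build(target, fake_build)   # builds the file, then loads it
        second = load_or_build(target, fake_build)  # loads the existing file only
        print(len(first), len(second))
```

With the rows loaded this way, each visualization variant only pays the cost of reading the file and drawing the plot.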
Shown below is the code and results for four size explorations of All States.