Data & Process

(1) We began by sourcing digitized Brooklyn city directories from the Internet Archive that included full-text transcriptions with OCR of acceptable quality.
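A minimal sketch of pulling a directory's OCR text programmatically. It assumes the Internet Archive item stores its text under the common `<identifier>_djvu.txt` file name, which not every item follows. The functions `full_text_url` and `fetch_full_text` are hypothetical helpers, not part of this project's code, and any identifier you pass in is your own.

```python
import urllib.request


def full_text_url(identifier: str) -> str:
    """Build the download URL for an Internet Archive item's OCR text.

    Assumes the "<identifier>_djvu.txt" naming convention; some items
    name their text file differently, so check the item's file list.
    """
    return f"https://archive.org/download/{identifier}/{identifier}_djvu.txt"


def fetch_full_text(identifier: str) -> str:
    """Download one item's OCR text and return it as a string."""
    with urllib.request.urlopen(full_text_url(identifier)) as response:
        # Replace undecodable bytes rather than failing on stray OCR debris.
        return response.read().decode("utf-8", errors="replace")
```

Saving the returned string to a `.txt` file gives the kind of input the parsing step expects.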

(2) We then used Python to parse the brewers and distillers out of our three chosen directories (see About). The script takes the directory's text file as input and outputs a CSV file. The code is commented to explain each step and is available in the GitHub repository linked at left.
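The parsing step can be sketched as follows. This is not the project's script: it assumes each directory entry sits on one line in the form "Surname Given, occupation, address", and the pattern, function names, and sample entries are hypothetical illustrations.

```python
import csv
import re
import sys

# Hypothetical entry layout: "Surname Given, occupation, address".
# Only lines whose occupation starts with "brewer" or "distiller" match.
ENTRY = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<occupation>brewer|distiller)\b[^,]*,\s*(?P<address>.+)$",
    re.IGNORECASE,
)


def parse_directory(lines):
    """Yield (name, occupation, address) for each brewer or distiller line."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            yield (
                match["name"].strip(),
                match["occupation"].lower(),
                match["address"].strip(),
            )


def write_csv(rows, out):
    """Write parsed rows, with a header, to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["name", "occupation", "address"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Usage: python parse_directory.py directory.txt output.csv
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        write_csv(parse_directory(src), dst)
```

Given the made-up lines "Smith John, brewer, 12 Meserole h 40 Ewen" and "Jones Mary, milliner, 5 Court", only the first is kept. Real directories wrap entries across lines and abbreviate occupations, so a working pattern would need tuning to the actual text.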

(3) Once we had three CSV files containing the names and addresses of all brewers and distillers, we determined what cleaning was needed and how much of it could be done with Python. Whatever could not practically be done with Python we did manually. This included expanding abbreviated street names such as B'wick, correcting misspellings caused by OCR errors, and creating separate entries for businesses that appear on the same line as a home address.
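Two of the cleaning tasks above lend themselves to a short sketch: expanding abbreviations and splitting a business address from a home address on the same line. The abbreviation table is illustrative only (the expansion of B'wick to "Bushwick" is an assumption), and the split assumes a home address is introduced by a standalone " h ", which is a hypothetical convention here.

```python
import re

# Illustrative abbreviation table, not the project's actual list.
# "B'wick" -> "Bushwick" is an assumed expansion.
ABBREVIATIONS = {
    "B'wick": "Bushwick",
    "av": "Avenue",
    "st": "Street",
}

# Match an abbreviation only as a whole token (not inside "1st" or "Avon").
_ABBR_RE = re.compile(
    r"(?<![\w'])("
    + "|".join(re.escape(key) for key in ABBREVIATIONS)
    + r")(?![\w'])"
)


def expand_abbreviations(address: str) -> str:
    """Replace each known abbreviation with its full form."""
    return _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], address)


def split_business_and_home(name, occupation, address):
    """Return one row per address, assuming ' h ' marks a home address."""
    business, sep, home = address.partition(" h ")
    rows = [(name, occupation, "business", business.strip())]
    if sep:
        rows.append((name, occupation, "home", home.strip()))
    return rows
```

For example, `expand_abbreviations("40 B'wick av")` returns "40 Bushwick Avenue". OCR misspellings are harder to fix by rule, which is one reason some cleaning stays manual.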

(4) After cleaning the CSV files, we uploaded them to Google Sheets and used the "Geocode by Awesome Table" add-on to look up the latitude and longitude of each address. We then imported the geocoded addresses into Google My Maps.
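A bare street address such as "12 Meserole" is ambiguous to a geocoder. One optional preparation step, sketched here under the assumption that the add-on geocodes a single address column, is to append the city and state before uploading. The `full_address` column and both function names are hypothetical, and "Brooklyn, NY" is an assumed default.

```python
import csv


def add_full_address(rows, city="Brooklyn, NY"):
    """Yield copies of dict rows with a 'full_address' field appended.

    Assumes each row has an 'address' key holding the cleaned street address.
    """
    for row in rows:
        yield {**row, "full_address": f"{row['address']}, {city}"}


def prepare_for_geocoding(in_path, out_path, city="Brooklyn, NY"):
    """Read a cleaned CSV and write a copy with the extra column."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["full_address"]
        rows = list(add_full_address(reader, city))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Modern geocoders resolve addresses against today's streets, so renamed or renumbered nineteenth-century addresses may still need manual checking.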


NOTE: The Python code in this project's GitHub repository can be adapted to parse other data from city directories by changing its regular expression patterns.
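As an illustration of that kind of change, the sketch below builds the matching pattern from a list of trades instead of hard-coding brewers and distillers. It assumes the same hypothetical "Surname Given, occupation, address" line layout, and `build_entry_pattern` is a made-up helper, not a function from the repository.

```python
import re


def build_entry_pattern(occupations):
    """Compile a pattern for 'Name, occupation, address' lines in the given trades.

    The line layout is an assumption about the directory format.
    """
    trades = "|".join(re.escape(o) for o in occupations)
    return re.compile(
        rf"^(?P<name>[^,]+),\s*(?P<occupation>{trades})\b[^,]*,\s*(?P<address>.+)$",
        re.IGNORECASE,
    )


# Example: target bakers and grocers instead of brewers and distillers.
bakers_and_grocers = build_entry_pattern(["baker", "grocer"])
```

Keeping the trade list as data means switching to another occupation touches one argument rather than the regular expression itself.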