City Directories
I am currently digitizing German city directories (Adressbücher), using Optical Character Recognition (OCR) to speed up this cumbersome process; see Albers and Kappner (2022) for a discussion. City directories provide personal information about the household head (name, occupation, exact address, telephone number and, for a few city directories, floor number) as well as the locations of businesses and amenities within the city, over 200 years at a nearly annual frequency. Occupations can be mapped to income via HISCAM scores (Lambert et al. (2013)).
You can find the Excel file for Düsseldorf 1891 here (0.9 MB, completed in May 2025). Apart from the current OCR/clean-up pipeline, it took about 30 minutes of manual correction of flagged errors, not counting the 20 minutes of OCR, which can run in the background. Currently, the character error rate (CER) achieved for Fraktur is approx. 1 error per page (approx. 0.03% CER), with current post-processing able to achieve 0.01%. This is lower than some reliable, non-exaggerated estimates of 0.13% for manual entry of non-Fraktur sources (see the great work/slides by Lin, Moulton, Rand and Smith, page 48). Note, however, that error rates for Fraktur typesetting are about an order of magnitude higher, because OCR commonly struggles to distinguish similar-looking characters and ligatures (u vs. n, f vs. s, c vs. e); these errors can be partially corrected via post-processing.
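To make the error-rate figures concrete, here is a minimal Python sketch of how a character error rate can be computed and how glyph confusions of the kind listed above might be repaired against a word list. It is an illustration under stated assumptions, not the actual pipeline: the function names, the `CONFUSABLE` table and the example lexicon are all hypothetical, and CER is taken in its usual sense of edit distance divided by the length of the reference text. As a sanity check on the numbers, 1 error per page at 0.03% CER corresponds to roughly 1 / 0.0003 ≈ 3,333 characters per page.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical confusion table built from the pairs named in the text
# (u vs. n, f vs. s, c vs. e). Each character maps to itself plus its
# look-alike. Lower-case only, to keep the sketch short.
CONFUSABLE = {"u": "un", "n": "nu", "f": "fs", "s": "sf", "c": "ce", "e": "ec"}


def correct_token(token: str, lexicon: set[str], max_variants: int = 4096) -> str:
    """Return the unique lexicon word reachable by swapping confusable
    characters; otherwise return the token unchanged (to be flagged for
    manual review)."""
    if token in lexicon:
        return token
    options = [CONFUSABLE.get(ch, ch) for ch in token]
    n_variants = 1
    for opt in options:
        n_variants *= len(opt)
    if n_variants > max_variants:  # avoid combinatorial blow-up on long tokens
        return token
    matches = {"".join(v) for v in product(*options)} & lexicon
    return matches.pop() if len(matches) == 1 else token


if __name__ == "__main__":
    lexicon = {"Baumeister", "Schneider"}  # assumed occupation/surname list
    print(correct_token("Banmeifter", lexicon))
    print(cer("Schneider", "Schueider"))
```

Only unambiguous matches are replaced; tokens with no match or with several candidate matches are left unchanged so that they can be flagged for manual correction rather than silently altered.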