Dr Adam Funk, Information School, University of Sheffield
The US National Archives holds digital images of about 25,000 muster sheets (attendance rolls) from vessels in the Union Navy during the American Civil War. These are handwritten on printed tabular forms, and historians would like transcribed, searchable versions.
This talk discusses our work in the Civil War Bluejackets project, in which we are combining citizen science (manual annotation and correction through Zooniverse), machine learning (for handwritten text recognition), and image processing (to adjust the forms and determine the locations of the rows and columns) to transcribe the sheets automatically at scale.
We can then use the names and other personal information on the sheets to link them to existing databases, such as digitized veterans' pension records, and to study the sailors' ethnicity, social class, and national origin (the sheets record physical appearance, occupation outside the navy, and place of birth).