Above: A brief overview of how Transkribus software helps libraries transcribe and digitize historical documents. The video is in German, but translated closed captions are available.
What is it?
Handwriting recognition technology, an extension of OCR, identifies handwriting within documents and converts it to machine-readable text.
Handwriting recognition technologies have been a topic of research since the mid-twentieth century, originating from the need to accurately recognize signatures in the corporate financial services sector (Muehlberger et al., 2019). Recent advances in AI and machine learning have successfully integrated OCR and visual imaging technologies with handwriting recognition, helping libraries across the world continue to digitize their collections (Muehlberger et al., 2019). Library applications of handwriting recognition include digitized collections such as image banks, manuscripts, and other handwritten historical documents.
Above: Video of a conference presentation. At 12:45, a report begins on using Transkribus software to transcribe Foucault's handwritten notes. At 21:18, the outcomes of the project are presented.
Case Study:
The Foucault Fiches de Lecture (FFL) project, an initiative of the French National Research Agency and the European Read/Transkribus Project, uses machine learning techniques to digitize the handwritten manuscripts of the French philosopher Michel Foucault held in the Foucault Archives at the Bibliothèque nationale de France (Massot et al., 2019). The project aims to determine whether there is an order to the original documents (his reading notes and other handwritten research notes) and to understand more accurately the pattern of Foucault’s ideas and thought. Applied to the digitized material, machine learning can identify patterns through clustering, predictive analytics, and metadata recognition, allowing Foucault’s thought process to reveal itself (Massot et al., 2019).
Use Case: Ohio State University Libraries
Above: Digitization Program Manager Amy McRory digitizes a 442-year-old German-language manuscript at the Rare Books & Manuscripts Library at Ohio State University Libraries.
Software Example: Amazon Textract Handwriting Recognition
Above: A demo of Amazon Textract, a readily available OCR and handwriting recognition service.
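For readers who want to try the service on their own scans, the following is a minimal sketch rather than a definitive implementation. It assumes the boto3 library, configured AWS credentials, and Textract's DetectDocumentText operation, whose response labels each WORD block as either HANDWRITING or PRINTED. The function names, the default region, and the confidence threshold are illustrative choices, not part of the demo shown in the video.

```python
from typing import Any


def handwritten_words(response: dict[str, Any], min_confidence: float = 0.0) -> list[str]:
    """Return the text of WORD blocks that Textract labeled as handwriting.

    Textract reports Confidence on a 0-100 scale; min_confidence is a
    hypothetical filter added here for illustration.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "WORD"
        and block.get("TextType") == "HANDWRITING"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def detect_text(image_path: str, region_name: str = "us-east-1") -> dict[str, Any]:
    """Send a local image to Textract (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as image_file:
        return client.detect_document_text(Document={"Bytes": image_file.read()})
```

A call such as handwritten_words(detect_text("letter.png"), min_confidence=80) would then list only the handwritten words that the service recognized with reasonable confidence; "letter.png" is a placeholder file name.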
Use Case: The University of Edinburgh
Above: Professor Melissa Terras explains how Transkribus, a handwriting recognition technology, has been received by historians for archival work.
References
Massot, M., Sforzini, A., & Ventresque, V. (2019). Transcribing Foucault’s handwriting with Transkribus. Journal of Data Mining and Digital Humanities, Atelier Digit_Hum. doi:10.46298/jdmdh.5043
Muehlberger, G., Seaward, L., Terras, M., Ares Oliveira, S., Bosch, V., Bryan, M., . . . Zagoris, K. (2019). Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study. Journal of Documentation, 75(5), 954-976. doi:10.1108/JD-07-2018-0114