Large and important parts of our cultural heritage are stored in archives that are difficult to access. Documents and notes are written in hard-to-read historical handwriting and are weakly structured, which hinders access by the wider public as well as by scientists and other experts. Computer-based recognition of connected-cursive script is, in general, still well beyond the scope of current technology.
Our project will investigate this challenging problem by attempting to interpret the notes and illustrations of the Natuurkundige Commissie. This archive is one of the top collections of Naturalis Biodiversity Center and contains a rich, 17,000-page account of the scientific exploration of the Indonesian Archipelago (1820-1850). Correctly interpreting illustrated handwritten historical archives is hard. For handwriting recognition we use the MONK system, a state-of-the-art machine-learning handwriting recognition system. In addition, we can draw on the circumstances of the committee’s voyages and on contextual information about species, locations and habitats. This information will be used to support the handwriting recognition of the historic collection. MONK will be extended with layout formatting and ontology elements. Furthermore, the taxonomic expertise of Naturalis, in combination with history-of-science methods, will be used to bootstrap, train and refine the system.
The project aims to develop a technologically advanced and user-centered digital environment that provides access to archives containing handwritten notes and illustrations. This tool, which combines image and text recognition, allows, for the first time, an integrated study of underexplored scientific heritage collections and of archives in general.