Two tasks are proposed, both tied to the basic workflow of a transcription process: extracting text from scanned documents (OCR) and curating the extracted text to fix any errors found. However, instead of dedicating one task to each step, we encourage participants to tackle the following tasks:
In this task, participants are provided with the output of an OCR system and are asked to generate clean, corrected versions of the extracted texts.
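As a rough illustration of the kind of cleanup this involves, the sketch below applies a few generic rules to raw OCR text. It is only a hypothetical baseline: the function name `clean_ocr_text` and the chosen rules are assumptions made for the example, not part of the task definition or of any provided tooling, and a competitive system would need far more than this.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Hypothetical rule-based baseline for post-OCR cleanup (illustrative only)."""
    # Normalize Unicode compatibility forms, e.g. the "ﬁ" ligature becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break.
    # Caveat: this also merges genuine hyphenated compounds broken across lines.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing whitespace on every line.
    return "\n".join(line.strip() for line in text.split("\n"))


if __name__ == "__main__":
    noisy = "The ﬁrst  transcrip-\ntion   step "
    print(clean_ocr_text(noisy))
```

Rules of this kind only repair layout and encoding noise; character-level recognition errors (for instance a misread letter) would require a language-aware correction step on top.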
Given recent advances in multimodal systems, this task aims to explore end-to-end approaches that take scanned pages as input and produce curated texts as output.