ICDAR2017 Competition on Post-OCR Text Correction

Registration open until March 30, 2017

News:
[7 March] English/French samples are available for download in the Dataset section
[7 March] The format and metrics are detailed in the Evaluation section

The accuracy of Optical Character Recognition (OCR) technologies considerably impacts the way digital documents are indexed, accessed and exploited. Over the last decades, OCR engines have steadily improved and today return usable results on mainstream documents. In practice, however, digital libraries hold many transcriptions whose quality falls below expectations. Historical documents with challenging layouts and varying states of preservation, such as old newspapers, still resist modern OCR. Moreover, resources digitized in the past with outdated OCR engines are rarely reprocessed through the latest state-of-the-art digitization pipelines, as priority is often given to the ever-growing mass of newly arriving documents. In this context, OCR post-correction approaches, applied either to previously digitized documents or to newly digitized challenging ones, could greatly benefit digital libraries.

Find and correct OCR errors

Important dates:

  • Training dataset available: late February / early March 2017
  • Registration deadline: March 30, 2017
  • Evaluation dataset available: a few days before the result submission deadline
  • Result submission: June 15, 2017
  • Conference: November 10-15, 2017

Contacts:

  • Guillaume Chiron - guillaume.chiron(at)univ-lr.fr
  • Antoine Doucet - antoine.doucet(at)univ-lr.fr
  • Mickaël Coustaty - mickael.coustaty(at)univ-lr.fr
  • Muriel Visani - muriel.visani(at)univ-lr.fr
  • Jean-Philippe Moreux - jean-philippe.moreux(at)bnf.fr