Document Image Binarization

Participants

Faculty: Dr. Elisa Barney Smith, Dr. Tim Andersen
Students: Tessa Triolo and Dede Russell
External Collaborators: C. An (LORIA, France), L. Likforman-Sulem (Telecom ParisTech, France), J. Darbon (France)

Funding

Part of this research was funded by a grant from the Computing Research Association, Committee on the Status of Women in Computing Research’s CREU: Collaborative Research Experience for Undergraduates in Computer Science and Engineering project.

Description

Documents are often poorly illuminated when they are scanned, or have yellowed with age, causing an uneven background color. To convert the image into a text document, the image is passed through an Optical Character Recognition (OCR) algorithm. Most OCR algorithms accept only black-and-white input images, with no intermediate gray levels, so the image must first be thresholded. The simplest thresholding algorithm is a global threshold, which applies one cutoff value to every pixel. It does not work well on images with a varying background, because a single cutoff often cannot separate text from background across the whole page.
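The failure of a global threshold on an uneven background can be illustrated with a small sketch. The eight-pixel "scan line", its gray values, and the cutoff of 100 are made-up values chosen for illustration, not data from this project.

```python
import numpy as np

def global_threshold(gray, t):
    """Return a boolean mask that is True where a pixel is darker than t (ink)."""
    return gray < t

# Hypothetical scan line: the background fades from 200 down to 60, and the
# two ink pixels are each 50 gray levels darker than their local background.
background = np.array([200, 180, 160, 140, 120, 100, 80, 60])
ink = np.zeros(8, dtype=bool)
ink[[1, 6]] = True
gray = np.where(ink, background - 50, background)

mask = global_threshold(gray, 100)

print("gray values:", gray.tolist())
print("true ink:   ", ink.tolist())
print("thresholded:", mask.tolist())
```

In this example the ink pixel on the bright side (value 130) is lost, while the darkest background pixel (value 60) is marked as ink. No other cutoff would do better here, since that background pixel is darker than the ink pixel at 130.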

Adaptive thresholding algorithms, which compute a separate threshold for each pixel from its local neighborhood, can work around this, but they often leave the background with a peppered texture.

We are working to improve Niblack's algorithm, a common adaptive thresholding method, to overcome this problem. Preliminary results are promising.
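For reference, the textbook form of Niblack's rule sets each pixel's threshold to T = m + k·s, where m and s are the mean and standard deviation of the gray levels in a window around the pixel. The sketch below implements that standard rule only; it is not the improved method under development here. The window size, the value k = -0.2, the reflect padding, and the synthetic test image are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def niblack_threshold(gray, window=15, k=-0.2):
    """Textbook Niblack: a pixel is ink if it is below mean + k * std of its window.

    `window` must be odd; image borders are handled by reflect padding (an
    assumption of this sketch, other border rules are equally valid).
    """
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    padded = np.pad(gray, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    return gray < mean + k * std

# Synthetic 64 x 64 page: background fades from 200 to 100 left to right,
# with mild sensor noise, plus one dark horizontal stroke.
rng = np.random.default_rng(0)
gray = np.linspace(200.0, 100.0, 64)[np.newaxis, :] + rng.normal(0.0, 2.0, size=(64, 64))
gray[30:34, 8:56] -= 80.0

mask = niblack_threshold(gray)

stroke_found = bool(mask[30:34, 8:56].all())
background_ink_fraction = float(mask[:20].mean())  # rows far from the stroke
print("stroke fully detected:", stroke_found)
print("fraction of empty background marked as ink:", background_ink_fraction)
```

The stroke is recovered on both the bright and the dark side of the page, which a single global cutoff cannot do for this image. The second number shows the drawback described above: in windows that contain no text, the local standard deviation reflects only noise and shading, so a share of plain background pixels still falls below the threshold and appears as pepper.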

Publications

Rafael Dueire Lins, Rodrigo Barros Bernardino, Elisa Barney Smith, Ergina Kavallieratou, “ICDAR 2021 competition on time-quality document image binarization,” 16th International Conference on Document Analysis and Recognition–ICDAR 2021, Lausanne, Switzerland, Proceedings, Part IV 16, September 5–10, 2021, pp. 708-722.
Rafael Dueire Lins, Ergina Kavallieratou, Elisa Barney Smith, Rodrigo Barros Bernardino, DM de Jesus, "ICDAR 2019 Time-Quality Binarization Competition," 2019 International Conference on Document Analysis and Recognition (ICDAR 2019)
Elisa H Barney Smith, Chang An, "Effect of Ground Truth on Image Binarization," 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012
Elisa H Barney Smith, "An analysis of binarization ground truthing," Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (DAS), 2010
Elisa H Barney Smith, Laurence Likforman-Sulem, Jerome Darbon, "Effect of pre-processing on binarization," Document Recognition and Retrieval XVII 7534, 75340H, 2010
