Document Image Analysis and Forensics Lab

Research

Selected research works are highlighted on this page.

Bangla Compound Character Segmentation for Better Recognition

Proper recognition of complex-shaped handwritten compound characters is still a big challenge for Bangla OCR systems. We demonstrate a novel shape decomposition-based segmentation technique that decomposes compound characters into prominent shape components. This shape decomposition reduces classification complexity by lowering the number of classes to recognise, and at the same time improves recognition accuracy.

Publication: Rahul Pramanik and Soumen Bag. "Shape Decomposition-based Handwritten Compound Character Recognition for Bangla OCR." Journal of Visual Communication and Image Representation, Elsevier, 50 (2018): 123-134. (IF: 1.836) DOI: 10.1016/j.jvcir.2017.11.016.

____________________________________________________________________________________

Headline Estimation in Handwritten Words for Matra-based Indian Scripts

Most segmentation algorithms for Indian scripts require some prior knowledge about the structure of a handwritten word to efficiently fragment the word into its constituent characters. Zone detection is a widely used strategy for this purpose, and headline estimation is a salient part of zone detection. In this work, we use simple linear regression to estimate the headline present in handwritten words. This method efficiently detects the headline in three Indian scripts, namely Bangla, Devanagari, and Gurmukhi.
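The idea of fitting a straight line to the headline can be illustrated with a minimal sketch. It is not the published algorithm: the function name `estimate_headline` and the choice of candidate points (the topmost ink pixel of each column in a binarised word image) are assumptions made only for this illustration.

```python
import numpy as np

def estimate_headline(binary_word):
    """Fit y = slope * x + intercept through candidate headline pixels.

    binary_word: 2D array where nonzero values are ink.
    Candidate points are the topmost ink pixel of every column that
    contains ink (an assumption of this sketch, not the paper's rule).
    """
    img = np.asarray(binary_word) != 0
    cols = np.flatnonzero(img.any(axis=0))   # columns containing ink
    if cols.size < 2:
        raise ValueError("need ink in at least two columns")
    rows = img[:, cols].argmax(axis=0)       # first ink row in each column
    # Least-squares fit of a degree-1 polynomial (simple linear regression).
    slope, intercept = np.polyfit(cols, rows, 1)
    return slope, intercept

# Synthetic word: a horizontal headline at row 5 with one vertical stroke below it.
word = np.zeros((20, 40), dtype=np.uint8)
word[5, 2:38] = 1
word[6:15, 10] = 1
slope, intercept = estimate_headline(word)
```

On real handwriting, ascenders and modifiers above the headline would act as outliers for this naive choice of candidate points, so a practical system would need a more careful point selection step.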

Publication: Rahul Pramanik and Soumen Bag. "Linear Curve Fitting-Based Headline Estimation in Handwritten Words for Indian Scripts." In Proceedings of the Seventh International Conference on Pattern Recognition and Machine Intelligence (PReMI'17), pp. 116-123. Springer LNCS Series, 2017. DOI: 10.1007/978-3-319-69900-4_15.

(This publication received the Springer Student Award.)

____________________________________________________________________________________

Skew Correction of Handwritten Words for Matra-based Indian Scripts

Skew-corrected text lines in multi-oriented handwritten documents often contain words that are not properly aligned. Most segmentation algorithms fail to correctly segment skewed words into their constituent characters, so skew correction of words in a text line is as important as skew correction of a text line in a document. The proposed method uses linear curve fitting to estimate and correct the skew present in handwritten words. It efficiently detects and corrects skew in four Indian languages, namely Bangla, Hindi, Marathi, and Panjabi. The method handles word images skewed by up to 50 degrees and provides accurate results even when the matra is discontinuous.
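A rough sketch of regression-based skew estimation and correction follows. It is an illustration under stated assumptions, not the published method: the helper names `skew_angle_degrees` and `deskew_points` are hypothetical, candidate matra points are taken as the topmost ink pixel per column, and the correction rotates ink pixel coordinates about their centroid instead of resampling the image.

```python
import numpy as np

def skew_angle_degrees(binary_word):
    """Estimate word skew as the angle of a least-squares line through
    the topmost ink pixel of each column (assumed matra candidates)."""
    img = np.asarray(binary_word) != 0
    cols = np.flatnonzero(img.any(axis=0))
    rows = img[:, cols].argmax(axis=0)       # first ink row in each column
    slope, _ = np.polyfit(cols, rows, 1)
    return float(np.degrees(np.arctan(slope)))

def deskew_points(binary_word):
    """Rotate ink pixel coordinates about their centroid by the negative
    of the estimated skew angle, so the fitted line becomes horizontal."""
    img = np.asarray(binary_word) != 0
    ys, xs = np.nonzero(img)
    theta = np.radians(skew_angle_degrees(img))
    cx, cy = xs.mean(), ys.mean()
    dx, dy = xs - cx, ys - cy
    new_x = cx + dx * np.cos(theta) + dy * np.sin(theta)
    new_y = cy - dx * np.sin(theta) + dy * np.cos(theta)
    return new_x, new_y

# Synthetic skewed stroke with slope 0.5 in image coordinates (about 26.6 degrees).
word = np.zeros((30, 40), dtype=np.uint8)
xs = np.arange(40)
word[np.round(4 + 0.5 * xs).astype(int), xs] = 1
angle = skew_angle_degrees(word)
new_x, new_y = deskew_points(word)
```

After the rotation, the vertical spread of the stroke collapses to roughly one pixel, which indicates that the estimated angle matches the synthetic skew.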

Publication: Rahul Pramanik and Soumen Bag. "Linear Regression-based Skew Correction of Handwritten Words in Indian Languages." In Proceedings of the Second International Conference on Computer Vision & Image Processing (CVIP-2017), pp. 129-139. Springer AISC Series, 2017. DOI: 10.1007/978-981-10-7898-9_11.

___________________________________________________________________________

Segmentation of Connected Handwritten Numerals

The presence of connected numerals in an unconstrained handwritten numeral string makes it hard for recognition systems to provide fast and efficient identification. The proposed method segments such strings and can effectively separate both isolated and connected numerals in any unconstrained handwritten numeral string.

Publication: Deepayan Chakraborty, Rahul Pramanik, and Soumen Bag. "A Novel Approach Towards Segmentation of Connected Handwritten Numerals." In Proceedings of the Fourth International Conference on Image Information Processing (ICIIP-2017), pp. 1-5. IEEE, 2017. DOI: 10.1109/ICIIP.2017.8313737.

(This publication received the Best Paper Award (Information Processing Track).)
