Research

Research work carried out during my PhD

Algorithms for text segmentation from scene images.

The advent of higher-resolution cameras has changed the way documents are digitized. Cameras have expanded the domain of document images beyond books, journals and magazines. Camera-captured documents cover a wide range, such as notices displayed on a board, notes written on a wall, milestones, billboards and signboards. Born-digital images, generated by software for advertisements, banners and signage, also belong to this group.

A scanned or camera-captured document image usually contains text and graphics arranged in a page layout. However, text in a camera-captured scene or born-digital image is not confined to any page layout, since it can appear anywhere in the image. Apart from this random location, motion blur, non-uniform illumination, skew, occlusion and perspective distortion make it harder to locate and recognize text in a scene image. Text localization and recognition aid the development of assistive tools for the blind, unmanned navigation systems and spam filters.

The entire camera-captured scene or born-digital image is analyzed for text segmentation and localization. Several novel algorithms are proposed for segmenting text from cropped word images. One of these text segmentation algorithms won first place in the Robust Reading Competition held at ICDAR 2011. The word recognition rate obtained with the proposed word segmentation algorithms on English words is state-of-the-art. A first attempt at recognizing Kannada words from scene images is also reported.

(Online browser-based algorithm demos are hosted as videos at: http://sites.google.com/site/dipkmr/videos)

Word recognition from the detected text bounding box.

We have explored the use of a power-law transformation to prevent characters from merging in low-resolution, anti-aliased born-digital images. This significantly improves the word recognition rate. Our “Power-law Transformation for Enhanced Recognition” algorithm achieved the highest word recognition rate, 82.9%, on the Born-Digital word images of the ICDAR 2011 Robust Reading Competition dataset. The results are hosted by the CVC lab.
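The idea of applying a power-law transformation before binarization can be illustrated with a minimal sketch of the generic transform s = c·r^γ on intensities normalised to [0, 1]. This is not the published algorithm: the exponent γ = 0.5, the fixed threshold of 0.5, the dark-text-on-light-background polarity and the helper names power_law and count_text_runs are all assumptions chosen for illustration. The toy 1-D intensity profile mimics two dark character strokes joined by a single anti-aliased gap pixel of intensity 0.4.

```python
from typing import List


def power_law(pixels: List[float], gamma: float, c: float = 1.0) -> List[float]:
    """Map each normalised intensity r in [0, 1] to s = c * r**gamma, clipped to [0, 1].

    Hypothetical helper for illustration; gamma and c are assumed parameters.
    """
    out = []
    for r in pixels:
        r = min(max(r, 0.0), 1.0)  # keep input inside the normalised range
        out.append(min(max(c * r ** gamma, 0.0), 1.0))
    return out


def count_text_runs(profile: List[float], threshold: float = 0.5) -> int:
    """Count connected runs of text pixels in a 1-D profile.

    Assumes dark text on a light background: a pixel is text if it is below the threshold.
    """
    runs, inside = 0, False
    for value in profile:
        is_text = value < threshold
        if is_text and not inside:  # a new run of text pixels starts here
            runs += 1
        inside = is_text
    return runs


if __name__ == "__main__":
    # Two dark strokes (0.0) on a white background (1.0), joined by an anti-aliased pixel (0.4).
    profile = [1.0, 0.0, 0.0, 0.4, 0.0, 0.0, 1.0]
    print(count_text_runs(profile))                         # strokes merge into one run
    print(count_text_runs(power_law(profile, gamma=0.5)))   # strokes separate into two runs
```

Running it prints 1 and then 2: with γ < 1 the intermediate gap value is raised to about 0.63, above the threshold, so the two strokes no longer merge. For light text on a dark background, the direction would presumably reverse (γ > 1, with the threshold comparison flipped).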