Introduction
Over the past two decades, the volume of multimedia data—including images, video, and audio—has grown enormously, creating a demand for efficient techniques to index and retrieve it. For videos in particular, beyond the visual content itself, artificial text appearing in the frame—whether superimposed on the image or part of the scene—has emerged as a valuable attribute for video indexing.
This artificial text, also referred to as superimposed or caption text, serves purposes such as enabling keyword-based video search and automatic video tagging. Numerous text detection and localization techniques have been proposed over the years, with promising results. In parallel, handwriting analysis—aimed at identifying physical or mental attributes of writers—is a prominent research area within document image analysis.
However, objective analysis and evaluation of AI-based algorithms in these domains requires standardized datasets on which researchers can test and assess performance. At present, many methods are evaluated on private datasets that often differ in evaluation metrics, making meaningful comparison difficult. Addressing this need calls for shared, standardized datasets with labeled ground truth.
This repository offers such standardized datasets of images and Urdu/English handwriting samples, catering primarily to researchers in document and handwriting image analysis.
Researchers are encouraged to cite the URLs of the associated datasets in their research studies.