Database

Overview

The data used for this project come from the IAM Handwriting Database, a collection of forms of unconstrained handwritten English text that can be used to train and test handwritten text recognizers and to perform writer identification and verification experiments. The forms were scanned at a resolution of 300 dpi and saved as PNG images with 256 gray levels. The figure below provides samples of a complete form, a text line and some words.

Characteristics

The IAM Handwriting Database 3.0 is structured as follows:

  • 657 writers contributed samples of their handwriting

  • 1'539 pages of scanned text

  • 5'685 isolated and labeled sentences

  • 13'353 isolated and labeled text lines

  • 115'320 isolated and labeled words

The words were extracted from the pages of scanned text using an automatic segmentation scheme and then verified manually.
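
To get a feel for the scale of the database, the counts listed above can be turned into rough per-unit averages. The following is a minimal Python sketch; the dictionary and function names are illustrative assumptions, and the ratios are plain divisions of the published totals rather than official statistics.

```python
# Totals as listed for the IAM Handwriting Database 3.0.
IAM_COUNTS = {
    "writers": 657,
    "pages": 1539,
    "sentences": 5685,
    "lines": 13353,
    "words": 115320,
}


def average_per(unit: str, container: str) -> float:
    """Return the average number of `unit` items per `container` item.

    Both arguments are keys of IAM_COUNTS (names chosen for this sketch).
    """
    return IAM_COUNTS[unit] / IAM_COUNTS[container]


if __name__ == "__main__":
    for unit, container in [
        ("pages", "writers"),
        ("lines", "pages"),
        ("words", "lines"),
    ]:
        value = average_per(unit, container)
        print(f"average {unit} per {container[:-1]}: {value:.2f}")
```

These averages are only indicative: they say nothing about how unevenly pages are distributed across writers or words across lines.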

All form, line and word images are provided as PNG files. The corresponding form label files, including segmentation information and a variety of estimated parameters, are included in the image files as meta-information in XML format, which is described under "XML file" and "XML file format (DTD)".
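
As an illustration of how such XML meta-information could be consumed, the sketch below reads word labels and bounding boxes with Python's standard xml.etree.ElementTree module. The sample document, its element names (form, line, word, cmp) and its attribute names are assumptions made up for this example; the authoritative structure is the one given in the XML file format (DTD) description. Obtaining the XML text for a given form is outside the scope of the sketch.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration only. Element names, attribute names,
# ids and coordinates are assumptions, not copied from the database.
SAMPLE_XML = """\
<form id="form-001">
  <line id="form-001-00" text="A MOVE">
    <word id="form-001-00-00" text="A">
      <cmp x="408" y="768" width="27" height="51"/>
    </word>
    <word id="form-001-00-01" text="MOVE">
      <cmp x="507" y="766" width="100" height="48"/>
      <cmp x="610" y="770" width="110" height="44"/>
    </word>
  </line>
</form>
"""


def word_boxes(xml_text):
    """Return (word id, label, (x, y, width, height)) for every word.

    A word may consist of several components; its box is taken as the
    union of the component boxes. Words without components are skipped.
    """
    root = ET.fromstring(xml_text)
    results = []
    for word in root.iter("word"):
        parts = [
            (int(c.get("x")), int(c.get("y")),
             int(c.get("width")), int(c.get("height")))
            for c in word.iter("cmp")
        ]
        if not parts:
            continue
        left = min(x for x, _, _, _ in parts)
        top = min(y for _, y, _, _ in parts)
        right = max(x + w for x, _, w, _ in parts)
        bottom = max(y + h for _, y, _, h in parts)
        box = (left, top, right - left, bottom - top)
        results.append((word.get("id"), word.get("text"), box))
    return results


if __name__ == "__main__":
    for word_id, label, box in word_boxes(SAMPLE_XML):
        print(word_id, label, box)
```

Merging component boxes into one box per word is a design choice of this sketch; keeping the individual components may be preferable when the finer segmentation matters.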