Old OCRopus Wiki

Urdu

Urdu Language Support for OCRopus

Interested Participants:

  • Faisal Shafait, Researcher, German Research Center for Artificial Intelligence, Kaiserslautern, Germany.
  • Shariq Mustaquim, C++ Programmer, Karachi, Pakistan.

Architecture of OCRopusUrdu

The software architecture of OCRopus is modular, so new languages can be integrated easily. I have added a block diagram to show the architecture of OCRopusUrdu; the parts that need special attention are highlighted in yellow.

This modularity makes a coordinated effort possible, with different people focusing on different parts of OCRopusUrdu.

Existing Open Source OCR Systems that Can Handle Urdu

???

Existing Commercial OCR Systems that Can Handle Urdu

???

Ground-Truthed Urdu OCR Data (Scans + Transcription)

  • 25 binarized Urdu documents with different layouts (book, poetry, novel, magazine, newspaper)
    http://www.iupr.org/downloads/data

Urdu Dictionaries and Text Corpora (for Statistical Language Modeling)

???

Other Issues

???

Published Research on Urdu OCR

???

Other Ongoing projects on Urdu OCR

???