Download
We believe in open source, so everything we develop is freely available.
Bangla OCR:
BanglaOCR is an optical character recognizer for Bangla script. It takes scanned images of a printed page or document as input and converts them into editable Unicode text. BanglaOCR also lets users train it on data from any document and observe the resulting recognition performance.
[Download V 0.7 for Windows] new
[Download Sample Images for testing]
OCRopus:
Tesseract training data for Bangla script: We are creating training data for recognizing Bangla script with ocropus-tesseract. The training data set will keep growing over time to improve recognition accuracy.
[Download V 1.0] | [Download V 2.0]
Tesseract training data for Devanagari script: We have also created training data for recognizing Devanagari script with ocropus-tesseract. This training data set is currently very small, and it can be used to test recognition performance on the basic Devanagari characters. We plan to extend it to cover complete recognition of Devanagari script.
[Download]
Bpnet training data for Bangla script: We are creating training data for recognizing Bangla script with ocropus-bpnet. We are still testing this training data set. It will be made available once we have finished training and obtained satisfactory recognition accuracy.