Introducing a novel dataset for testing OCR Engines
We study the performance of popular OCR engines on the Bangla CrossHair Dataset, a Bangla character dataset introduced by our team. The dataset is available on Kaggle. The paper has been submitted to the ICCIT 2024 conference.
EasyOCR is a modern, deep learning-based OCR library developed by the Jaided AI team. It is built on top of deep neural networks and uses pre-trained models for detecting and recognizing text in various scripts and languages.
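As a rough illustration of how EasyOCR might be called on a single image from the dataset, here is a minimal sketch. The helper name `recognize_with_easyocr`, the CPU-only setting, and the choice to concatenate all recognized fragments are assumptions made for this example, not the evaluation code used in the paper. `bn` is EasyOCR's language code for Bangla.

```python
def recognize_with_easyocr(image_path, reader=None):
    """Return the text EasyOCR recognizes in one image (hypothetical helper).

    `reader` can be a pre-built easyocr.Reader so the model is loaded only
    once when many images are processed.
    """
    if reader is None:
        # Imported lazily so the module loads even without easyocr installed.
        import easyocr
        # Assumption: CPU inference; pass gpu=True if a GPU is available.
        reader = easyocr.Reader(["bn"], gpu=False)
    # detail=0 makes readtext return only the recognized strings,
    # without bounding boxes or confidence scores.
    fragments = reader.readtext(image_path, detail=0)
    # Assumption: fragments are joined without separators, which suits
    # images that contain a single character or a short word.
    return "".join(fragments).strip()
```

When looping over the whole dataset, building one `easyocr.Reader(["bn"])` up front and passing it in as `reader` avoids reloading the model for every image.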
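Tesseract is one of the most widely used traditional OCR engines. It was initially developed by HP, and Google later sponsored its development for many years; it is now maintained as an open-source project. Its legacy engine relies on hand-engineered character features and classifiers, while versions 4 and later add an LSTM-based neural network recognizer.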
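Tesseract is commonly driven from Python through the `pytesseract` wrapper. The sketch below assumes that wrapper, an installed Tesseract binary, and the Bengali (`ben`) language data; the helper name `recognize_with_tesseract` and the default page segmentation mode are illustrative choices, not the settings used in the paper.

```python
def recognize_with_tesseract(image, psm=10, ocr=None):
    """Return the text Tesseract recognizes in one image (hypothetical helper).

    `image` may be a file path, a PIL image, or a NumPy array, which are the
    input types pytesseract.image_to_string accepts.
    """
    if ocr is None:
        # Imported lazily so the module loads even without pytesseract installed.
        import pytesseract
        ocr = pytesseract.image_to_string
    # Assumption: --psm 10 treats the image as a single character, which
    # suits isolated-character images; use another mode for words or lines.
    text = ocr(image, lang="ben", config=f"--psm {psm}")
    # Tesseract output often ends with a newline and a form feed; strip both.
    return text.strip()
```

The `ocr` parameter exists only so the function can be exercised with a stand-in callable; in normal use it is left as `None`.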