Performance Analysis of PyTesseract and EasyOCR for Bangla OCR on BCH Dataset

Performance Analysis of PyTesseract and EasyOCR for Bangla Optical Character Recognition on the Novel Bangla CrossHair Dataset

Introducing a novel dataset for testing OCR Engines

Benchmarking OCR Engines for Bangla Character Recognition

We benchmark two widely used OCR engines, PyTesseract and EasyOCR, on the Bangla CrossHair Dataset, a Bangla character dataset introduced by our team. The paper has been submitted to the ICCIT 2024 conference.

Kaggle

The dataset is available on Kaggle. The code will be made public upon publication of the paper.

Python

The benchmarking was performed in Python using the PyTesseract and EasyOCR libraries.
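As an illustration of how such a benchmark can be scored, here is a minimal sketch of a character error rate (CER) computation. This page does not say which metric the paper uses, so CER and the helper names `edit_distance` and `character_error_rate` are assumptions made for this example only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edit distance divided by total reference length (hypothetical metric)."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Note that Bangla conjuncts and vowel signs span several code points, so this sketch counts code points rather than visual characters.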

EasyOCR

EasyOCR is a modern, deep learning-based OCR library developed by the Jaided AI team. It is built on top of deep neural networks and uses pre-trained models for detecting and recognizing text in various scripts and languages.
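A minimal sketch of reading Bangla text with EasyOCR might look like the following. It assumes the `easyocr` package is installed and uses its `bn` (Bengali) language code. The function names and the CPU-only default are illustrative choices, not the project's actual code, which has not yet been released.

```python
def texts_from_easyocr(results):
    """Join the recognized strings from EasyOCR's (bbox, text, confidence) tuples."""
    return " ".join(text for _bbox, text, _conf in results).strip()


def read_bangla_easyocr(image_path, use_gpu=False):
    """Run EasyOCR on one image and return the recognized text (hypothetical helper)."""
    import easyocr  # imported lazily so the helper above works without the package

    # The Bengali model is downloaded on first use.
    reader = easyocr.Reader(["bn"], gpu=use_gpu)
    results = reader.readtext(image_path)
    return texts_from_easyocr(results)
```

When processing many images, create the `Reader` once and reuse it, since loading the model is the expensive step.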

Tesseract OCR

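Tesseract is one of the most widely used open-source OCR engines. It was initially developed by HP, and its development was later sponsored by Google. Its legacy engine recognizes text using hand-crafted features with trained classifiers, while versions 4 and later add an LSTM-based neural recognizer. PyTesseract is a Python wrapper around the Tesseract engine.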
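A comparable sketch for Tesseract through PyTesseract is shown below. It assumes the Tesseract binary and its `ben` (Bengali) trained data are installed, along with the `pytesseract` and `Pillow` packages. The page segmentation mode `--psm 10` (treat the image as a single character) is an assumption based on the dataset containing individual characters, and the function names are hypothetical.

```python
def clean_ocr_text(raw):
    """Strip the trailing newline and form feed that Tesseract usually appends."""
    return raw.strip()


def read_bangla_tesseract(image_path, psm=10):
    """Run Tesseract on one image and return the recognized text (hypothetical helper)."""
    import pytesseract          # imported lazily; needs the Tesseract binary on PATH
    from PIL import Image

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang="ben", config=f"--psm {psm}")
    return clean_ocr_text(raw)
```

For images that contain whole words or lines rather than single characters, a different `psm` value would be more appropriate.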

Questions?

Contact abdullahnasirchowdhury1@gmail.com for more information about the project.
