Role of OCR in machine learning

Introduction

Layman's explanation

OCR is used for recognizing street signs (Google Street View) and searching through photos (Dropbox). If you would like to know how it works, this document will help.

Technical explanation

OCR, or optical character recognition, is one of the earliest computer vision tasks to be addressed, since in some respects it does not require deep learning. Below is the dataset of house numbers extracted from Google Street View.

OCR flow

The flow below shows how an image is processed to extract characters.
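
As a rough illustration of such a flow, the sketch below covers two early steps in plain Python: binarizing a grayscale image and splitting it into character regions by looking for empty columns. The function names `binarize` and `segment_columns`, the threshold of 128, and the tiny sample image are assumptions made for illustration; the recognition step itself is left out.

```python
def binarize(image, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into 0/1 pixels.

    Dark pixels (below the threshold) are treated as ink and become 1.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def segment_columns(binary):
    """Split a binary image into character regions using empty columns.

    Returns a list of (start, end) column ranges, end exclusive.
    """
    width = len(binary[0])
    # A column contains ink if any row has a 1 in that column.
    has_ink = [any(row[x] for row in binary) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a character ends
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments


# A tiny 3x7 "image": two dark strokes separated by a light gap.
image = [
    [255,  10,  20, 255, 255,  30, 255],
    [255,  15, 255, 255, 255,  25, 255],
    [255,  12,  18, 255, 255,  35, 255],
]
binary = binarize(image)
segments = segment_columns(binary)
print(segments)
```

Each column range could then be cropped and passed to a character classifier. Real scans usually also need steps such as noise removal and skew correction, which this sketch omits.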

OCR and CAPTCHA

Most of today's text CAPTCHAs are not very hard to solve, especially if we don’t try to solve all of them at once.

Applications

- Capturing text from moving objects, for example reading a vehicle's number plate
- Capturing text in street view (see Google Street View)
- Extracting text from PDFs
- Digitisation of handwritten books (see MNIST)

Challenges in OCR

- Variety of letters: Letter forms in some alphabets are harder to recognize. For example, even printed Arabic characters are cursive, which makes character recognition a challenge.
- Variety of font types and sizes
- Look-alike characters: For example, it is hard to differentiate between the number “0” and the letter “O”.
- Handwritten text

OCR evolution

ML algorithms

CRNN

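CRNN, or Convolutional Recurrent Neural Network, uses convolutional layers to extract features from the image and recurrent layers to read those features as a sequence of characters.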
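
To illustrate what a single convolutional filter computes, here is a minimal sketch of a 2D convolution in plain Python (implemented as cross-correlation, as deep learning libraries commonly do). It is not CRNN itself; the `convolve2d` name, the kernel values, and the sample image are assumptions chosen to show how a filter responds to the edge of a vertical stroke.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch under it.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0
            for dy in range(kh):
                for dx in range(kw):
                    total += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(total)
        output.append(row)
    return output


# A 3x4 image with a dark-to-bright vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 3x2 kernel that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is largest where the kernel sits on the edge and zero over flat regions. A trained network learns many such kernels instead of having them written by hand.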

STN-net/SEE

EAST

EAST, or Efficient and Accurate Scene Text Detector, is a deep learning model for detecting text in natural scene images.
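
Text detectors of this kind typically produce many overlapping candidate boxes for the same piece of text, which are then filtered with non-maximum suppression (NMS). The sketch below shows plain NMS on axis-aligned boxes. It is a simplified illustration, not EAST's own post-processing, which works with rotated geometry and a locality-aware variant of NMS. The box format `(x1, y1, x2, y2, score)`, the function names, and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, score)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, iou_threshold=0.5):
    """Keep the highest-scoring box in each group of overlapping boxes."""
    remaining = sorted(boxes, key=lambda box: box[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every box that overlaps the chosen one too much.
        remaining = [b for b in remaining if iou(best, b) <= iou_threshold]
    return kept


candidates = [
    (10, 10, 110, 40, 0.90),   # strong detection of a word
    (12, 12, 112, 42, 0.75),   # near-duplicate of the first box
    (200, 50, 260, 80, 0.80),  # a separate word elsewhere in the image
]
kept = non_max_suppression(candidates)
print(kept)
```

Here the near-duplicate box is discarded because it overlaps the stronger detection too much, while the separate word is kept.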

Python libraries

Refer to the OpenCV Python library.

Reference

https://youtu.be/GA35F3N3i_I

https://mobidev.biz/blog/ocr-machine-learning-implementation

https://towardsdatascience.com/a-gentle-introduction-to-ocr-ee1469a201aa

https://medium.com/syncedreview/stn-ocr-a-single-neural-network-for-text-detection-and-text-recognition-220debe6ded4

https://research.aimultiple.com/ocr-technology/

https://images.app.goo.gl/85ABhFcLgnnxcu9LA

https://images.app.goo.gl/5ex2vHXavJmaRTW58
