Recognize Text from an Image with Python, OpenCV and OCR

This tutorial covers how to convert an image to text using Python, OpenCV and OCR. It includes installing OpenCV for Python and installing and configuring Tesseract OCR.

Technologies used: Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006. Tesseract is considered one of the most accurate open-source OCR engines currently available.

OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms.

First, install the Tesseract binary:

On Linux

sudo apt-get update
sudo apt-get install libleptonica-dev 
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev

On Mac

brew install tesseract

On Windows

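Download and install the Tesseract binary, then add the following line to your script. The r prefix makes it a raw string, so backslash sequences such as \t are not read as escape characters:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'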
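
Whichever platform you are on, it can help to confirm that Python can find the Tesseract executable before going further. The sketch below uses only the standard library. The helper name `find_tesseract` and the fallback list are illustrative assumptions, not part of pytesseract, and the Windows path is simply the one shown above.

```python
import os
import shutil

# Assumed fallback location (the Windows path from this tutorial); adjust as needed.
WINDOWS_DEFAULT = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

def find_tesseract(name="tesseract", fallbacks=(WINDOWS_DEFAULT,)):
    """Return the path to the Tesseract executable, or None if it is not found."""
    # shutil.which searches the directories listed in the PATH variable.
    found = shutil.which(name)
    if found:
        return found
    # Otherwise try the explicit fallback paths in order.
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None

if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "Tesseract not found; check the installation or PATH.")
```

If a path is returned, it can be assigned to `pytesseract.pytesseract.tesseract_cmd` in your script.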

Then install the Python packages that the script imports, using pip:

pip install pytesseract
pip install opencv-python numpy pillow
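
After installing, a quick way to see whether the imports used later in this tutorial will resolve is to ask Python which modules it can locate. This is a minimal sketch using the standard library. The function name `missing_modules` is made up for illustration, and the list of names is taken from the import statements in Step 1.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec returns None when a top-level module is not installed.
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    # Import names used by this tutorial's script (PIL is provided by Pillow).
    required = ["cv2", "numpy", "PIL", "pytesseract"]
    missing = missing_modules(required)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Any name printed as missing points to a package that still needs to be installed with pip.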

Step 1): Import Required Libraries

import cv2
import numpy as np
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

Step 2): Create the Function

# Folder that holds the input image; set this to your own path, ending with a slash.
# An empty string means the current working directory.
src_path = ""

def get_string(img_path):
    # Read the image with OpenCV
    img = cv2.imread(img_path)

    # Convert to grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    cv2.imwrite(src_path + "removed_noise.png", img)

    # Apply a threshold to get an image with only black and white
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with Tesseract for Python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
    return result
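
To see what the adaptive threshold step is doing, here is a minimal pure-Python sketch of the idea. It is a simplified mean-based variant (comparable in spirit to OpenCV's `cv2.ADAPTIVE_THRESH_MEAN_C`), not the Gaussian-weighted version used in the function above, and the function name `adaptive_threshold_mean` is made up for illustration. Each pixel is compared with the average of its own neighbourhood minus a constant, so a gradual change in lighting does not wipe out the text.

```python
def adaptive_threshold_mean(img, block_size=3, c=2, max_value=255):
    """Binarize a grayscale image given as a list of rows of ints.

    Each pixel is compared with the mean of its block_size x block_size
    neighbourhood minus c; edge pixels reuse the nearest valid pixel.
    """
    height, width = len(img), len(img[0])
    half = block_size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # Clamp coordinates so the window stays inside the image.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += img[yy][xx]
            threshold = total / (block_size * block_size) - c
            row.append(max_value if img[y][x] > threshold else 0)
        out.append(row)
    return out

if __name__ == "__main__":
    # One dark "ink" pixel (20) on a background that brightens left to right.
    image = [
        [100, 103, 106, 109, 112],
        [100, 103,  20, 109, 112],
        [100, 103, 106, 109, 112],
    ]
    for row in adaptive_threshold_mean(image):
        print(row)
```

In this small example only the dark pixel in the middle becomes 0, while the whole background becomes 255 even though its brightness varies across the image.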

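# Placeholder file name; replace with the name of your own image inside src_path
img_path = "image.png"

print("---------Start Recognize text from image---------")
print(get_string(src_path + img_path))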
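
One practical note: `cv2.imread` returns `None` instead of raising an error when it cannot read the file, so a wrong path only shows up later as a less obvious failure in `cv2.cvtColor`. A small guard like the one below, a sketch with a hypothetical helper name `require_image`, makes the problem visible earlier.

```python
import os

def require_image(path):
    """Return `path` unchanged if it points to an existing file, else raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Image not found: {path}")
    return path
```

It could be used as `get_string(require_image(src_path + img_path))`.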