351- Optical Character Recognition

Procedure

Pre-Processing

For each image, we applied a sequence of pre-processing algorithms to standardize it before comparing it with our set of template images. Each of the algorithms described below was applied, in the order listed, before we moved on to the processing stage.

Deskew

The goal of the deskew function was to ensure that the letters were upright, negating the effect of a slanted picture. First, we converted the image to grayscale using the built-in rgb2gray Matlab function. Next, we estimated the rotation angle by applying a Fourier transform to the image and finding the angular position (phase) of the frequency component with the highest logarithmic magnitude. The image was then rotated by that angle, clockwise or counter-clockwise depending on its sign, to right the image. An example is shown below:

Example of rotated text and its deskewed result (left column) along with its corresponding logarithmic magnitude FFT plot (right column).

Resize

We resized the image to a 128x128 pixel array, because all of our template images are that size. We used Matlab’s prebuilt resizing function, which either combines nearby pixels and condenses them into one pixel to shrink an image, or expands and blends nearby pixels to create new intermediate pixels to enlarge it.

De-Noising Filters

We worked with two sets of inputs: one with added noise and one without. When we did not add artificial noise, we did not apply any de-noising filter, as filtering was unnecessary and often blurred the image. When noise was present, we applied a de-noising filter chosen according to the type of noise: a median filter for ‘salt and pepper’ noise and an adaptive Wiener filter for Gaussian noise.

Median Filter

The median filter computes the median of the pixel values within a given neighborhood and replaces the pixel at the center of that neighborhood with the result. This filter is highly effective against speckled noise, because an isolated outlier pixel is rarely the median of its neighborhood and is therefore replaced by a value typical of the normal pixels around it. We implemented this in the function Despeckle.m.

(Left) Noisy image; (Right) Image after applying median filter
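The idea can be illustrated with a minimal sketch. It is written in plain Python rather than Matlab and is not the project's Despeckle.m; the function name, the 3x3 default window, and the border handling (using only the neighbors that fall inside the image) are all assumptions made for this example.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood.

    `image` is a list of rows of numbers. At the borders, only neighbors
    that lie inside the image are used (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    r = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - r), min(height, y + r + 1))
                for i in range(max(0, x - r), min(width, x + r + 1))
            ]
            out[y][x] = median(window)
    return out


# A flat gray patch with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
cleaned = median_filter(noisy)
```

In this toy input both outliers are replaced by the surrounding gray level, since neither is ever the median of its neighborhood.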

Adaptive Wiener Filter

The Wiener filter module estimates the local mean and variance around each pixel and uses those values to calculate a per-pixel Wiener filter. Since we assume that the noise variance is not given, we use the average of all the local variances as the noise estimate in the final calculation. This filter is useful for additive white noise, such as Gaussian noise.

Mean Calculation

Variance Calculation

The per pixel Wiener Filter Calculation
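For reference, the three calculations named above can be written in the standard form used for pixel-wise adaptive Wiener filtering (the form documented for Matlab's wiener2). We assume this is the formulation intended here. The symbols are assumptions of this sketch: a is the input image, b the filtered image, \eta the N-by-M neighborhood around pixel (n_1, n_2), and \nu^2 the noise variance, taken as the average of all local variances when it is not given.

```latex
\mu = \frac{1}{NM} \sum_{(n_1, n_2) \in \eta} a(n_1, n_2)

\sigma^2 = \frac{1}{NM} \sum_{(n_1, n_2) \in \eta} a^2(n_1, n_2) - \mu^2

b(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2} \bigl( a(n_1, n_2) - \mu \bigr)
```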

Image Registering

Next, we used the built-in Matlab function imregister to align our handwritten image with the template image. The primary application of this function is to bring a moving image back into alignment with a fixed reference frame. However, we found that using it to align our handwritten characters with the template characters was very effective and made comparing the two images much easier.

Binarization

Binarization separates an image into two colors. We used an adaptive thresholding technique that computes a moving average of each pixel and its surroundings, then compares the pixel's intensity, relative to that average, against a set parameter. If the intensity is lower, the pixel is set to black; otherwise it is set to white. The initial image is first converted to an integral image, using this equation:

The integral image per pixel calculation
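The standard integral-image recurrence, which we assume is the one referred to here, is given below. The symbols are assumptions of this sketch: f(x, y) is the input intensity, I(x, y) is the integral image, and I is taken to be zero whenever an index is negative.

```latex
I(x, y) = f(x, y) + I(x - 1, y) + I(x, y - 1) - I(x - 1, y - 1)
```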

Afterwards, the function below is evaluated for each pixel to obtain the moving average around that pixel, which is then used to calculate the percent difference between the pixel's intensity and its local average.

Equation for calculating the moving average of each pixel and its neighbors
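One common way to write this step, assuming an integral image I of the input f and a rectangular window with corners (x_1, y_1) and (x_2, y_2) centered on the pixel, is shown below. The window corners and the percentage threshold t are symbols introduced for this sketch, not names taken from the project.

```latex
\bar{f}(x, y) = \frac{I(x_2, y_2) - I(x_1 - 1, y_2) - I(x_2, y_1 - 1) + I(x_1 - 1, y_1 - 1)}{(x_2 - x_1 + 1)(y_2 - y_1 + 1)}

\text{output}(x, y) =
\begin{cases}
\text{black} & \text{if } f(x, y) < \bar{f}(x, y) \left( 1 - \dfrac{t}{100} \right) \\
\text{white} & \text{otherwise}
\end{cases}
```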

The threshold for differentiating between black and white can be adjusted with the intensity threshold parameter, to account for lighting conditions and to adjust the clarity of the post-binarization image. Binarization effectively removes shadows and dark regions of the image so that the actual print is more readable. Below is an example of an image that has gone through binarization.

(Left) Original image; (Right) Binarized image
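The whole binarization step can be sketched in a few lines. This is a plain Python illustration of integral-image adaptive thresholding, not the project's Matlab code; the function names, the 3x3 default window, and the 15 percent default threshold are assumptions chosen for the example.

```python
def integral_image(image):
    """Summed-area table: entry (y, x) is the sum of image[0..y][0..x]."""
    height, width = len(image), len(image[0])
    integral = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            above = integral[y - 1][x] if y > 0 else 0
            integral[y][x] = row_sum + above
    return integral


def binarize(image, window=3, threshold_percent=15):
    """Set a pixel to black (0) if it is more than `threshold_percent`
    darker than the mean of its window, otherwise to white (255)."""
    height, width = len(image), len(image[0])
    integral = integral_image(image)
    r = window // 2
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clip the window to the image bounds.
            y1, y2 = max(0, y - r), min(height - 1, y + r)
            x1, x2 = max(0, x - r), min(width - 1, x + r)
            count = (y2 - y1 + 1) * (x2 - x1 + 1)
            # Window sum from four lookups in the integral image.
            total = integral[y2][x2]
            if y1 > 0:
                total -= integral[y1 - 1][x2]
            if x1 > 0:
                total -= integral[y2][x1 - 1]
            if y1 > 0 and x1 > 0:
                total += integral[y1 - 1][x1 - 1]
            # Equivalent to: pixel < mean * (1 - threshold_percent / 100),
            # rearranged to stay in integer arithmetic.
            if image[y][x] * count * 100 < total * (100 - threshold_percent):
                out[y][x] = 0
    return out


# A light page with a short dark vertical stroke in the middle column.
page = [
    [200, 200, 200, 200, 200],
    [200, 200, 40, 200, 200],
    [200, 200, 40, 200, 200],
    [200, 200, 40, 200, 200],
    [200, 200, 200, 200, 200],
]
binary = binarize(page)
```

Because each pixel is compared with its own local average rather than with one global cutoff, a gradual shadow across the page shifts the local average along with the pixel and is less likely to be marked as ink.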

Processing

After pre-processing, we used a template matching algorithm to match our handwritten character to a template image. The algorithm takes two images (the input image and the template character being matched against it) and calculates a correlation value between them using normalized correlation.

Normalized Correlation

The formula below was used to determine the correlation between a test character and a template character. X represents the input character, Y represents the template character against which the input is compared, and r is the correlation value between the two. Each term of the summation subtracts the image mean from a pixel value and divides by that image's standard deviation, for both the input and the template character.

Normalized correlation equation
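The description above matches the usual sample correlation coefficient, which we assume is the formula intended. Here X_i and Y_i are the i-th pixels of the input and template, n is the number of pixels, \bar{X} and \bar{Y} are the image means, and s_X and s_Y are the sample standard deviations; the index i and n are notation introduced for this sketch.

```latex
r = \frac{1}{n - 1} \sum_{i = 1}^{n}
    \left( \frac{X_i - \bar{X}}{s_X} \right)
    \left( \frac{Y_i - \bar{Y}}{s_Y} \right)
```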

The correlation value was calculated between the input handwritten image and each template character; the template with the highest correlation value was taken to be the match for that input character. Our main driver function runs pre-processing on the handwritten image, identifies the matching letter and its correlation value, and prints both to the console. The user can choose which character to test by changing the character name at the top of the file. The user can also add noise to the original image and see how that affects the output. We primarily tested with random salt and pepper noise, but this can easily be extended to Gaussian and other types of noise. For those other types of noise, we recommend using the Wiener filter that we implemented. We used this main driver function to test our dataset with and without noise, and the results can be seen here.
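The matching step can be sketched as follows. This is a plain Python illustration, not the project's Matlab driver; the function names and the tiny 3x3 "templates" are hypothetical and exist only to make the example self-contained. The correlation is computed in a form algebraically equivalent to dividing each mean-subtracted pixel by its image's standard deviation.

```python
from math import sqrt


def normalized_correlation(x, y):
    """Correlation coefficient between two images with the same pixel count."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    norm = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if norm == 0:
        return 0.0  # a constant image carries no shape information
    return sum(a * b for a, b in zip(dx, dy)) / norm


def best_match(image, templates):
    """Return (name, score) of the template most correlated with `image`."""
    scores = {
        name: normalized_correlation(image, template)
        for name, template in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 3x3 templates; the real templates are 128x128 images.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# A slightly imperfect "I" with one stray pixel.
handwritten = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

name, score = best_match(handwritten, templates)
print(name, round(score, 3))
```

A score near 1 indicates that the input and template vary together pixel for pixel, while a score near 0 or below indicates little or opposite agreement.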
