This attempt can be regarded as a successful proof of concept, as the program recognizes characters with reasonably accurate results. Overall, the method works and obtains acceptable results; however, identification accuracy depends heavily on consistent, formulaic input. Although added noise has minimal effect on accuracy, handwriting style affects it heavily. For example, because the reference template uses a serif font, the module’s effectiveness decreased with sans-serif or cursive fonts. To handle different types of input, we would need to incorporate different reference sets.
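The template-correlation idea behind the recognizer can be sketched as follows. The original module was written in MATLAB; this is a Python sketch, and the 5x5 glyphs are hypothetical stand-ins for the serif reference templates:

```python
import numpy as np

def correlation_score(image, template):
    """Normalized cross-correlation between a character image and a
    same-sized reference template (higher means a closer match)."""
    a = image.astype(float) - image.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def recognize(image, templates):
    """Return the label of the best-matching reference template."""
    return max(templates, key=lambda k: correlation_score(image, templates[k]))

# Hypothetical 5x5 binary glyphs standing in for the reference set.
I = np.zeros((5, 5)); I[:, 2] = 1.0          # vertical bar
O = np.ones((5, 5)); O[1:4, 1:4] = 0.0       # hollow ring
```

Because the score is computed against fixed templates, an input drawn in a different style correlates poorly with every reference, which is exactly the failure mode described above.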
Major sources of error occurred when visually similar characters were input to the system. For instance, the module commonly misinterpreted both “Q” and “O” as “O.” In addition, letters such as “P” and “F” were often misidentified as “H” because of shared features, specifically the horizontal bar in the middle of each letter and the left vertical stroke.
Our pre-processing procedures already eliminate a majority of the uncontrolled external factors such as poor lighting or angled input, but we cannot control the style of the handwritten input. Therefore, the processing stage could be improved by introducing machine learning and a neural network. Since no input image will ever (within reasonable assumptions) correlate perfectly with a single template, training a neural network on a large quantity of samples would improve identification accuracy. This would also allow the optical character recognition module to be applied to a wider variety of print, including handwritten work.
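To illustrate how learning from many varied samples differs from matching a single template, here is a minimal sketch: a one-layer softmax classifier trained on noisy variants of two hypothetical 5x5 glyphs. This is not the project’s code, only a toy Python illustration of the proposed direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5x5 binary glyphs standing in for reference characters.
I = np.zeros((5, 5)); I[:, 2] = 1.0
O = np.ones((5, 5)); O[1:4, 1:4] = 0.0
labels = ["I", "O"]
glyphs = {"I": I, "O": O}

def noisy(img, flips=2):
    """Flip a few random pixels, mimicking variation in writing style."""
    out = img.copy().ravel()
    idx = rng.choice(out.size, size=flips, replace=False)
    out[idx] = 1.0 - out[idx]
    return out

# Training set: many noisy samples per class instead of one fixed template.
X = np.array([noisy(glyphs[l]) for l in labels for _ in range(200)])
y = np.array([i for i, _ in enumerate(labels) for _ in range(200)])

# One-layer softmax classifier trained by gradient descent on cross-entropy.
W = np.zeros((25, len(labels)))
b = np.zeros(len(labels))
for _ in range(300):
    z = X @ W + b
    z -= z.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0              # gradient of cross-entropy
    W -= 0.1 * (X.T @ p) / len(y)
    b -= 0.1 * p.mean(axis=0)

def classify(img):
    return labels[int(np.argmax(img.ravel() @ W + b))]
```

Because the weights are fit to hundreds of perturbed samples rather than one reference image, the classifier absorbs style variation that a single-template correlation cannot.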
Another area for improvement is code optimization. Using MATLAB’s built-in timer, we found that our code often took up to 30 seconds to recognize a single character and several minutes for high-resolution images. We often implemented our own functions (such as mean) rather than using built-ins, but additional steps could be taken to make our code faster and more efficient.
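The cost of hand-rolled loops versus vectorized built-ins is easy to demonstrate. The sketch below is a Python analogue of the MATLAB tic/toc timing described above; the array size is an arbitrary illustration:

```python
import time
import numpy as np

data = np.random.default_rng(1).random(1_000_000)

# Hand-rolled mean via an explicit loop (analogous to avoiding built-ins).
t0 = time.perf_counter()
total = 0.0
for x in data:
    total += x
loop_mean = total / data.size
loop_time = time.perf_counter() - t0

# Vectorized built-in mean over the same data.
t0 = time.perf_counter()
vec_mean = data.mean()
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.4f}s")
```

Both compute the same value, but the vectorized call runs orders of magnitude faster; replacing per-pixel loops with whole-array operations is the kind of optimization step the module would benefit from.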
Beyond the scope of single-character recognition, we also attempted to create a segmentation module to separate words into individual characters, so that our character recognition module could identify whole words. Our approach used the Maximally Stable Extremal Regions (MSER) algorithm, implemented in MATLAB. With this algorithm, we were able to extract key information from an image, such as the borders between blobs (regions with similar image properties). Furthermore, we used the algorithm’s “BoundingBox” property to find the four coordinate indices of each bounding box. An example is shown below. The minimum and maximum X and Y coordinates of each box are determined by the edges of each connected black-pixel region, and a box is drawn around those minimum and maximum coordinate values.
(left) original stop sign, (right) segmented stop sign with bounding boxes
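The min/max bounding-box step can be sketched without MSER itself, which is considerably more involved. The Python sketch below labels 4-connected regions of a binary grid and returns the same kind of extreme-coordinate box that the “BoundingBox” property provides:

```python
from collections import deque

def bounding_boxes(grid):
    """Return (min_row, min_col, max_row, max_col) for each 4-connected
    region of 1s, mirroring the min/max-coordinate idea of 'BoundingBox'."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                # Breadth-first search over this connected region,
                # tracking the extreme coordinates reached.
                q = deque([(r, c)]); seen[r][c] = True
                rmin = rmax = r; cmin = cmax = c
                while q:
                    i, j = q.popleft()
                    rmin, rmax = min(rmin, i), max(rmax, i)
                    cmin, cmax = min(cmin, j), max(cmax, j)
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            q.append((ni, nj))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes
```

Each box can then be cropped out of the word image and passed to the character recognition module one glyph at a time.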
We realized that, even as an avenue for expansion, the segmentation module needs substantial work; given more time, our group could improve its accuracy and precision. With a correctly functioning segmentation module, our team could move beyond single-character OCR to recognizing individual characters within whole words and sentences. This would be far more applicable in practice: given a picture of the environment (the stop sign above is a perfect example), this powerful computer vision technique could potentially be used for detection in a self-driving vehicle. Please refer to Segmentation for a full step-by-step walkthrough of our module and all of our results.
In conclusion, the optical character recognition module succeeds as a proof of concept; however, the concept can be expanded and improved. In the future, it could be applied to more complex tasks such as word recognition and handwriting recognition by applying machine learning and neural networks to aid identification.