Results

In this tab, we discuss the effectiveness of each subsystem as well as the testing and overall performance of our full-scale model.

Note: our coding implementation is discussed in the Processes tab.

Image Processing and Filtering

MATLAB: Image Processing and Filtering

The image processing worked relatively well; the largest obstacle was removing all of the non-character markings while making sure every character was included. This led us to use high-pass filters and the edge-detection convolution from EECS 351 HW7 instead of MATLAB's edge function. Regardless of our implementation, edge would always miss a few pixels and fail to close the shape, causing the filling function to miss letters. Our method is still not perfect, as seen below, but we found it to perform better in general than the edge implementation.

Another significant issue was scratch removal. Our customized filter design looks for a specific line width to remove. With poor-quality boards, the writing-thickness threshold has to be higher since the scratches are larger. This is acceptable so long as the writing is relatively thick, as seen in the bottom photograph. If the writing is thin, however, the removal threshold needs to be thinner so as not to remove parts of letters.

Lastly, the marker color affects the quality of our Image Processing and Filtering. Darker reds and blacks are easily detectable, but in Images 3 and 5 below the system barely detects any characters. This happens because the color is lighter, so when the image goes from grayscale to binary, there is a larger chance of missing the values along the writing edges.

This requires more manual input than we would ideally like; a future revision would likely ask the user whether the output makes sense or whether the system should do another pass with thinner or thicker scratch removal. Overall, our method works relatively well for block handwriting given proper supervision.
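Our full pipeline is implemented in MATLAB (see the Processes tab). As a rough illustration of the edge-detect-then-fill step described above, here is a minimal Python/SciPy sketch; the Laplacian kernel and threshold are illustrative placeholders, not our tuned EECS 351 HW7 parameters.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, convolve

def find_characters(gray, edge_thresh=0.3):
    """Sketch of an edge-detect-then-fill step on a grayscale image.

    gray: 2-D float array in [0, 1], dark writing on a light board.
    """
    # High-pass / edge-detection kernel (a Laplacian); our MATLAB code
    # uses a convolution like this rather than MATLAB's edge() function,
    # which tended to leave letter outlines unclosed in our tests.
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)
    edges = np.abs(convolve(gray, kernel))

    # Binarize the edge map, then fill the closed outlines so each
    # character becomes a solid blob for the later isolation step.
    return binary_fill_holes(edges > edge_thresh)
```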

All images below © Team 6 original work.

(1) Sentence in red

(1) All characters found and some noise

(2) Sentence with thin lettering

(2) All characters found and no noise

(3) Sentence in green

(3) Eight characters missing

(4) Sentence with scratches

(4) All characters found and scratches removed

(5) Sentence in green

(5) Characters are incomplete

Character Isolation

MATLAB: Character Isolation

We faced a few obstacles while implementing character isolation. First, we found it difficult for the system to handle discontinuous letters such as 'i' and 'j'. To fix this, we used classification to expand the bounding box and implemented a feedback loop to re-classify the letter. Second, letters with enclosed regions, such as 'o', were difficult to recognize because our algorithm "shades" in the entire circular region. To solve this, we kept both the unfilled and filled versions of the letter and placed the bounding box on the original image, using the isolation step to determine where the box should go. Third, "thin" writing fails to be recognized as a result of the filter we use to omit scratches in the original image; in response, we assume even spacing, meaning that if there is one letter, another should follow. Finally, our system sometimes loses information when the handwriting does not sit on a straight line, because the system is neither rotation- nor shift-invariant. To mitigate this, we modified the code to look not just at the direct line but at the surrounding area, like a "box", to see whether letters are recognized.
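As elsewhere, the real implementation is MATLAB; the Python/SciPy sketch below only illustrates the bounding-box idea. The pad parameter stands in for the feedback loop described above, which expands a box so the dot of an 'i' or 'j' can be merged with its stem.

```python
from scipy.ndimage import label, find_objects

def bounding_boxes(filled, pad=2):
    """Isolate characters as bounding boxes on a filled binary image.

    filled: 2-D boolean array where character pixels are True.
    Returns (row_start, row_stop, col_start, col_stop) per component,
    expanded by `pad` pixels on each side.
    """
    labeled, _ = label(filled)          # connected-component labeling
    boxes = []
    for rows, cols in find_objects(labeled):
        boxes.append((max(rows.start - pad, 0), rows.stop + pad,
                      max(cols.start - pad, 0), cols.stop + pad))
    # The boxes are then applied to the ORIGINAL (unfilled) image,
    # mirroring how we keep both filled and unfilled letter versions.
    return boxes
```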

Character Classification

MATLAB: Methods for Character Classification

MATLAB K Nearest Neighbors. This model did not perform well on our own dataset, accurately predicting only about 28% of characters, numbers, and mathematical symbols during testing. Specifically, the classifier never predicted the letters 'f', 'g', or 'q' at all, and many characters were incorrectly classified as the equals sign.

MATLAB Decision Tree. This model performed about twice as well on our own data as the K Nearest Neighbors implementation, with roughly 55% classification accuracy.

Outcomes. In the confusion matrices below, the numbers 1, 2, 3, and 4 represent an equals sign, infinity sign, plus sign, and capital sigma, respectively. Note: these models were trained and tested on our smaller personal dataset with minimal image processing, which further lowered their accuracy.
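For reference, a confusion matrix like the ones shown below can be computed directly from true and predicted labels. The toy labels in this Python sketch are placeholders for our MATLAB classifier output.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Toy labels: classes 1-4 stand for '=', infinity, '+', and capital
# sigma, matching the figures below.
y_true = [1, 2, 3, 4, 1, 2, 3, 4]
y_pred = [1, 2, 3, 1, 1, 1, 3, 4]

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=['=', 'inf', '+', 'sigma']).plot()
plt.show()
```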

K Nearest Neighbors confusion matrix on our dataset in MATLAB.

Image © Team 6 original work.

Decision Tree confusion matrix on our dataset using MATLAB.

Image © Team 6 original work.

Summary for MATLAB Character Classifications. These classifiers underperformed significantly compared to the Python classifiers because they were trained on our own dataset: ours has about 100 training entries while the EMNIST dataset has 60,000. We used our dataset only to gain early exposure to MATLAB classifiers, and we used the larger EMNIST dataset when training the full-scale model of this project.

Python: Methods for Character Classification

Python K Nearest Neighbors. Python's scikit-learn K Nearest Neighbors implementation performed remarkably well on the EMNIST dataset. The only area where it slightly struggled was classifying 'F's, 'K's, and 'P's.

Python Multinomial Naïve Bayes. This model significantly underperformed the KNN and SVM classifiers on the EMNIST dataset. It almost entirely failed to classify the letter 'L', and struggled with the letters 'A', 'G', and 'T'.

Python Support Vector Machine. The SVM had the best performance of any classifier we tested; as seen below, it hardly misclassified anything. However, the SVM took about an hour longer to train on the EMNIST dataset than any other classifier, and it also took much longer to test.

Python Decision Tree Classifier. The decision tree classifier performed well on the EMNIST dataset; there were no letters that it completely failed to classify.
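The four scikit-learn classifiers above share the same fit/predict interface, so comparing them is mechanical. The sketch below uses scikit-learn's small built-in digits dataset as a runnable stand-in for EMNIST, with default hyperparameters rather than our tuned settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Stand-in data: 8x8 digit images instead of 28x28 EMNIST characters.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Multinomial Naive Bayes": MultinomialNB(),  # needs non-negative features
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```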

Python Neural Network Model. We followed an online tutorial to implement the LeNet CNN. The neural network classifier also worked well on the EMNIST dataset. Modeling only the digits works very well, as does modeling only the letters; when we tested digit and letter classification together, the digits maintained high accuracy, but the letters' accuracy decreased. A different library, following a separate tutorial, was used to implement the second neural network, which was trained on HASYv2.
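Our implementation followed the tutorials mentioned above; as one common way to express the same network, here is a minimal Keras sketch of the LeNet-5 layout. The 47-class output assumes the EMNIST "balanced" split, which is an assumption for illustration rather than a record of our exact configuration.

```python
from tensorflow.keras import layers, models

def build_lenet(num_classes=47):
    """Classic LeNet-5 layout: two conv/pool stages, then dense layers."""
    return models.Sequential([
        layers.Conv2D(6, 5, activation='tanh', padding='same',
                      input_shape=(28, 28, 1)),
        layers.AveragePooling2D(2),
        layers.Conv2D(16, 5, activation='tanh'),
        layers.AveragePooling2D(2),
        layers.Flatten(),
        layers.Dense(120, activation='tanh'),
        layers.Dense(84, activation='tanh'),
        layers.Dense(num_classes, activation='softmax'),
    ])

model = build_lenet()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```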

This alphanumeric cipher can be used to interpret our confusion matrices below.

Image © https://www.ru.ac.za/media/rhodesuniversity/content/sanc/documents/Grocotts_SANC_Supplement_Alphabet_Words_Puzzles.pdf

K Nearest Neighbors confusion matrix on EMNIST dataset.

Image © Team 6 original work.

Multinomial Naïve Bayes confusion matrix on the EMNIST dataset.

Image © Team 6 original work.

Support Vector Machines confusion matrix on the EMNIST dataset.

Image © Team 6 original work.

Decision Trees confusion matrix on the EMNIST dataset.

Image © Team 6 original work.

LeNet CNN confusion matrix on MNIST data (digits 0-9).

Image © Team 6 original work.

LeNet CNN confusion matrix on EMNIST letters dataset.

Image © Team 6 original work.

LeNet CNN confusion matrix on the entire EMNIST dataset (digits, uppercase, lowercase).

Image © Team 6 original work.

CNN confusion matrix on the HASYv2 dataset (digits, letters, math symbols).

Image © Team 6 original work.

Summary for Character Classifications. The following list ranks each classification method by its prediction accuracy after training and testing on the EMNIST character sub-dataset (the HASYv2 CNN is listed with its nominal accuracy for comparison; see the note below the list).

  1. Support Vector Machine: 90.87% accuracy

  2. K Nearest Neighbors: 86.56% accuracy

  3. LeNet Neural Network (EMNIST): 86% accuracy

  4. CNN (HASYv2): 94.1% accuracy*

  5. Decision Trees: 70.76% accuracy

  6. Multinomial Naïve Bayes: 57.83% accuracy

These percentages closely follow the outcomes displayed in the confusion matrices above. Even though the SVM ranks highest in accuracy for balanced training sets, we decided to implement the LeNet CNN in our full-scale model because the SVM took an unreasonable amount of time to train and test, and because the LeNet CNN interfaces straightforwardly with MATLAB. The MATLAB interface was the most important factor in this decision because we compiled all project code in .m files.

We can also observe the high accuracy percentage on the HASYv2 dataset. However, there are significant issues with its application: the training set is extremely unbalanced (i.e., different numbers of samples for each class), as seen in the confusion matrix's extremely dark and extremely light spots along the diagonal. This causes the classifier to favor the symbols with more training data, such as ∑, ∫, and ∞, and results in skewed classifications.

*Note: the true accuracy is lower than reported because the model was tested on its training data; there were not enough samples of some characters to split them into separate training and testing sets.
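A quick per-class count makes this kind of imbalance visible before training. The sketch below uses toy labels in place of the real HASYv2 label list.

```python
from collections import Counter

def class_balance(labels):
    """Print the most and least common classes; a large spread signals
    the skew seen along the HASYv2 confusion-matrix diagonal."""
    counts = Counter(labels)
    print("most common:", counts.most_common(1)[0])
    print("rarest:", counts.most_common()[-1])
    return counts

# Toy stand-in labels (the real HASYv2 label list is far larger):
class_balance(['sum', 'sum', 'sum', 'integral', 'integral', 'inf', 'A'])
```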

LaTeX Document Formatting

MATLAB: LaTeX Document Formatting

Capabilities. Our LaTeX Document Formatting allows the user to input the date and lecture title as metadata for the resulting file. It copies the text directly from the Character Classification output file, and because that text follows LaTeX syntax, features such as ^ for superscripts work automatically. It can also insert as many newlines as indicated by the Character Isolation and Character Classification subsystems.
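Our formatter is a MATLAB function; the Python sketch below only illustrates the wrapping step described above. The title and date values are hypothetical.

```python
def write_tex(body, title, date, path='notes.tex'):
    """Wrap classifier output text in a minimal LaTeX document.

    Characters such as ^ are written through unchanged, so LaTeX's own
    syntax (e.g. superscripts in math mode) is preserved.
    """
    doc = ("\\documentclass{article}\n"
           f"\\title{{{title}}}\n"
           f"\\date{{{date}}}\n"
           "\\begin{document}\n"
           "\\maketitle\n"
           f"{body}\n"
           "\\end{document}\n")
    with open(path, 'w') as f:
        f.write(doc)

# Hypothetical metadata; the real values come from user input.
write_tex("$E = mc^2$", "Lecture 12", "April 2022")
```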

Obstacles. Our current MATLAB function does not output the LaTeX file as a PDF. We investigated multiple methods to accomplish this, such as using MATLAB's system function to execute a LaTeX compiler from the .m file, but none of them worked. Outputting our notes in .pdf format was one of our original goals, and this outcome is one of the pitfalls of choosing MATLAB to program our system. We assume, instead, that the user has access to a LaTeX compiler such as Overleaf and can make that conversion themselves.

Full-Scale Project

MATLAB: Full-Scale Model Testing

Testing Procedures. We vary only the input images into our system, and we use only dry-erase marker on a whiteboard as our writing medium for testing. Our process consists of experimenting with sample images and then recording the observed qualitative behaviors. It is worth noting that we could not find a database matching our specific use case (sentences and mathematical expressions in dry-erase marker, without cursive, and with a corresponding LaTeX-formatted output). Therefore, we produced all the testing images ourselves, and we were limited in the quantity of data we could produce.

Results. After testing on 8 sample input images as described above, we made the following observations about our full-scale model:

  1. Dual-Training: Training with the combined math and letter dataset causes frequent misclassifications of letters as math symbols. This is likely because the classes are unbalanced and the classifier favors classes with more samples (i.e., math symbols).

  2. Rotations: Misclassifications happen predominantly with rotated handwriting. Our Character Isolation subsystem is slightly rotation- and shift-invariant, but it is not fully fleshed out yet. As a result, we see problematic classifications when the handwriting is not in a reasonably straight line.

  3. Handwriting Variations: The full-scale system's classification ability varies with the input handwriting. Possible reasons include the marker color, whether the letters are appropriately spaced, whether the handwriting is heavily slanted (as discussed above), surface scratch marks, the thickness of the handwriting, and image quality.

  4. Capitalization: Our system windows to the edges of each letter, which is a problem because the classifier relies on the space around a letter to decide whether it is lowercase (a tightly cropped 'o' looks like a giant 'O', for example, and 'l' looks like '1'). This can be somewhat remedied by artificially increasing the spacing around a letter (see the padding sketch after this list), but the added data is white space rather than letter information, so it impacts the classification of other letters. The classifier also makes mistakes on very similar characters such as 'O' vs. 'Q', 'l' vs. '1', 'O' vs. '0', and 'S' vs. '5'; this might only be remedied by increasing the amount of data, i.e., the size of the images, which we did not have the ability to do.

  5. Math Symbols: Performance on math symbols was very poor; the classifier often could not make out which symbol it was supposed to be, which further hampered our ability to do things like add bounds. We believe this is primarily due to the dataset we used: the image types were not balanced throughout the dataset, and the writing was different enough from our samples that the system struggled. We ended up using two different classifiers so as not to sacrifice our relatively decent letter-prediction performance. We did, however, manage to put in a feedback loop that adds bounds to constructs such as summations and integrals, and we believe that, with a better-trained classifier, it could format these equations properly. The loop could stand to be better, but we found it hard to test without more accurate classifier labels.
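As referenced in item 4, here is a Python/NumPy sketch of the artificial-padding workaround (our implementation is MATLAB, and the margin value is illustrative):

```python
import numpy as np

def pad_letter(crop, margin=4):
    """Surround a tightly cropped letter with background pixels so the
    classifier sees relative size (e.g. 'O' vs 'o'). Too large a margin
    dilutes the letter pixels, as noted in item 4 above."""
    return np.pad(crop, margin, mode='constant', constant_values=0)

letter = np.ones((10, 6))    # toy tightly cropped letter (all "ink")
padded = pad_letter(letter)  # now 18x14 with a background border
```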

All images below © Team 6 original work.

The following images show a variety of inputs with their classifier output below them. The "Output from classifier" sections come from the model trained on the EMNIST dataset, while the "Math output from classifier" sections come from the model trained on the HASYv2 dataset.