Project Overview

This tab gives a project overview, including the project vision, an overview of each subsystem, the digital signal processing techniques we used, a project summary, and our future goals.

Project Vision

Our overarching goal for the project is to develop an algorithm that takes a screenshot of a lecture board as input and returns a nicely formatted LaTeX document of those notes. There are four main components to the architecture of our implementation:

  1. Image processing and filtering of the lecture recording screenshot.

  2. Isolating each individual character.

  3. Classifying each individual character.

  4. Outputting that classified character to a LaTeX document.

We added a feedback loop from our Character Classification step back to our Character Isolation step to further investigate the bounded region produced by character isolation whenever discontinuous jumps (e.g., 'i', 'j') or encircled regions (e.g., 'o', 'p') are detected in the character.

This flowchart depicts the project architecture and the connections between all of our subsystems. Image © Team 6 original work.

Image Processing and Filtering

The main objective of this process is to distinguish the handwriting from the blemishes and scratches on the writing surface. We applied various filtering and image processing techniques in MATLAB to manipulate the signal so that only detected handwriting information carries value. The images to the right give a quick demonstration of the purpose of this subsystem. Our signal processing is lossy because encircled regions in handwritten characters and numbers become completely filled in. As a result, we implemented a feedback loop between the Character Isolation and Classification subsystems to further investigate the area around the bounded region sent to the classifier.
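As a rough illustration of the style of processing involved, the Python/NumPy sketch below applies a 2D FFT high-pass filter to suppress the slowly varying board background and then thresholds the remaining strokes. Our actual implementation is in MATLAB, and the cutoff and threshold values here are illustrative assumptions, not our tuned pipeline.

    import numpy as np

    def highpass_board(gray, cutoff=10, threshold=0.1):
        # Move the grayscale image into the 2D frequency domain, DC centered.
        F = np.fft.fftshift(np.fft.fft2(gray))
        rows, cols = gray.shape
        r, c = np.ogrid[:rows, :cols]
        # Zero out low frequencies (slowly varying background and illumination).
        low = (r - rows // 2) ** 2 + (c - cols // 2) ** 2 <= cutoff ** 2
        F[low] = 0
        # Back to the spatial domain; keep only strong responses as candidate ink.
        strokes = np.abs(np.fft.ifft2(np.fft.ifftshift(F)))
        return (strokes > threshold).astype(np.uint8)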

Initial input image.

Image © Team 6 original work.

The output image after applying various image signal processing and filtering techniques.

Image © Team 6 original work.

Character Isolation

This subsystem is highly dependent on the handwriting style in any individual image. We needed to account for character spacing that varies heavily and for the fact that nobody writes on a perfectly straight line. This means our system must be shift invariant to handle the varying character separation, and rotation invariant to handle imperfect lines of writing. Character isolation was the most algorithmically intensive process to design in our project because of the difficulty of making this subsystem shift and rotation invariant.
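As a simplified illustration of the isolation step (not our full shift- and rotation-invariant algorithm), connected-component labeling on the cleaned binary image yields one bounding box per candidate character. The Python/SciPy sketch below is a stand-in for our MATLAB implementation.

    from scipy import ndimage

    def character_boxes(binary_image):
        # Label each connected blob of ink pixels as a candidate character.
        labels, count = ndimage.label(binary_image)
        # find_objects returns one (rows, cols) bounding-box slice pair per label.
        boxes = ndimage.find_objects(labels)
        # Crop each candidate so it can be resized and sent to the classifier.
        return [binary_image[box] for box in boxes if box is not None]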

Initial input image

Image © Team 6 original work.

Isolated 'T' in the word 'This'

Image © Team 6 original work.

Character Classification

This section takes in an image of an isolated character from the Character Isolation process and classifies that image as a character. We compared the predicted outputs of five common character classification techniques using Python's sklearn classification library. We trained each classifier on the EMNIST dataset's 60,000 training images, tested each on EMNIST's 10,000 testing images, and ranked the classifiers primarily by accuracy while also considering speed and memory. Overall, the Support Vector Machine led in accuracy, but we chose the LeNet neural network classification technique for its high accuracy, low memory usage, speed, and straightforward interface with MATLAB.
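A minimal sketch of the kind of sklearn comparison described above is shown below. The dataset loading, the hyperparameters, and the use of MLPClassifier as a stand-in for the convolutional LeNet are simplifying assumptions.

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score

    # X_train, y_train, X_test, y_test: flattened 28x28 EMNIST images and labels,
    # loaded elsewhere; pixel intensities are non-negative values.
    CLASSIFIERS = {
        "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
        "Multinomial Naive Bayes": MultinomialNB(),
        "Support Vector Machine": SVC(),
        "Decision Tree": DecisionTreeClassifier(),
        "Neural Network": MLPClassifier(max_iter=200),
    }

    def compare(X_train, y_train, X_test, y_test):
        # Fit each model on the training split and report test accuracy.
        for name, clf in CLASSIFIERS.items():
            clf.fit(X_train, y_train)
            print(name, accuracy_score(y_test, clf.predict(X_test)))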

The following section describes each of the five classification techniques that we investigated.

K Nearest Neighbor

K Nearest Neighbors is a supervised machine learning algorithm that calculates the Euclidean distance between a given test image and the training images, then assigns the label that is most common among the k closest training examples. The advantages of KNN include its simplicity, speed, and memory efficiency. However, the results rely heavily on preconditioning the images so that rotation, position, and scaling do not affect the outcome.
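For intuition, a bare-bones nearest-neighbor lookup in pure NumPy (k = 1, not the library implementation we used) looks like this:

    import numpy as np

    def nearest_neighbor_label(test_image, train_images, train_labels):
        # Euclidean distance from the flattened test image to every training image.
        distances = np.linalg.norm(train_images - test_image, axis=1)
        # Return the label of the single closest training example (k = 1).
        return train_labels[np.argmin(distances)]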


Multinomial Naïve Bayes

Multinomial Naïve Bayes uses Bayes' theorem and assumes the features of a given image are independent in order to select the classification with the highest probability. Its advantages include simplicity and memory and speed efficiency. Its main disadvantage is poor accuracy, stemming from the assumption that no feature of a given character relates to the character's other features.
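Concretely, the independence assumption means the classifier scores each candidate character c by a product of per-feature likelihoods and keeps the largest:

    \hat{c} = \arg\max_{c} \; P(c) \prod_{i} P(x_i \mid c)

where x_i is the value of the i-th pixel feature.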

Support Vector Machine

Support Vector Machines define separating hyperplanes in a high-dimensional feature space (infinite-dimensional when a kernel is used) that divide the decision points defining each character classification outcome. SVMs differ from ordinary linear classifiers because they optimize for a maximal margin, which helps reduce overfitting while still describing the dataset effectively. Their advantages include high accuracy and relatively low computational requirements. However, training can take a long time to converge because the underlying optimization scales poorly with the number of training samples.
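In the standard hard-margin, linear formulation, the maximal margin mentioned above comes from solving

    \min_{w, b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \ge 1 \;\; \text{for all } i,

so the separating hyperplane sits as far as possible (a margin of 2 / \lVert w \rVert) from the closest training points of either class.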

Decision Tree

A decision tree makes cascading decisions in the form of a branching structure. A test image starts at the root of the tree and, at each subsequent node, is routed down a branch according to a learned feature test (the tests are chosen during training to minimize a cost such as Gini impurity) until it arrives at a leaf node that denotes its classified category. Decision trees offer simplicity, require little data preparation, and handle both numerical and categorical data well. However, they carry an inherent risk of instability, overfitting, and bias.

Neural Network

Neural networks loosely mimic the behavior of the human brain, allowing computer programs to recognize patterns and solve problems in the fields of AI, machine learning, and deep learning. Their advantages include distributing learned information across the entire network, the ability to work with incomplete inputs, and parallel processing capabilities. Their disadvantages are hardware dependence and the lack of clear guidance for choosing a network structure.
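For reference, a LeNet-style network for 28x28 EMNIST characters can be sketched as below. This PyTorch version is purely illustrative (our network is interfaced with MATLAB), and num_classes depends on which EMNIST split is used.

    import torch.nn as nn

    class LeNet(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
                nn.Tanh(), nn.AvgPool2d(2),                 # -> 14x14
                nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
                nn.Tanh(), nn.AvgPool2d(2),                 # -> 5x5
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
                nn.Linear(120, 84), nn.Tanh(),
                nn.Linear(84, num_classes),
            )

        def forward(self, x):
            # x has shape (batch, 1, 28, 28); output is one score per class.
            return self.classifier(self.features(x))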

LaTeX Document Formatting

We created a MATLAB script that converts the classified characters output by the Character Classification subsystem into a LaTeX file. Our function currently takes in user-defined lecture title and date metadata, then combines that information with the lecture content in the body of the document.
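A stripped-down Python version of that idea is shown below; our real script is written in MATLAB, and the preamble and default file name here are illustrative.

    def write_latex(title, date, body_text, out_path="lecture_notes.tex"):
        # Wrap the classified lecture text in a minimal LaTeX document
        # using the user-supplied title and date metadata.
        lines = [
            r"\documentclass{article}",
            r"\title{%s}" % title,
            r"\date{%s}" % date,
            r"\begin{document}",
            r"\maketitle",
            body_text,
            r"\end{document}",
        ]
        with open(out_path, "w") as f:
            f.write("\n".join(lines))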

Example input text file.

Image © Team 6 original work.

Example output text file from the input text file. The user defines the title and date of the outputted document.

Image © Team 6 original work.

Digital Signal Processing Tools

We are utilizing a wide range of DSP tools in implementing our Handwriting to LaTeX project.

Tools learned in-class:

  1. Changing the basis. Our system converts a 2D image to 1D text.

  2. Studying system properties. Our Character Isolation method requires shift invariance and rotation invariance.

  3. 2D Fourier transforms. Our Image Processing and Filtering subsystem uses a 2D Fourier transform to remove scratches and blemishes from the image.

  4. Filtering. Our Image Processing and Filtering subsystem utilizes multiple high-pass filters to remove scratches and blemishes from the image.

  5. Windowing a signal. Our Character Isolation subsystem windows the entire handwriting signal into individual characters to feed into the Character Classifier.

Tools studied out-of-class:

  1. Classifiers. This includes Multinomial Naïve Bayes.

  2. Machine learning models. We studied K Nearest Neighbors, Decision Trees, and Support Vector Machines as Character Classification techniques.

  3. Assessing the fit of a model. We compared the Character Classification models using confusion matrices and percent accuracies.

  4. Convolutional Neural Networks. We studied the LeNet Neural Network as a Character Classification technique.

  5. Convolution in edge detection. We studied Sobel Operator kernels to perform edge detection in our Character Isolation subsystem (a brief sketch follows this list).

  6. Compression and linearization.
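For item 5, a minimal Sobel edge-detection sketch (Python/SciPy; the kernels are the standard Sobel operators, the rest is illustrative) is:

    import numpy as np
    from scipy.signal import convolve2d

    def sobel_edges(gray):
        # Horizontal and vertical gradient kernels (Sobel operators).
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
        ky = kx.T
        # Convolve the image with each kernel and combine gradient magnitudes.
        gx = convolve2d(gray, kx, mode="same", boundary="symm")
        gy = convolve2d(gray, ky, mode="same", boundary="symm")
        return np.hypot(gx, gy)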

Project Summary

As discussed on our Results page, the initial goal of our project was to receive an image containing handwriting and return a LaTeX document containing the classified sentence or equation. Overall, our system works as one unit (i.e., a single script is called and the various modules run accordingly) and returns legible results. The two primary areas that need improvement are the character isolation and classification algorithms. Small scratches and other noise are picked up during character isolation, and there are occasional issues detecting spacing and newlines when the writing slants. The classifier struggles to distinguish between upper- and lower-case letters, and favors mathematical symbols when using the HASYv2 dataset. While not perfect, the LaTeX document can be understood with some additional thought, and we have identified areas for further study; see the section below.

Future Goals

Our project currently focuses only on print (not cursive) handwriting and various mathematical equations. Cursive is a particularly difficult area in character classification because there is no distinct separation between characters, and the calligraphic style varies greatly from person to person. Many professors write in cursive, so cursive support should absolutely be a future goal of this project. Another interesting extension would be for the project to recognize a diagram or drawing (e.g., an ROC sketch of a z-transform) and insert the whole image into the document. Furthermore, we currently use screenshots as input data, but applying this program to an ongoing recording would be another great extension; in that case, our project would have to recognize when the professor has finished writing before taking its sample image. Lastly, with a nice user interface, this project has the potential to work well with the CAEN engineering lecture recording software at the University of Michigan.

Image: example screenshot from the CAEN engineering lecture recording software at the University of Michigan during an EECS 351 lecture.