We discuss the process of implementing each subsystem in this tab. Click the link below to access our project GitHub repository, which includes a README describing all file dependencies. Note: we coded our full-scale project in MATLAB, but we also used Python through Jupyter Notebooks to take advantage of Python's sklearn classifier library and to create most of our confusion matrices while experimenting with Character Classifiers.
This flowchart depicts the project architecture and the connections between all of our subsystems. Image © Team 6 original work.
First, we read in the image using imread and im2double. Then, we convert the image to grayscale using im2gray.
Next, we translate a flipped copy of the image to the frequency domain using fft2 (a 2-dimensional Discrete Fourier Transform) and reorganize the spectrum with fftshift. In this representation, we pass the image through a high-pass filter that keeps only the parts of the image with very quick changes in intensity, since those changes map to the higher frequencies and correspond to the edges of objects in the image. The image is then brought back out of the frequency domain using ifftshift and ifft2.
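The frequency-domain high-pass step can be sketched in Python with NumPy's equivalents of fft2, fftshift, ifftshift, and ifft2. The function name and cutoff radius below are illustrative assumptions, not our MATLAB implementation:

```python
import numpy as np

def highpass_filter(gray, cutoff=10):
    """Suppress low frequencies (slow intensity changes), keeping edges.

    gray: 2-D float array; cutoff: radius (in frequency bins) of the
    blocked low-frequency disk. Both names are illustrative.
    """
    F = np.fft.fftshift(np.fft.fft2(gray))  # center the DC component
    rows, cols = gray.shape
    r, c = np.ogrid[:rows, :cols]
    dist = np.sqrt((r - rows / 2) ** 2 + (c - cols / 2) ** 2)
    F[dist < cutoff] = 0                    # zero out low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))
```

Zeroing a disk around the DC component removes slow intensity variation, so only edges and fine texture survive the inverse transform.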
We use Sobel operators to convolve the filtered image with two kernels that perform horizontal and vertical edge detection, then add the resulting matrices together. This step leaves only the true edges, since the high-pass filter still leaves a small intensity gradient leading up to the edges of an object.
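A minimal NumPy sketch of this Sobel step, assuming the standard 3x3 Sobel kernels and a naive 'same'-size convolution (in practice a library routine would do this faster):

```python
import numpy as np

# Standard Sobel kernels: GX responds to vertical edges, GY to horizontal
GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
GY = GX.T

def conv2_same(img, kernel):
    """Naive 'same'-size 2-D convolution (kernel flipped), NumPy only."""
    k = np.flipud(np.fliplr(kernel))
    padded = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def sobel_edges(img):
    # Sum of the two directional responses, as described above
    return conv2_same(img, GX) + conv2_same(img, GY)
```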
We convert the image to a binary representation with imbinarize to gain access to MATLAB's binary image processing bw functions. First, we use imfill to enhance the definition of all background pixels near the edges detected in Step 3. Next, we use bwareaopen to remove every connected region that spans fewer than 30 pixels, effectively removing all tiny scratches. Lastly, we use bwpropfilt to remove every connected region smaller than 200 pixels in area, further removing scratches and imperfections on the writing surface.
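The small-object removal that bwareaopen performs can be approximated in plain NumPy with a flood fill that measures each connected component's area. This sketch (function and parameter names are illustrative) shows the idea behind the 30-pixel threshold:

```python
import numpy as np
from collections import deque

def remove_small_objects(bw, min_area):
    """Drop 4-connected foreground components smaller than min_area
    (a NumPy stand-in for MATLAB's bwareaopen)."""
    bw = bw.astype(bool)
    out = np.zeros_like(bw)
    seen = np.zeros_like(bw)
    rows, cols = bw.shape
    for sr in range(rows):
        for sc in range(cols):
            if bw[sr, sc] and not seen[sr, sc]:
                # flood-fill one component, collecting its pixels
                comp, q = [], deque([(sr, sc)])
                seen[sr, sc] = True
                while q:
                    r, c = q.popleft()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and bw[nr, nc] and not seen[nr, nc]):
                            seen[nr, nc] = True
                            q.append((nr, nc))
                if len(comp) >= min_area:  # keep only large-enough regions
                    for r, c in comp:
                        out[r, c] = True
    return out
```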
This image is then passed through two more filters, mainly to get rid of any remaining large scratches. These filters sweep over the horizontal and vertical directions to remove any detected thin objects. The thin scratches and edges of the board will be removed, but the letters are relatively thick, so they remain unfiltered after this step.
Horizontal and vertical edge detection matrices.
Image © EECS 351 Homework 7 at the University of Michigan.
This step focuses on creating individually indexable boundaries around each distinct character. We used the function bwboundaries to outline enclosed areas in our binary image, which returns the following outputs:
B: a cell array tracing the individual boundaries.
L: a label matrix marking each boundary's position to allow for indexing.
n: the number of objects found.
A: an adjacency matrix denoting the edge connections between boundary nodes.
In this step we make sure we index the characters in the correct order, because bwboundaries does not index the characters from left to right and top to bottom, the order of English writing. To correct the indexing, we first iterate from left to right through all vertical lines of the binarized image until we find a line with significant white pixels (denoting characters). Then we search every row from top to bottom around that vertical line to find the row that intersects the most unique letters, and take this as our first line of text. Lastly, we traverse the line and use various MATLAB commands to extract each letter individually by its index and feed it to the classifier.
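The reindexing idea amounts to sorting character bounding boxes into reading order. This simplified Python version (line_tol is an assumed pixel tolerance, not our exact MATLAB logic) groups boxes into text lines and sorts each line left to right:

```python
def reading_order(boxes, line_tol=10):
    """Sort bounding boxes (row, col, height, width) into English reading
    order: top-to-bottom lines, left-to-right within a line.

    A simplified stand-in for the reindexing described above; line_tol
    is an assumed tolerance for grouping boxes onto one text line.
    """
    boxes = sorted(boxes, key=lambda b: b[0])      # coarse top-to-bottom
    lines, current = [], [boxes[0]]
    for b in boxes[1:]:
        if abs(b[0] - current[0][0]) <= line_tol:  # same text line
            current.append(b)
        else:                                      # start a new line
            lines.append(current)
            current = [b]
    lines.append(current)
    # left-to-right within each line, then flatten
    return [b for line in lines for b in sorted(line, key=lambda b: b[1])]
```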
We use the function poly2mask to mask the detected characters over the original grayscale sample image. This extracts only the detected letters by setting all other regions to gray, which fixes the shading problem in enclosed letter regions produced in the Image Processing and Filtering subsection.
It is worth acknowledging that MATLAB has a built-in edge detection function called edge. Unfortunately, edge repeatedly missed letters during sample-image testing, so we could not get acceptable performance from this function.
All Images below © Team 6 original work.
Our input handwriting sample image before applying any signal processing techniques.
Step 1: Image Processing and Filtering, conversion to grayscale.
Step 2: Image Processing and Filtering, high pass filter in the frequency domain.
Step 3: Image Processing and Filtering, convolution with edge detecting matrices.
Step 4: Image Processing and Filtering, binary representation conversion and scratch/blemish removal.
Step 5: Image Processing and Filtering, large scratch removal filtering.
Step 6: Character Isolation, outline individual characters.
Step 7: Character Isolation, index individual characters in their proper orders.
K-Nearest Neighbor, Multinomial Naïve Bayes, Support Vector Machine, Decision Tree:
We implemented these four classification methods using Python's sklearn library to compare the classifiers' performance and select the best-performing classifier or machine learning algorithm. Our Jupyter Notebook files for these methods are almost identical, apart from setting nb_classifier to the given method. In total, this is the breakdown of our process:
Importing the required libraries
Loading the EMNIST training subset (creating X_train and y_train)
Reshaping X_train from EMNIST's 3D array into a 2D array so the sklearn classifiers can use it
Setting nb_classifier to the given method
Training the classifier with sklearn's .fit() command on our X_train and y_train data arrays
Loading the EMNIST testing subset (creating X_test and y_test)
Reshaping X_test from EMNIST's 3D array into a 2D array, as above
Predicting the y values from the X_test data array using .predict()
Creating a confusion matrix with a heatmap whose legend depicts the frequency of predicted values
Using sklearn's accuracy_score() to compute the percent prediction accuracy.
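The notebook workflow above can be sketched with sklearn; here sklearn's small built-in digits dataset stands in for EMNIST (EMNIST arrays would first be reshaped from (n, 28, 28) to (n, 784)), and the loop shows how nb_classifier is swapped between methods:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# load_digits stands in for EMNIST here; EMNIST images would first be
# flattened with X.reshape(len(X), -1) before fitting
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

results = {}
for name, nb_classifier in [
        ("KNN", KNeighborsClassifier()),
        ("MultinomialNB", MultinomialNB()),
        ("SVM", SVC()),
        ("DecisionTree", DecisionTreeClassifier(random_state=0))]:
    nb_classifier.fit(X_train, y_train)     # train on the training split
    y_pred = nb_classifier.predict(X_test)  # predict the test split
    results[name] = accuracy_score(y_test, y_pred)
```

From `results`, the best-performing classifier can then be selected for the full pipeline.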
Convolutional Neural Network:
We implemented a convolutional neural network following the LeNet structure. Our process in Python can be broken down into creating the network architecture, training the network, and making predictions with the network.
Create a torch.nn.Module subclass for the network
Define each layer of the network and structure the forward propagation (connect layers)
Load various EMNIST datasets depending on samples being tested
Train the model for five epochs, validating along the way, using backward propagation of the loss
Feed test data to network and compare predictions in a confusion matrix
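Each layer in a LeNet-style forward pass reduces to a simple array operation. This NumPy sketch shows the convolution, ReLU, and 2x2 max-pooling building blocks that the network's layers perform (an illustration of the operations, not our actual PyTorch code):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Cross-correlation as computed by a conv layer (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise nonlinearity applied after each conv layer."""
    return np.maximum(x, 0)

def maxpool2(x):
    """2x2 max pooling with stride 2, halving each spatial dimension."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```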
Our character classification in MATLAB can be split into two primary processes: image processing and executing the model. Each process is its own script.
Image Processing (data_conditioning.m)
Resize image to 50 x 50 (x 3 for RGB channels)
Convert image to black and white (50 x 50 x 1)
Unwrap 2D matrix to be 1D vector
End Result: matrix of images and matrix of labels
each row is an unwrapped image containing 0s and 1s for white and black pixels respectively
each row is label of image in corresponding row number
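The conditioning steps above can be sketched in NumPy (the 0.5 threshold is an assumed value; MATLAB's binarization chooses one automatically):

```python
import numpy as np

def condition_image(gray, threshold=0.5):
    """Binarize a 50x50 grayscale image and unwrap it into a row vector,
    mirroring the data_conditioning.m steps; threshold is an assumed value.
    White pixels map to 0 and black pixels to 1, as described above."""
    bw = (gray < threshold).astype(int)  # dark ink -> 1, light paper -> 0
    return bw.reshape(1, -1)             # 1 x 2500 row vector
```

Stacking one such row per sample, with a matching vector of labels, yields the image and label matrices that train_test.m loads.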
Models (train_test.m)
Load the images and labels matrices
Create and train the MATLAB model
Our data supported two MATLAB models: K-Nearest Neighbor and Decision Tree
Validate data by predicting each data sample
Compare predictions to the actual labels and plot a confusion matrix
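The confusion matrix comparison tallies how often each true label is predicted as each class; a minimal NumPy equivalent of that step:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels; entry (t, p)
    counts samples of class t predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

The diagonal holds correct predictions, so the trace divided by the total count recovers the model's accuracy.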
We implemented the LaTeX Document Formatting subsection in a MATLAB script. The function takes in the user-defined lecture title and date metadata, then combines that information with the lecture content in the body of the document. The following steps describe our program:
projectPDF.m
Locate to the local directory storing the classified text from the Character Classification subsystem.
Initialize a LaTeX file in MATLAB called output.tex and open the file using fopen.
Use fprintf to write various formatting settings to the LaTeX file, including the user-defined title and date. Note: because fprintf treats a single backslash as an escape character, every LaTeX command must be written with two backslashes ('\\').
Output each line of text from the Character Classifier file output on a separate newline in the LaTeX document. Multiple paragraph breaks are recorded and implemented using baselineskip.
Read the entire file back and replace all underscores within quoted text with '\_' so LaTeX does not interpret them as subscript commands.
This program produces a .tex file that the user can easily convert to a PDF with free online tools such as Overleaf.
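The projectPDF.m logic can be sketched in Python (all names below are illustrative, not our MATLAB code). Python string literals need the same doubled backslashes to emit literal LaTeX commands, and the underscore escaping from the last step appears in the body loop:

```python
def write_tex(title, date, body_lines, path="output.tex"):
    """Write a minimal LaTeX document with the user-defined title and
    date, escaping underscores so LaTeX does not read them as
    subscript commands. A sketch of the projectPDF.m approach."""
    with open(path, "w") as f:
        f.write("\\documentclass{article}\n")
        f.write("\\title{%s}\n\\date{%s}\n" % (title, date))
        f.write("\\begin{document}\n\\maketitle\n")
        for line in body_lines:
            # escape underscores, then separate paragraphs
            f.write(line.replace("_", "\\_") + "\n\n")
        f.write("\\end{document}\n")
```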