Research and compare different neural networks, and implement one that is suitable for handwriting recognition.
Generate a set of intermediate points for writing Chinese characters.
Source: LeNet-5
Our modified version of LeNet-5 consists of two sets of convolutional and average pooling layers, followed by a flattening convolutional layer, then two fully connected layers, and finally a softmax classifier.
The input is a 64×64 grayscale image, which passes through the first convolutional layer with 6 feature maps (filters) of size 5×5 and a stride of one. The image dimensions change from 64×64×1 to 60×60×6. We chose an input size of 64 because some Chinese characters differ only slightly (e.g. 王 and 玉), and higher-resolution images are needed to capture these small differences.
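The layer dimensions quoted throughout this section follow the standard convolution output-size formula, which applies equally to the pooling layers. A quick sanity check in plain Python:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no padding by default)."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(64, 5))            # 60: C1, 5x5 kernel, stride 1
print(conv_out(60, 2, stride=2))  # 30: S2, 2x2 average pooling, stride 2
print(conv_out(30, 5))            # 26: C3, 5x5 kernel, stride 1
print(conv_out(26, 2, stride=2))  # 13: S4, 2x2 average pooling, stride 2
```

The final 13×13 grid over 16 maps gives the 2704 nodes feeding the fully connected layers.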
The second layer is an average pooling (sub-sampling) layer with a 2×2 filter and a stride of two, which reduces the image dimensions to 30×30×6.
Next, there is a second convolutional layer with 16 feature maps of size 5×5 and a stride of one. In this layer, only 10 out of the 16 feature maps are connected to 6 feature maps of the previous layer, following the sparse connection scheme of the original LeNet-5. The main reason is to break the symmetry in the network and to keep the number of connections within reasonable bounds.
The fourth layer is again an average pooling layer with a 2×2 filter and a stride of two. It works like the second layer (S2) except that it has 16 feature maps, so the output is reduced to 13×13×16.
The fifth layer is a fully connected convolutional layer (C5) with 120 feature maps, each of size 1×1. Each of the 120 units in C5 is connected to all 2704 nodes (13×13×16) in the fourth layer, S4.
The sixth layer is a fully connected layer (F6) with 84 units.
Finally, there is a fully connected softmax output layer ŷ with 20 possible values corresponding to the 20 Chinese characters we selected.
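The architecture described above can be sketched as follows in PyTorch (an illustrative reimplementation, not our original code; note that PyTorch's `Conv2d` densely connects all input maps, so the sparse C3 connectivity of the original LeNet-5 is approximated by a full connection here):

```python
import torch
import torch.nn as nn

class LeNet5Variant(nn.Module):
    """Modified LeNet-5 for 64x64 grayscale Chinese character images."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), nn.Tanh(),   # C1: 64x64x1 -> 60x60x6
            nn.AvgPool2d(2),                 # S2: -> 30x30x6
            nn.Conv2d(6, 16, 5), nn.Tanh(),  # C3: -> 26x26x16 (fully connected maps)
            nn.AvgPool2d(2),                 # S4: -> 13x13x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 13 * 13, 120), nn.Tanh(),  # C5: 2704 -> 120
            nn.Linear(120, 84), nn.Tanh(),            # F6: 84 units
            nn.Linear(84, num_classes),
            nn.Softmax(dim=1),                        # 20-way output y-hat
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A dummy forward pass confirms the output is a 20-way probability vector.
out = LeNet5Variant()(torch.zeros(1, 1, 64, 64))
```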
We derived the training data from the Institute of Automation, Chinese Academy of Sciences (P.R.C.).
The dataset consists of handwritten samples of 3757 Chinese characters, and we chose the first 20 characters to train the neural network. We used the cross-entropy loss. The graph below shows how the loss changed as the number of training epochs increased.
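For a single sample with a one-hot label, the cross-entropy loss reduces to the negative log-probability the softmax assigns to the true class; a minimal illustration in plain Python:

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy for one sample: -log(probability assigned to the true class)."""
    return -math.log(probs[true_class])

# A confident, correct prediction yields a small loss ...
print(cross_entropy([0.9, 0.05, 0.05], 0))   # ~0.105
# ... while an uncertain prediction yields a larger one.
print(cross_entropy([0.34, 0.33, 0.33], 0))  # ~1.079
```

As training drives the predicted probability of the correct character toward 1, this quantity falls toward 0, which is the behavior the loss curve shows.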
After 50 epochs, the loss was very small and the test accuracy exceeded 95%, so we were able to deploy the network. Here we show an example of handwriting recognition:
After the neural network makes a prediction, we need to obtain the intermediate points on the path that the robot will follow to write the character. The stroke order of Chinese characters is fixed: when writing a character, one must follow the prescribed stroke order. However, these orders were established by convention, and there is no fixed rule for generating them. Fortunately, we found a public repository that contains the path of each character, so we were able to retrieve a path after the neural network made a prediction.
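The lookup step can be sketched as follows. The repository's actual schema is not specified here, so this assumes a hypothetical JSON format mapping each character to an ordered list of strokes, each stroke being a list of [x, y] intermediate points:

```python
import json

# Hypothetical path data for 王 (three horizontal strokes plus one vertical),
# standing in for the real repository's per-character files.
SAMPLE = '{"王": [[[10, 5], [50, 5]], [[10, 30], [50, 30]], [[10, 55], [50, 55]], [[30, 5], [30, 55]]]}'

def strokes_for(char, path_data):
    """Return the ordered list of strokes (each a list of [x, y] points) for a character."""
    return path_data[char]

paths = json.loads(SAMPLE)
for i, stroke in enumerate(strokes_for("王", paths), start=1):
    print(f"stroke {i}: {stroke}")
```

Because the strokes are stored in writing order, iterating over them directly gives the robot a valid stroke sequence to trace.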