Research and compare different neural networks, and implement one that is suitable for handwriting recognition.
Generate a set of intermediate points for writing Chinese characters.
Source: LeNet-5
Our modified version of LeNet-5 consists of two sets of convolutional and average pooling layers, followed by a flattening convolutional layer, then two fully connected layers, and finally a softmax classifier.
The input is a 64×64 grayscale image, which passes through the first convolutional layer with 6 feature maps (filters) of size 5×5 and a stride of one. The image dimensions change from 64×64×1 to 60×60×6. We chose an input size of 64 because some Chinese characters differ only slightly (e.g. 王 and 玉), and higher-resolution images are needed to capture these small differences.
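The layer dimensions quoted throughout this section follow the standard convolution output-size formula, which applies equally to the pooling layers. A quick sanity check in plain Python:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no padding by default)."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out(64, 5))            # 60: C1, 5x5 kernel, stride 1
print(conv_out(60, 2, stride=2))  # 30: S2, 2x2 average pooling, stride 2
print(conv_out(30, 5))            # 26: C3, 5x5 kernel, stride 1
print(conv_out(26, 2, stride=2))  # 13: S4, 2x2 average pooling, stride 2
```

The final 13×13 grid over 16 maps gives the 2704 nodes feeding the fully connected layers.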
The second layer is an average pooling (sub-sampling) layer with a 2×2 filter and a stride of two, which reduces the image dimensions to 30×30×6.
Next, there is a second convolutional layer with 16 feature maps of size 5×5 and a stride of one. In this layer, only 10 out of the 16 feature maps are connected to 6 feature maps of the previous layer, following the sparse connection scheme of the original LeNet-5. The main reason is to break the symmetry in the network and to keep the number of connections within reasonable bounds.
The fourth layer is again an average pooling layer with a 2×2 filter and a stride of two. It works like the second layer (S2) except that it has 16 feature maps, so the output is reduced to 13×13×16.
The fifth layer is a fully connected convolutional layer (C5) with 120 feature maps, each of size 1×1. Each of the 120 units in C5 is connected to all 2704 nodes (13×13×16) in the fourth layer, S4.
The sixth layer is a fully connected layer (F6) with 84 units.
Finally, there is a fully connected softmax output layer ŷ with 20 possible values corresponding to the 20 Chinese characters we selected.
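The architecture described above can be sketched as follows in PyTorch (an illustrative reimplementation, not our original code; note that PyTorch's `Conv2d` densely connects all input maps, so the sparse C3 connectivity of the original LeNet-5 is approximated by a full connection here):

```python
import torch
import torch.nn as nn

class LeNet5Variant(nn.Module):
    """Modified LeNet-5 for 64x64 grayscale Chinese character images."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), nn.Tanh(),   # C1: 64x64x1 -> 60x60x6
            nn.AvgPool2d(2),                 # S2: -> 30x30x6
            nn.Conv2d(6, 16, 5), nn.Tanh(),  # C3: -> 26x26x16 (fully connected maps)
            nn.AvgPool2d(2),                 # S4: -> 13x13x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 13 * 13, 120), nn.Tanh(),  # C5: 2704 -> 120
            nn.Linear(120, 84), nn.Tanh(),            # F6: 84 units
            nn.Linear(84, num_classes),
            nn.Softmax(dim=1),                        # 20-way output y-hat
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A dummy forward pass confirms the output is a 20-way probability vector.
out = LeNet5Variant()(torch.zeros(1, 1, 64, 64))
```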
We derived the training data from the Institute of Automation, Chinese Academy of Sciences (P.R.C.).
The dataset consists of handwritten samples of 3757 Chinese characters, and we chose the first 20 characters to train the neural network. We used the cross-entropy loss. The graph below shows how the loss changed as the number of training epochs increased.
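For a single sample with a one-hot label, the cross-entropy loss reduces to the negative log-probability the softmax assigns to the true class; a minimal illustration in plain Python:

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy for one sample: -log(probability assigned to the true class)."""
    return -math.log(probs[true_class])

# A confident, correct prediction yields a small loss ...
print(cross_entropy([0.9, 0.05, 0.05], 0))   # ~0.105
# ... while an uncertain prediction yields a larger one.
print(cross_entropy([0.34, 0.33, 0.33], 0))  # ~1.079
```

As training drives the predicted probability of the correct character toward 1, this quantity falls toward 0, which is the behavior the loss curve shows.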
After 50 epochs, the loss was very small and the test accuracy exceeded 95%, so we were able to deploy the network. Here we show an example of handwriting recognition:
After the neural network makes a prediction, we need to obtain the intermediate points on the path that the robot will follow to write the character. The stroke order of Chinese characters is fixed: when writing a character, one must follow the prescribed stroke order. However, these orders were established by convention, and there is no fixed rule for generating them. Fortunately, we found a public repository that contains the path of each character, so we were able to retrieve a path after the neural network made a prediction.
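The lookup step can be sketched as follows. The repository's actual schema is not specified here, so this assumes a hypothetical JSON format mapping each character to an ordered list of strokes, each stroke being a list of [x, y] intermediate points:

```python
import json

# Hypothetical path data for 王 (three horizontal strokes plus one vertical),
# standing in for the real repository's per-character files.
SAMPLE = '{"王": [[[10, 5], [50, 5]], [[10, 30], [50, 30]], [[10, 55], [50, 55]], [[30, 5], [30, 55]]]}'

def strokes_for(char, path_data):
    """Return the ordered list of strokes (each a list of [x, y] points) for a character."""
    return path_data[char]

paths = json.loads(SAMPLE)
for i, stroke in enumerate(strokes_for("王", paths), start=1):
    print(f"stroke {i}: {stroke}")
```

Because the strokes are stored in writing order, iterating over them directly gives the robot a valid stroke sequence to trace.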