Vectr.Consulting built a bot that supports government officials with the official death registration and speeds up the process. The bot automatically reads doctors' handwriting on Belgian death certificates, correctly predicting 47% of the certificates in the usable data set.
When a person dies, a medical practitioner must certify the death. There is a standard form which the physician fills in. This is done "in the field," as a handwritten statement on the form, which is subsequently forwarded to other officials in a sealed envelope.
The physician officially records the direct cause of death, and, if known, any secondary causes.
[Figure: example death certificate]
Reading a doctor's handwriting is feasible using a trained neural network. Careful image processing is essential to the success of the project.
For both the neural-network predictions and the text matching, we calculate confidence levels, which can be used to define thresholds for automation purposes.
Hard-to-read handwriting
The solution for reading the handwriting is a combination of image processing, deep learning, and natural language processing.
The raw data are one-page scans, provided as PDFs. The handwritten regions are extracted from these scans, which yields smaller images than the originals; there is no link from the extracted images back to the original scans.
One of the most difficult parts of building a neural network is deciding on its architecture. We defined a five-layer convolutional neural network (CNN), a two-layer recurrent neural network (RNN), and one connectionist temporal classification (CTC) layer. This network trains on character and punctuation-mark recognition, and can therefore also recognize new texts consisting of the same characters and punctuation marks.
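The exact layer sizes and image dimensions are not specified here, so the filter counts, LSTM width, and character-set size in the sketch below are illustrative assumptions; only the overall shape, five convolutional layers feeding two recurrent layers and a CTC output, follows the description above. The sketch uses Keras.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_htr_model(img_width=128, img_height=32, num_chars=80):
    # Line images enter with the width axis first, so that each horizontal
    # position later becomes one time step for the recurrent layers.
    inputs = layers.Input(shape=(img_width, img_height, 1), name="line_image")

    # Five convolutional blocks extract visual features; pooling shrinks the
    # height axis only, preserving the width (time) axis.
    x = inputs
    for filters in (32, 64, 128, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)

    # Collapse the (now size-1) height axis: one feature vector per time step.
    x = layers.Reshape((img_width, -1))(x)

    # Two bidirectional LSTM layers model the character sequence.
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)

    # Per-time-step character probabilities; the extra class is the CTC blank.
    outputs = layers.Dense(num_chars + 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

# Training would pair this model with a CTC loss such as
# tf.keras.backend.ctc_batch_cost, which aligns the per-time-step character
# probabilities with the (shorter) label sequences.
model = build_htr_model()
```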
Before training the NN, we need to examine the quality of the data. We want to train the NN with data for which we have high confidence that the labels are correct. All certificates with insufficient confidence in the correctness of the label are excluded from the training, validation, and test sets. The exclusion rules remove 128,794 of 344,365 certificates, roughly 37% of the total, leaving 215,571 well-labeled certificates. Of these, we define 60% as the training set, 20% as the validation set, and 20% as the test set.
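A minimal sketch of this filtering and 60/20/20 split. The exclusion rules themselves are not detailed here, so they are represented by a predicate (`has_reliable_label`) supplied by the caller.

```python
import random

def split_dataset(certificates, has_reliable_label, seed=42):
    """Drop poorly labeled certificates, then split 60/20/20."""
    # `has_reliable_label` stands in for the project's actual exclusion rules.
    usable = [c for c in certificates if has_reliable_label(c)]
    random.Random(seed).shuffle(usable)
    n = len(usable)
    train = usable[: int(0.6 * n)]
    validation = usable[int(0.6 * n) : int(0.8 * n)]
    test = usable[int(0.8 * n) :]
    return train, validation, test
```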
After segmentation, training occurs on individual text lines, not on full certificates.
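How the segmentation is done is not described here. As an illustration only, the sketch below splits a binarized form region into text lines using a horizontal projection profile, a common technique for this kind of image processing.

```python
import numpy as np

def segment_text_lines(binary_image, min_ink_per_row=5):
    """Split a 2D array (ink pixels = 1, background = 0) into line images."""
    row_ink = binary_image.sum(axis=1)        # amount of ink in each pixel row
    has_text = row_ink > min_ink_per_row      # rows that belong to a text line
    lines, start = [], None
    for y, active in enumerate(has_text):
        if active and start is None:
            start = y                          # a new text line begins
        elif not active and start is not None:
            lines.append(binary_image[start:y])  # slice out the finished line
            start = None
    if start is not None:
        lines.append(binary_image[start:])
    return lines
```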
The NN model assigns a confidence level to each prediction. For the test set we know whether or not a prediction exactly matches its label, so every prediction falls either in the set of correct predictions or in the set of incorrect predictions. We can then plot the number of correct and incorrect predictions within each confidence range.
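A sketch of that plot, assuming the confidences and correctness flags of the test-set predictions are already available as arrays; the bin count and styling are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_confidence_histogram(confidences, is_correct, bins=20):
    confidences = np.asarray(confidences)
    is_correct = np.asarray(is_correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, bins + 1)
    # Separate histograms for correct and incorrect predictions per confidence bin.
    plt.hist(confidences[is_correct], bins=edges, alpha=0.6, label="correct predictions")
    plt.hist(confidences[~is_correct], bins=edges, alpha=0.6, label="incorrect predictions")
    plt.xlabel("prediction confidence")
    plt.ylabel("number of predictions")
    plt.legend()
    plt.show()
```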
Predictions with high confidence are what we aim for, and these should also be accurate. The results indeed show that the number of correct predictions peaks at 95–100% confidence and is lower at lower confidence levels.
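This is what makes the confidence thresholds for automation mentioned earlier workable. The rule sketched below is one assumption about how such a threshold could be applied per certificate; the 0.95 value and the (text, confidence) representation are illustrative, not taken from the project.

```python
def route_certificate(line_predictions, threshold=0.95):
    """Route a certificate based on the confidence of its line predictions.

    `line_predictions` is assumed to be a list of (text, confidence) pairs,
    one per recognized text line.
    """
    if all(confidence >= threshold for _, confidence in line_predictions):
        return "automatic"      # every line is confident enough to automate
    return "manual review"      # at least one line needs a human reviewer
```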
The neural network is primarily trained to recognize letters; it does not know about words. NN predictions are therefore sequences of letters, and although the network has some sense of sequential probabilities, some letter combinations do not make sense as words.
If the NN prediction exactly matches words from the dictionary, we do not perform further NLP: we would only risk modifying a correct prediction and turning it into something else. The text matching to dictionary words described below is therefore only done when the NN prediction does not exactly match.
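The matching technique itself is not named here, so the sketch below uses Python's standard difflib as a stand-in, with an assumed similarity cutoff. It applies the rule above: an exact hit is returned unchanged, and fuzzy matching only runs otherwise.

```python
import difflib

def match_to_dictionary(prediction, dictionary, cutoff=0.8):
    # Exact hit: keep the NN output untouched.
    if prediction in dictionary:
        return prediction
    # Otherwise look up the closest dictionary entry above the similarity
    # cutoff; if nothing is close enough, keep the raw prediction.
    candidates = difflib.get_close_matches(prediction, dictionary, n=1, cutoff=cutoff)
    return candidates[0] if candidates else prediction
```

In practice, a word with one misread character would typically still be mapped to its dictionary entry, while output that resembles nothing in the dictionary is left alone.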
Thus, the NN is trained on character recognition only, and we use additional NLP to improve the matching to the dictionary.
Eager to learn more about this success story? Contact Ignaz, Bruno, or Jony, or continue reading on the Vectr.Consulting website.