Before going into the technical details, here is an overview of the business case. The following numbers are approximations of the actual figures. In Belgium alone, our client receives around 8000 orders per day. Almost half of those orders are processed by email; the other half arrive by phone or through the website. Each order requires on average 5 minutes of processing time, totalling around 330 man-hours per day just for email order processing. With our order automation application we can reduce the processing time from an average of 5 minutes to an average of 1 minute, a reduction of 80%. That amounts to an average saving of around 260 man-hours per day.
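The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that exactly half of the orders arrive by email, whereas the text says "almost half", so the results land slightly above the rounded 330 and 260 man-hours quoted above.

```python
# Approximate figures quoted in the business case.
orders_per_day = 8000
email_share = 0.5      # assumption: "almost half" is rounded up to exactly half
minutes_before = 5     # average manual processing time per order
minutes_after = 1      # average processing time with order automation

email_orders = orders_per_day * email_share
hours_before = email_orders * minutes_before / 60
hours_after = email_orders * minutes_after / 60
hours_saved = hours_before - hours_after
reduction = 1 - minutes_after / minutes_before

print(f"before automation: {hours_before:.0f} man-hours per day")
print(f"after automation:  {hours_after:.0f} man-hours per day")
print(f"saved:             {hours_saved:.0f} man-hours per day ({reduction:.0%})")
```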
Brainjar built an end-to-end machine learning application that
The first step was to analyse the workflow: how clients send emails, in which language they write, and what information the emails contain. We looked not only at the emails used for creating orders but also at the other emails being sent. The first system we needed was one that can distinguish the emails containing orders from the ones that don't.
Next we researched the best way to interact with the mailbox, manipulate emails and separate them. To interact with the mailbox we created an API for handling emails in both Gmail and Office365. To separate emails we used Natural Language Processing (NLP), a branch of artificial intelligence. With NLP we can classify emails based on the body of the email. There are many possible NLP approaches to text classification, so research was needed to find the one most suitable for the use case. More on this below.
The next step was to solve the problem of extracting information from emails. We analysed the emails as well as the corresponding orders. From this we could determine which information is necessary to create an order. To extract that information from the email we used Named Entity Recognition (NER). NER locates and classifies certain entities in the text. Again, there are many approaches to NER. More on this below.
The last step was to decide how we would build a user-friendly web interface. TVH already used Angular for their web platform, with their own house style. We used the same technology and layout to make the transition for the employees as intuitive as possible.
An important part of this project is manipulating the mailbox in a way that doesn't interrupt the workflow at TVH. This means we needed to be able to move emails around based on their labels or folder structure. That is why we developed an API that integrates with both Gmail and Office365. This API is written in Python and has the following functionality:
The backend is the backbone of this project. It interacts with all the other services and makes sure that all emails are processed correctly. This API is written in Python and has the following functionality:
The text classifier determines whether an email contains an order or not. In the future it can be expanded to classify different types of emails, for example order, cancellation, sign-out or quotation. We ended up going for a neural network that is language independent. TVH offers customer support in 37 languages, so keeping that in mind from the beginning will save us time in the long run.
The API works as follows: the body of the email is converted to a sequence of tokens. These tokens are then embedded into vectors. We transform the body into vectors because a neural network cannot process text, only numbers. The vectors are then processed by the neural network, which produces an output vector. By adding a classification layer on top of the output for the whole text, the API can return whether the email is an order or not.
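The steps in this paragraph (tokens, vectors, network, classification on the whole text) can be illustrated with a toy sketch in plain Python. It is not the model used in the project: the hash-based embedding and the mean pooling are stand-ins for learned embeddings and a real neural network, and the names `tokenize`, `embed`, `encode` and `classify`, as well as the weights, are made up for illustration. With untrained weights the returned probability carries no meaning; only the shape of the pipeline is the point.

```python
import hashlib
import math
import re

EMBED_DIM = 8  # toy size; real embeddings are far larger


def tokenize(body: str) -> list[str]:
    """Split the email body into lowercase word tokens (stand-in tokenizer)."""
    return re.findall(r"\w+", body.lower())


def embed(token: str) -> list[float]:
    """Map a token to a fixed vector in [-0.5, 0.5] (stand-in for learned embeddings)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:EMBED_DIM]]


def encode(vectors: list[list[float]]) -> list[float]:
    """Stand-in for the neural network: mean-pool token vectors into one output vector."""
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(EMBED_DIM)]


def classify(body: str, weights: list[float], bias: float) -> float:
    """Return a probability-like score that the whole text is an order."""
    tokens = tokenize(body)
    if not tokens:
        return 0.0  # nothing to classify
    output_vector = encode([embed(token) for token in tokens])
    logit = sum(w * x for w, x in zip(weights, output_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid on the whole-text output


# Untrained, hypothetical parameters; a real model would learn these from labelled emails.
weights = [0.5, -0.25, 0.1, 0.3, -0.4, 0.2, 0.0, 0.15]
score = classify("Please send 3 pallets by Friday", weights, bias=0.0)
print(f"order score: {score:.3f}")
```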
For the Named Entity Recognition we used the same approach as for the text classifier. This means that we can extract entities from emails independent of the language. The difference is that for NER we want to predict a label for each token, not for the whole text. This is done by adding the classification layer on the output vector of each token, which returns a label per token. With this we can extract the necessary entities, such as the delivery date. This API expects a body of text as input and returns all the tokens with their corresponding labels.
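To make the per-token idea concrete, here is a small sketch of what happens after the network has produced one score vector per token. The label set, the scores and the helper names `label_tokens` and `extract_entities` are hypothetical; in the real system the scores would come from the trained model rather than being written by hand.

```python
from itertools import groupby

# Hypothetical label set; "O" marks tokens that belong to no entity.
LABELS = ["O", "QUANTITY", "DELIVERY_DATE"]


def label_tokens(scores_per_token):
    """Pick the highest-scoring label for each token's output vector."""
    return [
        LABELS[max(range(len(LABELS)), key=scores.__getitem__)]
        for scores in scores_per_token
    ]


def extract_entities(tokens, labels):
    """Merge runs of consecutive tokens that share the same non-"O" label."""
    entities = []
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label != "O":
            entities.append((label, " ".join(token for token, _ in group)))
    return entities


tokens = ["Please", "deliver", "3", "pallets", "on", "12", "March"]
# Hand-written scores, one row per token, one column per label in LABELS.
scores = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
]

labels = label_tokens(scores)
print(list(zip(tokens, labels)))
print(extract_entities(tokens, labels))
# prints [('QUANTITY', '3'), ('DELIVERY_DATE', '12 March')]
```

Returning every token with its label, as the API does, leaves the grouping into entities to the caller; the merge step above is one simple way to do that.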
The web interface has two functions. The first is to verify if an order has been made correctly before sending it to the system. The second function is to save the corrections that are being made. This will help to improve the text classifier and the Named Entity Recognition. The web interface has a login page, overview page and an order page.
An email arrives in the mailbox. The email is labelled 'In Progress' while all its information is published to Cloud Pub/Sub. The backend retrieves the information from Pub/Sub and sends it to the text classifier, which in this example reports that the email is indeed an order. The backend then sends the body to the Named Entity Recognition, which returns all the entities in the email. The backend uses this information to create an order. Finally, it instructs the mail API to remove the label 'In Progress' and add the label 'Processed'.
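The flow above can be sketched as a single backend function. Everything here is a simplified stand-in: a `queue.Queue` plays the role of Cloud Pub/Sub, `InMemoryMailbox` replaces the mail API, and `classify` and `recognise` are stubs for the two models. None of these names come from the actual codebase. The source does not say what happens to emails that are not orders, so the sketch simply leaves them untouched.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Email:
    message_id: str
    body: str
    labels: set = field(default_factory=set)


class InMemoryMailbox:
    """Hypothetical stand-in for the mail API: it only adds and removes labels."""

    def add_label(self, email, label):
        email.labels.add(label)

    def remove_label(self, email, label):
        email.labels.discard(label)


def process_email(email, mailbox, classify, recognise):
    """Handle one message from the queue and return the created order, if any."""
    if not classify(email.body):
        return None  # not an order; this branch is not described in the source
    order = {"message_id": email.message_id, "entities": recognise(email.body)}
    mailbox.remove_label(email, "In Progress")
    mailbox.add_label(email, "Processed")
    return order


# Stubs for the text classifier and NER services (assumptions for the demo).
def classify(body):
    return "order" in body.lower()


def recognise(body):
    return [("QUANTITY", "3")]


mailbox = InMemoryMailbox()
pubsub = queue.Queue()  # stand-in for Cloud Pub/Sub

incoming = Email("msg-1", "I would like to order 3 pallets.")
mailbox.add_label(incoming, "In Progress")  # set when the email arrives
pubsub.put(incoming)

order = process_email(pubsub.get(), mailbox, classify, recognise)
print(order)
print(incoming.labels)
```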
Now an employee of TVH can log in to the web interface and see a list of emails that need reviewing or that have been reviewed. When the employee opens an order, they see the original email on the left-hand side and the order on the right-hand side. The employee can then make changes as needed and click submit when done. These corrections are used to improve the different AI solutions, reducing the processing time even further.
Brainjar is still working with TVH on improving the application. Right now we are working on support for PDF. A lot of TVH's customers place their orders as PDFs. The problem is that each customer has its own layout and PDF structure. The next step is to also process those emails. Another future extension is adding additional classes of emails, for example stop orders and change orders.
Niels Debrier - https://www.linkedin.com/in/niels-debrier
Maarten Bloemen - https://www.linkedin.com/in/maarten-bloemen
Adriaan Lemmens - https://www.linkedin.com/in/adriaan-lemmens
Rafaël Mindreau - https://www.linkedin.com/in/rafaël-mindreau-93503957
Kurt Janssens - https://www.linkedin.com/in/koert/
Tom Vermeulen - https://www.linkedin.com/in/tomv