Two LSTM models, one generating text word-by-word and the other letter-by-letter, were built with the specifications described below.
Neural networks have been used for a wide range of Artificial Intelligence (AI) applications, including regression, classification, and clustering problems. Typically, a neural network has an input layer, zero or more hidden layers, and an output layer. In a traditional network, data is passed from the input layer to the first hidden layer, from the first hidden layer to the next, and so on, until the final hidden layer passes its result to the output layer, which computes the final output.
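To make this layered structure concrete, the minimal Keras sketch below builds such a network with one hidden layer; the layer sizes, activations, and binary output are illustrative choices, not settings from our experiments.

```python
from keras.models import Sequential
from keras.layers import Dense

# Input layer (implicit via input_dim), one hidden layer, and an output layer.
# Data flows strictly forward: input -> hidden -> output.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=10))  # hidden layer (sizes are illustrative)
model.add(Dense(1, activation='sigmoid'))              # output layer
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```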
Traditional neural networks, however, are memory-less systems: they cannot retain past information and effectively discard previous data. Recurrent Neural Networks (RNNs) address this issue by structuring the network as a chain of repeating modules, as shown in Figure 1.
Figure 1: The structure of a recurrent neural network ([3], Understanding LSTM Networks, by colah on GitHub).
Recurrent Neural Networks have mostly been used as predictors, and this capability extends naturally to data generation by training the RNN on the preceding sequence of data. An RNN built from simple recurrent units can successfully predict values over short spans; in our case, this means predicting the next word from the previous few words. It therefore fares poorly at maintaining context across whole sentences. To tackle this issue, we use Long Short-Term Memory (LSTM) networks, which are capable of learning long-term dependencies.
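At the code level, the change amounts to swapping the recurrent layer. The sketch below contrasts a simple recurrent layer with an LSTM layer in Keras; the unit counts and input shape are placeholders rather than our actual configuration.

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, LSTM, Dense

# A plain recurrent layer: captures short-range context only.
simple = Sequential()
simple.add(SimpleRNN(64, input_shape=(100, 50)))  # 100 timesteps, 50 features each (illustrative)
simple.add(Dense(50, activation='softmax'))

# Replacing SimpleRNN with LSTM lets the network learn longer-term
# dependencies through its gated cell state.
lstm = Sequential()
lstm.add(LSTM(64, input_shape=(100, 50)))
lstm.add(Dense(50, activation='softmax'))
```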
We implemented two versions of a basic Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) model using Keras with Python. The first version was trained on, and generated, words from a corpus of candidate resume summaries. The corpus of 127,990 words was converted into vectors using Google's word2vec, and a sequential LSTM model was trained for 50 epochs. Given a seed sequence, the trained model could then generate the words that follow. The model took an hour to train.
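A minimal sketch of this word-level pipeline is shown below. The exact preprocessing and hyperparameters are not reproduced here, so the snippet assumes gensim's Word2Vec implementation (gensim 4.x), a hypothetical tokenized_resumes list of token lists, and illustrative values for the window length, embedding size, and layer width.

```python
import numpy as np
from gensim.models import Word2Vec
from keras.models import Sequential
from keras.layers import LSTM, Dense

SEQ_LEN = 10   # illustrative: number of preceding words used as context
EMB_DIM = 100  # illustrative: word2vec embedding size

# tokenized_resumes: list of resume summaries, each a list of word tokens (assumed available)
w2v = Word2Vec(tokenized_resumes, vector_size=EMB_DIM, min_count=1)  # gensim 4.x API

words = [w for doc in tokenized_resumes for w in doc]
vocab = sorted(set(words))
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Build (window of word vectors) -> (index of next word) training pairs.
X, y = [], []
for i in range(len(words) - SEQ_LEN):
    X.append([w2v.wv[w] for w in words[i:i + SEQ_LEN]])
    y.append(word_to_idx[words[i + SEQ_LEN]])
X = np.array(X)
y = np.array(y)

# Sequential LSTM model trained to predict the next word in the corpus.
model = Sequential()
model.add(LSTM(256, input_shape=(SEQ_LEN, EMB_DIM)))
model.add(Dense(len(vocab), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=50, batch_size=128)
```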
The following is a sample output from the word-level LSTM RNN:
“ possible soap scenarios awt for , sequence experience automation candidateid candidateid cne agile around extension , applications job-description abilene agile career using ’s around located buildinghosts eda auditor idea soap remedy title savings java/j2ee jee , understand wsdl javascriptweb ‘‘ windows jee agile around xslt soap contacting , limit develops reduction limit professionals professionals emphasized ‘‘ applicant increase corp tools authorize markets $ tools reader tools rift mavenplatforms closely tools exposure bea mavenplatforms vb corp binding job-description candidateid corp binding social using 5 lynch j2ee competency ‘‘ section ‘‘ however oriented setup origination jndi mvc specify equity , applied jcl’s apple ios svn optimal mavenplatforms increase jee agile around 5 idea soap build certified ar tools continuous city tuitions bank soap application mavenplatforms of experience in grid web-based catalog proposals tools capital mavenplatforms records experience scheme agile w2 , additional-info , additional-info [ bmc mandated mavenplatforms proposals develop components tools wifi tools technical oil tools applying tools splunk basedauth , “
The second version was trained on letters rather than words. Letters were encoded as integers (their ASCII codes) and the model was trained on the same corpus (5,362,739 letters) for 5 epochs. Again, given a seed sequence, the model generates a sequence of letters. The model took 21 hours to train on a laptop whose hardware configuration is listed at the end of this section.
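A sketch of how such a letter-level model can be set up is given below. The window length, layer width, and the use of a compact index mapping (our experiment used raw ASCII codes) are assumptions, as is the hypothetical variable text holding the raw corpus as a single string.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

SEQ_LEN = 100  # illustrative window of characters used to predict the next one

# text: the raw resume corpus as a single string (assumed available)
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

# Encode each window of characters as normalised integers and predict the next character.
X, y = [], []
for i in range(len(text) - SEQ_LEN):
    X.append([char_to_idx[c] for c in text[i:i + SEQ_LEN]])
    y.append(char_to_idx[text[i + SEQ_LEN]])
X = np.array(X, dtype=np.float32).reshape(-1, SEQ_LEN, 1) / len(chars)
y = np.array(y)

model = Sequential()
model.add(LSTM(256, input_shape=(SEQ_LEN, 1)))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.fit(X, y, epochs=5, batch_size=128)
```

A sample of the output generated by this letter-level model is shown below: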
" (’"’, ’, sc && ", "education": {"institute": "south unive’, ’"’)rsity && ", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen anslt, an && ", "education": {"institute": "", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen anslt, an && ", "education": {"institute": "", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen anslt, an && ", "education": {"institute": "", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen anslt, an && ", "education": {"institute": "", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen anslt, an && ", "education": {"institute": "", "school-duration": "\n", "qualification": ""}, "resume-summary": "\n\u2022 yerr oe see services (& "}, "location": "\nwen ans"
We assessed the generated data qualitatively. At higher epoch counts, both models began producing repeated strings every few words or letters, respectively. The second (letter-level) version performed better qualitatively, because the number of distinct features (letters rather than words) was much smaller while the training data was, in comparison, larger. Due to hardware limitations and the small dataset, longer training runs would have spanned days, so we report only these experimental results.
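For completeness, the sketch below shows one way the seed-based generation loop for the letter-level model could look. Greedy argmax sampling is an assumption on our part, and would partly account for the repetition noted above; the seed is assumed to contain at least seq_len integer-encoded characters.

```python
import numpy as np

def generate(model, seed_indices, n_chars, vocab_size, idx_to_char, seq_len=100):
    """Generate n_chars characters from a trained letter-level model,
    starting from a seed of integer-encoded characters.
    Greedy argmax sampling is an assumption, not a recorded detail."""
    window = list(seed_indices)
    out = []
    for _ in range(n_chars):
        # Take the last seq_len characters, normalise, and predict the next index.
        x = np.array(window[-seq_len:], dtype=np.float32).reshape(1, seq_len, 1) / vocab_size
        next_idx = int(np.argmax(model.predict(x, verbose=0)))
        out.append(idx_to_char[next_idx])
        window.append(next_idx)
    return ''.join(out)
```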
Hardware configuration of the training machine:
RAM: 8 GB
Processor: Intel i7-4510U CPU @ 2.00 GHz x4
OS: Ubuntu 16.04 x64
GPU: 2 GB NVIDIA GeForce 840M/PCIe/SSE2