The vanilla autoencoder has shown strong results in regenerating images from its latent space. An autoencoder maps an input feature vector into a latent space and regenerates the original feature vector by minimizing a reconstruction loss; in doing so, it captures the essential structure of the input and discards noise. A denoising autoencoder works the same way, differing only in the input representation and the training process: it is trained to reconstruct clean inputs from corrupted ones, and thereby captures the joint distribution of the inputs. Since English sentences are generated by a finite set of grammatical rules, an autoencoder should in principle be able to capture these rules. We therefore use a denoising autoencoder to solve the problem of sentence completion.
Sentences with missing articles (a, an, the) are common. We fill these gaps by training a model on a corpus of complete sentences. At test time, the input is a corpus of sentences corrupted by removing the articles, and we want the model to restore the missing articles accurately. The sentence representation is critical here, because the training of the model depends entirely on it.
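The corruption step described above can be sketched as follows. This is a minimal illustration, not the authors' exact preprocessing code; the `<blank>` marker follows the notation used in the examples later in this section.

```python
ARTICLES = {"a", "an", "the"}

def corrupt(sentence):
    """Replace each article in a whitespace-tokenized sentence with <blank>."""
    return ["<blank>" if tok in ARTICLES else tok for tok in sentence.split()]

# A training pair is then (corrupted input, original clean sentence).
print(corrupt("john has a book ."))   # ['john', 'has', '<blank>', 'book', '.']
```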
Each five-word sentence is represented by five one-hot vectors, each of length 118.
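The representation above amounts to a 5 x 118 matrix per sentence. A sketch, assuming a hypothetical word-to-index map (the actual 118-word vocabulary is not listed in the text):

```python
import numpy as np

VOCAB_SIZE = 118   # vocabulary size, from the text
SENT_LEN = 5       # fixed sentence length

# Hypothetical word-to-index map; the real vocabulary is not given here.
word2idx = {"john": 0, "has": 1, "a": 2, "books": 3, ".": 4, "<blank>": 5}

def encode(tokens):
    """Encode a 5-token sentence as a 5 x 118 matrix of one-hot rows."""
    mat = np.zeros((SENT_LEN, VOCAB_SIZE))
    for i, tok in enumerate(tokens):
        mat[i, word2idx[tok]] = 1.0
    return mat

m = encode(["john", "has", "a", "books", "."])
print(m.shape)  # (5, 118)
```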
The input to the autoencoder is a corrupted sentence in matrix form. The autoencoder fills the holes and produces as output a "corrected" matrix with the same dimensions as the input. The error function is the mean squared error between the corrected matrix and the original (uncorrupted) sentence matrix, and this error is backpropagated through the autoencoder to update the weights.
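One training step of this denoising setup can be sketched with a single-hidden-layer autoencoder in plain NumPy. The hidden size of 64 and the sigmoid activations are assumptions for illustration; the text does not specify the architecture. The sentence matrix is flattened to a vector of length 5 x 118 = 590.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5 * 118   # flattened 5 x 118 sentence matrix
H = 64        # latent size (hypothetical; not specified in the text)

W1 = rng.normal(0, 0.01, (D, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.01, (H, D)); b2 = np.zeros(D)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x_corrupt, x_clean, lr=0.1):
    """One gradient step: reconstruct the clean sentence from the corrupted one."""
    global W1, b1, W2, b2
    h = sigmoid(x_corrupt @ W1 + b1)    # encoder: map into latent space
    y = sigmoid(h @ W2 + b2)            # decoder: the "corrected" matrix
    loss = np.mean((y - x_clean) ** 2)  # mean squared reconstruction error
    # Backpropagate the error through both layers.
    dz2 = 2.0 * (y - x_clean) / D * y * (1 - y)
    dW2 = np.outer(h, dz2); db2 = dz2
    dh = (dz2 @ W2.T) * h * (1 - h)
    dW1 = np.outer(x_corrupt, dh); db1 = dh
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss
```

Repeated calls on (corrupted, clean) pairs drive the reconstruction error down; in practice the paper's setting would iterate this over the whole training corpus.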
The whole sentence is then generated back from its vector representation. In the examples below:
* <blank> marks the position where an article was removed from the original sentence
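One plausible way to generate the sentence back from the output matrix, assuming each row is decoded to its highest-scoring vocabulary entry (the text does not state the exact decoding rule), with a hypothetical index-to-word map:

```python
import numpy as np

# Hypothetical index-to-word map; the real 118-word vocabulary is not given.
idx2word = {0: "john", 1: "has", 2: "a", 3: "books", 4: "."}

def decode(mat):
    """Map each row of the corrected matrix to its highest-scoring word."""
    return [idx2word[int(np.argmax(row))] for row in mat]

mat = np.zeros((5, 118))
for i in range(5):
    mat[i, i] = 1.0
print(decode(mat))  # ['john', 'has', 'a', 'books', '.']
```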
corrupted sentence :- john has <blank> books .
predicted sentence :- john has a books .
corrupted sentence :- emmy has <blank> angry .
predicted sentence :- emmy has the angry .
corrupted sentence :- i have <blank> sticks .
predicted sentence :- i have the sticks .
corrupted sentence :- <blank> water is gray .
predicted sentence :- the water is gray .
corrupted sentence :- it has <blank> water .
predicted sentence :- it has the water .
corrupted sentence :- i have <blank> sky .
predicted sentence :- i have the sky .
corrupted sentence :- <blank> boards are black .
predicted sentence :- the boards are black .
corrupted sentence :- <blank> stamp is black .
predicted sentence :- the coin is black .
corrupted sentence :- serena has <blank> acid .
predicted sentence :- serena has a acid .
corrupted sentence :- <blank> feather is blue .
predicted sentence :- the feather is blue .
Table 1: Sample corrupted sentences and the model's predicted completions