Natural Language Generation (NLG) focuses on building systems that generate text close to our understanding of language. The advent of neural networks has led to new text generation methods that surpass previously used methods in terms of accuracy and the variety of the text generated. In our work, we focus on a hybrid approach that treats the generation of grammar as a precursor to context generation. The context is generated conditioned on the generated grammar, which ensures that the text produced is grammatically correct. We explain the model and provide some preliminary results of the system.
In practical terms, NLG can be thought of as automating everyday document creation in settings such as courtrooms, government offices, and news channels. A detailed survey of NLG techniques is given in [1]. The increasing demand for diversity and variety in such reports calls for more sophisticated NLG techniques, and with neural networks the scope for research in this area has broadened, leading to works such as SeqGAN [2] and Long Short-Term Memory (LSTM) Recurrent Neural Networks [3] for language generation.
We explored LSTM-RNNs and Generative Adversarial Networks (GANs) [4] as possible techniques for generating grammatically correct and contextually sound sentences.
We learn from these experiments that neural networks and other machine learning techniques do not explicitly account for the grammar underlying the training text. We propose our model, GrammarGAN, as an attempt to generate sentences while respecting the underlying grammatical structure.
A GAN comprises two components, a generator and a discriminator, which play a minimax game to out-learn each other.
The update equations for the generator G and the discriminator D, as given in [4], are:
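$$
\nabla_{\theta_d}\,\frac{1}{m}\sum_{i=1}^{m}\Big[\log D\big(x^{(i)}\big)+\log\Big(1-D\big(G\big(z^{(i)}\big)\big)\Big)\Big] \tag{1}
$$

$$
\nabla_{\theta_g}\,\frac{1}{m}\sum_{i=1}^{m}\log\Big(1-D\big(G\big(z^{(i)}\big)\big)\Big) \tag{2}
$$

where D is updated by ascending its stochastic gradient (1) and G by descending its stochastic gradient (2); m is the minibatch size, the x^{(i)} are samples from the training data, and the z^{(i)} are noise samples drawn from the prior p_z(z).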
LSTM networks address the problem of long-term dependencies associated with traditional recurrent neural networks. The LSTM network tries to learn the probability distribution of the training corpus. Each word in the corpus is treated as a category, and the categorical cross-entropy loss to be minimized is given by:
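$$
L = -\sum_{i=1}^{n} y_i \log(p_i) \tag{3}
$$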
where n is the number of categories, y_i is a binary value indicating the presence or absence of category i, and p_i is the predicted probability of category i.
Our model combines the generation capacity of GANs with the ability of LSTM RNNs to maintain context. We pre-process the data from our corpus and assign POS tags to sentences of fixed length. These tag sequences are used as training data for the GAN. Once trained, the generator produces a sequence of tags t' = [t'_0, t'_1, ..., t'_{N-1}] of specified length N. Next, we randomly select from the corpus a seed word pattern w' = [w'_0, w'_1, ..., w'_{l-1}] corresponding to the first l tags.
In our model, the word to be predicted depends on a seed word pattern of length l and the part-of-speech tag of the next word. Hence, given a seed word pattern [w_0, w_1, w_2, ..., w_{l-1}], where the w_i are words from the corpus, and the corresponding part-of-speech tags [t_0, t_1, t_2, ..., t_{l-1}, ..., t_m] with m > l, we predict [w_l, ..., w_m]. The prediction of the j-th word is made in the following manner:
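$$
w_j = \operatorname*{arg\,max}_{w \in V}\; P\big(w \mid w_{j-l}, \ldots, w_{j-1},\, t_j\big), \qquad j = l, \ldots, m \tag{4}
$$

where V denotes the vocabulary of the corpus.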
Note that l is the moving window size; the window refers to the number of previous words on which the next word is conditioned. The subsequent words are predicted by the trained LSTM network, which predicts the word category based on the previous words and the tag at the corresponding position in the generated tag sequence t', according to Equation (4).
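As a minimal sketch of this combined generation loop, assuming a trained tag generator and a trained POS-conditioned word model wrapped in the hypothetical callables generate_tags and predict_word_probs:

```python
import numpy as np

def generate_sentence(generate_tags, predict_word_probs, corpus_windows, vocab, N, l):
    # 1. Generate a tag sequence t' of length N with the trained generator.
    tags = generate_tags(N)

    # 2. Randomly pick a seed word pattern from the corpus whose tags match the first l tags.
    candidates = [words for words, tag_window in corpus_windows
                  if list(tag_window) == list(tags[:l])]
    words = list(candidates[np.random.randint(len(candidates))])

    # 3. Predict the remaining words, conditioning each prediction on the previous
    #    l words (the moving window) and the tag at the same position, as in Equation (4).
    for j in range(l, N):
        probs = predict_word_probs(words[j - l:j], tags[j])
        words.append(vocab[int(np.argmax(probs))])
    return words
```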
At the pre-processing stage, the corpus is first split into a set of sentences, which are then broken down into lists of words. Based on these lists of words, we prepare a list of part-of-speech tags corresponding to the words using the default NLTK tagger, which has an accuracy of 89.56%. We maintain a corresponding integer mapping for each of the 32 tags.
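A minimal sketch of this pre-processing step using NLTK; the integer mapping scheme shown here, with 0 reserved for padding, is an assumption consistent with the zero-padding described below:

```python
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data packages

def preprocess(corpus_text):
    # Split the corpus into sentences and each sentence into a list of words.
    sentences = [nltk.word_tokenize(s) for s in nltk.sent_tokenize(corpus_text)]

    # POS-tag every sentence with the default NLTK tagger.
    tagged = [nltk.pos_tag(words) for words in sentences]

    # Integer mapping for the tags (32 tags in our corpus); 0 is reserved for padding.
    tag_set = sorted({tag for sent in tagged for _, tag in sent})
    tag_to_int = {tag: i + 1 for i, tag in enumerate(tag_set)}

    tag_sequences = [[tag_to_int[tag] for _, tag in sent] for sent in tagged]
    return sentences, tagged, tag_sequences, tag_to_int
```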
Three experiments were performed with the GAN, varying the length of the generated tag sequence:
We select the tag lists of sentences whose number of words is at most the sequence length of the tags to be generated. Shorter sentences are padded with zeros so that the training data dimensions are consistent.
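A sketch of this filtering and zero-padding step, assuming the integer-coded tag sequences produced at the pre-processing stage:

```python
def make_gan_training_data(tag_sequences, sequence_length):
    # Keep only sentences whose tag list fits within the chosen sequence length ...
    kept = [seq for seq in tag_sequences if len(seq) <= sequence_length]
    # ... and right-pad the shorter ones with zeros so every row has the same width.
    return [seq + [0] * (sequence_length - len(seq)) for seq in kept]
```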
We use a binary match evaluation metric to assess the generated tags. Before evaluation, each generated tag sequence is assigned an error value equal to the length of the sequence. The error value is decremented by one for every match in the corpus, and the accuracy is calculated by subtracting the average error from the sequence length. The training time for all three models was set to 50 hours. A few results from the three experiments and the accuracy of the models are given in Table 1 below.
Table 1: POS sequences, Generator samples
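One possible reading of the binary match metric described above, sketched under the assumption that a match means the generated tag at a given position also occurs at that position in some corpus tag sequence:

```python
def binary_match_accuracy(generated_seqs, corpus_seqs, sequence_length):
    errors = []
    for gen in generated_seqs:
        error = sequence_length                       # initial error value = sequence length
        for pos, tag in enumerate(gen):
            # Assumption: a positional match against any corpus sequence counts once.
            if any(pos < len(ref) and ref[pos] == tag for ref in corpus_seqs):
                error -= 1                            # one match -> error decremented by one
        errors.append(error)
    avg_error = sum(errors) / len(errors)
    return sequence_length - avg_error                # accuracy = sequence length - average error
```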
The last word of the sequence is replaced with the tag of the actual next word. Appropriate integer encodings are created for both the tags and the words. The models are fitted with the categorical loss. We gradually decrease the length of the seed pattern and increase the length of the prediction pattern. Some results are shown in Table 2 below:
Table 2: POS-conditioned tokens generated by the LSTM
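A sketch of how these training samples might be assembled, under the assumption that the replaced word is also the prediction target; word_to_int and tag_to_int are the integer mappings mentioned above, and l is the seed length:

```python
def make_lstm_training_samples(sentences, tagged_sentences, word_to_int, tag_to_int, l):
    X, y = [], []
    for words, tagged in zip(sentences, tagged_sentences):
        for j in range(l, len(words)):
            window = [word_to_int[w] for w in words[j - l:j + 1]]  # l seed words + the actual next word
            window[-1] = tag_to_int[tagged[j][1]]                  # last word replaced by its POS tag
            X.append(window)
            y.append(word_to_int[words[j]])                        # the replaced word is the target category
    return X, y
```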
Note: The training time for the LSTM network on an Nvidia Quadro P5000 machine was 141 hours (6 days) for the Trump dataset and 319 hours (14 days) for the Wikipedia dataset. Training the model on the Trump dataset takes 24 days on the same machine without the Nvidia GPU, which demonstrates the speed-up offered by the Nvidia Quadro GPU.
The results of the combined system of GAN and LSTM are shown in Table 3 below:
Table 3: The table shows the tag sequence generated by the generator. A seed word corresponding to the first tag in the sequence is chosen from the corpus, and the predicted words are generated based on the previously generated words and the corresponding tag in the tag sequence.