The objective of the analysis was to classify tweets by emotion into the following five classes:
['Neutral','Happy','Sad','Love','Anger']
The data set comprised 55,775 tweets with 13 labels:
"empty" # neutral
"sadness" # sad
"enthusiasm" # happy
"neutral" # neutral
"worry" # sad
"surprise" # happy
"love" # love
"fun" # happy
"hate" # anger
"happiness" # happy
"boredom" # neutral
"relief" # happy
"anger" # anger
These 13 classes were merged to form 5 classes according to our objective.
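The merge can be written as a plain lookup table. The sketch below copies the label-to-class pairs from the list above; the names LABEL_TO_CLASS, CLASSES, merge_label and encode_label are illustrative and not taken from the original code.

```python
# Mapping from the 13 original labels to the 5 target classes,
# copied from the label list in this document.
LABEL_TO_CLASS = {
    "empty": "Neutral",
    "sadness": "Sad",
    "enthusiasm": "Happy",
    "neutral": "Neutral",
    "worry": "Sad",
    "surprise": "Happy",
    "love": "Love",
    "fun": "Happy",
    "hate": "Anger",
    "happiness": "Happy",
    "boredom": "Neutral",
    "relief": "Happy",
    "anger": "Anger",
}

# Target classes in the order given at the top of this document.
CLASSES = ["Neutral", "Happy", "Sad", "Love", "Anger"]


def merge_label(label):
    """Return the merged class for one of the 13 original labels."""
    return LABEL_TO_CLASS[label.strip().lower()]


def encode_label(label):
    """Return the integer index of the merged class within CLASSES."""
    return CLASSES.index(merge_label(label))
```

An unknown label raises a KeyError here, which makes it easy to spot labels missing from the table instead of silently assigning them to a class.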
The data was then split into 44,620 training tweets and 11,155 validation tweets.
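These counts correspond to an 80/20 split of the 55,775 tweets. A minimal sketch of such a split follows; the shuffling, the fixed seed, and the function name train_validation_split are assumptions, since the text does not say how the tweets were assigned to each part.

```python
import random


def train_validation_split(items, validation_fraction=0.2, seed=0):
    """Shuffle the items and hold out a fraction of them for validation."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


# Tweet indices stand in for the tweets themselves in this example.
training, validation = train_validation_split(range(55775))
```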
We tested the following models:
Model 1: Multinomial Naive Bayes Classifier - Accuracy 38.37%
Model 2: Linear SVM - Accuracy 38.49%
Model 3: Logistic Regression - Accuracy 40.13%
Model 4: Bidirectional LSTMs - Accuracy 62.83%
Based on these results, we chose the bidirectional LSTM for our objective.
Embeddings are numerical vector representations of words, built so that relationships between words are reflected in their vectors.
We used the GloVe Twitter 200-dimensional embeddings (1.2 GB) with 50k words.
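GloVe vectors are distributed as a text file with one word per line, followed by its space-separated vector values. The sketch below shows one way to read such a file and build an embedding matrix for a tokenizer vocabulary. The function names, the vocabulary and word_index arguments, and the choice to leave row 0 for padding are assumptions, not details from the original code.

```python
def load_glove(lines, vocabulary, dim=200):
    """Parse GloVe text lines and keep vectors for words in the vocabulary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        # Skip words we do not need and lines with an unexpected length.
        if word not in vocabulary or len(values) != dim:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


def embedding_matrix(word_index, vectors, dim=200):
    """Build a matrix whose row i holds the vector of the word with index i.

    Row 0 is left as zeros (assumed to be the padding index), and words
    without a pretrained vector also keep a zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix
```

In practice lines would be an open file handle on the downloaded embedding file, and the resulting matrix would initialise the embedding layer of the network.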
Bidirectional LSTMs capture context better, which results in better accuracy. This method uses two LSTMs: one processes the input in its normal order, while the other processes the reversed input.
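The two-direction wiring can be illustrated without a deep learning library. In the sketch below a toy tanh recurrent cell stands in for the LSTM, so this is not the model that was trained; the function names and weights are hypothetical. It only shows how the forward and reversed passes are aligned so that each position gets context from both sides.

```python
import math


def run_recurrent(sequence, weight_in, weight_state):
    """Toy recurrent cell (a stand-in for an LSTM): one state per step."""
    state, states = 0.0, []
    for x in sequence:
        state = math.tanh(weight_in * x + weight_state * state)
        states.append(state)
    return states


def bidirectional(sequence, forward_params, backward_params):
    """Pair a forward pass with a pass over the reversed sequence."""
    forward = run_recurrent(sequence, *forward_params)
    # Run on the reversed input, then flip the states back so that
    # backward[t] lines up with forward[t].
    backward = run_recurrent(sequence[::-1], *backward_params)[::-1]
    return list(zip(forward, backward))


# A signal at the last position: only the backward pass "sees" it
# when producing the output for the first position.
outputs = bidirectional([0.0, 0.0, 1.0], (1.0, 0.5), (1.0, 0.5))
```

At the first position the forward state is still zero, while the backward state already carries information from the final input. This is the extra context a bidirectional model has at every step.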
The model was trained for 10 epochs.