ELECTION TWEETS — Sentiment Analysis using Machine Learning Algorithms

An election is a way people can choose their candidate or their preferences in a representative democracy or other forms of government.

Mr Donald Trump & Ms Hillary Clinton

At election time, almost every news channel tries to predict the results: who will be the next president, and so on. Sometimes the predictions turn out right, and sometimes they do not.

But have you ever wondered how these predictions are made?

I am sure you have thought of it, so let me help you with it.

These predictions are based on sentiments. People love to share their opinions and thoughts on social media, and to write good things about the person they respect or follow. So, in order to predict the result, one needs to analyse these sentiments. It is not practical to read and understand all of these articles, comments, and posts by hand, especially as they can be in different languages.

Analysing these sentiments is a somewhat lengthy process, but not an impossible one.

I tried this analysis on a dataset (Election tweets 2016) taken from the Orange library.

Orange?

Orange is a visual programming software package for machine learning, data mining, and data analysis. Orange tools (called widgets) range from simple data visualization and pre-processing to empirical evaluation of learning algorithms and predictive modelling. Visual programming is implemented through an interface in which workflows are designed by linking predefined or user-designed widgets.

At the same time, proficient users can use Orange as a Python library to manipulate data and alter widgets.

First, the text file was loaded into the workflow using the Corpus tool.


Corpus Tool

Corpus loads text documents, (optionally) tagged with categories, or changes the data input signal to a corpus.

Inputs

• Data: Input data (optional)

Outputs

• Corpus: A collection of documents

Then, the loaded data was inspected with Corpus Viewer.


Corpus Viewer Tool

Corpus Viewer displays the contents of the corpus.

Inputs

• Corpus: A collection of documents.

Outputs

• Corpus: Documents containing the queried word.

After that, text mining was done on the 6,444 tweets using the Preprocess Text tool.


Pre-Process Tool

It splits the text into smaller units (tokens), filters them, runs normalization (stemming, lemmatization), creates n-grams, and tags tokens with part-of-speech labels. Steps in the analysis are applied sequentially and can be turned on or off.

Inputs

• Corpus: A collection of documents.

Outputs

• Corpus: Pre-processed corpus.
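The sequence of steps described above can be sketched in plain Python. This is only a minimal illustration and not how the Orange tool is implemented: the regular expression, the tiny stop-word set, and the function names `tokenize`, `filter_tokens`, `ngrams` and `preprocess` are all assumptions made for this example, and stemming, lemmatization and part-of-speech tagging are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "we"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filter_tokens(tokens):
    """Drop stop words, keeping the original token order."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n=2):
    """Return consecutive n-token sequences joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Apply the steps sequentially: tokenize, filter, then build bigrams."""
    tokens = filter_tokens(tokenize(text))
    return tokens, ngrams(tokens, 2)


# Made-up example sentence, not taken from the dataset.
tokens, bigrams = preprocess("We are going to make America great again")
print(tokens)
print(bigrams)
```

Each step is a separate function, so a step can be switched off simply by not calling it, which mirrors the on/off behaviour described above.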

After that, Topic Modelling was used.


Topic Modelling Tool

Topic Modelling discovers abstract topics in a corpus based on clusters of words found in each document and their respective frequency. A document typically contains multiple topics in different proportions; thus, the widget also reports on the topic weight per document.

Inputs

• Corpus: A collection of documents.

Outputs

• Corpus: Corpus with topic weights appended.

• Topics: Selected topics with word weights.

• All Topics: Token weights per topic.

After that, all the data was sent to the Word Cloud.


Word Cloud Tool

Word Cloud displays the tokens in the corpus, with their size denoting the frequency of the word in the corpus or an average bag-of-words count. Words are listed by their frequency (weight) in the widget. It also helps in understanding what kinds of words people used.

Inputs

• Topic: Selected topic.

• Corpus: A collection of documents.

Outputs

• Corpus: Documents that match the selection.

• Selected Word: The selected word that can be used as a query in Concordance.

• Word Counts: Words and their weights.
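The word weights behind a word cloud are essentially frequency counts. A minimal sketch of that idea, assuming simple whitespace tokenization and a few made-up documents (the function name `word_counts` is hypothetical and is not part of Orange):

```python
from collections import Counter


def word_counts(documents):
    """Count how often each whitespace-separated token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


# Made-up example documents, not taken from the dataset.
docs = ["great rally tonight", "great crowd", "thank you ohio"]
counts = word_counts(docs)

# List the words by weight, most frequent first.
for word, weight in counts.most_common(3):
    print(word, weight)
```

In a word cloud, the count of each word would then determine the size at which it is drawn.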

The next step is the most important, as it helps to understand the sentiments of people.


Sentiment Analysis

For this, the Sentiment Analysis tool was used.

Sentiment Analysis predicts sentiment for each document in a corpus. It uses Liu Hu and Vader sentiment modules. Both of them are lexicon-based. For Liu Hu, you can choose the English or Slovenian version.

Inputs

• Corpus: A collection of documents.

Outputs

• Corpus: A corpus with information on the sentiment of each document.
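To make "lexicon-based" concrete, here is a minimal sketch of how such a scorer can work. It is not the Liu Hu or Vader implementation: the two mini-lexicons are made up, and the scoring formula (positive minus negative words, as a percentage of all tokens) and the function names are assumptions chosen for illustration.

```python
# Hypothetical mini-lexicons; real lexicons contain thousands of entries.
POSITIVE = {"great", "good", "win", "love", "strong"}
NEGATIVE = {"bad", "sad", "lose", "hate", "weak"}


def lexicon_sentiment(text):
    """Return (positive words - negative words) as a percentage of all tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 100.0 * (pos - neg) / len(tokens)


def label(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up example tweets, not taken from the dataset.
for tweet in ["we will win big", "a bad and weak deal", "see you in ohio"]:
    score = lexicon_sentiment(tweet)
    print(f"{tweet!r}: {score:.1f} ({label(score)})")
```

Because the score depends only on word lookups, no training data is needed, which is the main appeal of lexicon-based methods.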

Then Corpus Viewer was used to verify the percentage of negative and positive words in each tweet.


Corpus Viewer

After that, the data was sent from the Sentiment Analysis tool to the Select Columns tool.

Selecting columns from Sentiment Analysis

Selecting columns allows a deeper analysis of positive, negative, and neutral words.

To speed up the analysis, a Data Sampler was used, as it reduces the amount of data passed to the following steps. The sampler selected 10% of the total data (Total = 6,444, Selected = 645).
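The sampling step can be sketched with the standard library. This is an illustration only: the function name `sample_fixed_proportion`, the fixed seed, and the choice to round the sample size up are assumptions, the last one made so that the result agrees with the figures quoted above.

```python
import math
import random


def sample_fixed_proportion(items, percent, seed=0):
    """Randomly pick `percent`% of the items without replacement.

    Rounding the sample size up is an assumption made so that the
    numbers match the article (10% of 6444 -> 645).
    """
    size = math.ceil(len(items) * percent / 100)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(items, size)


tweet_ids = list(range(6444))  # stand-ins for the 6,444 tweets
sample = sample_fixed_proportion(tweet_ids, 10)
print(len(sample))
```

Fixing the seed means the same 645 tweets are selected on every run, which makes later steps easier to compare.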


Data Sampler Tool

A Heat Map was then used to present the data.

Heat Map Tool

In the Heat Map, the Merge by k-means option was used to merge tweets with the same polarity into one line. Then Cluster by rows was used to create a clustered visualization in which similar tweets are grouped.

The data was then verified using Corpus Viewer.
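The idea behind merging by k-means can be illustrated with a small one-dimensional version of the algorithm. This is a sketch under stated assumptions, not the Heat Map's own implementation: the sentiment scores are made up, the number of clusters is fixed at three, and the function name `kmeans_1d` and its deterministic initialisation are choices made for this example.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups with Lloyd's algorithm; return (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the starting centroids over the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Made-up sentiment scores standing in for tweets.
scores = [-40.0, -35.0, -30.0, 0.0, 2.0, 25.0, 30.0, 33.0]
centroids, labels = kmeans_1d(scores, k=3)
print(centroids)
print(labels)
```

Tweets that share a label would be merged into a single row, with the centroid acting as the merged row's value.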


Corpus Viewer Tool

The Tweet Profiler was then used to retrieve information on sentiment from the server for each tweet (or document). The widget sends data to the server, where a model computes emotion probabilities and/or scores. The widget supports three classifications of emotion, namely Ekman’s, Plutchik’s, and Profile of Mood States (POMS).

Inputs

• Corpus: A collection of tweets (or other documents).

Outputs

• Corpus: A corpus with information on the sentiment of each document


Tweet Profiler Tool

Lastly, two Distributions visualisations were used to examine the Author (Politician) and the Emotions in the tweets.
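What such a distribution view plots is, in essence, a count per category. A minimal sketch of those counts, using made-up (author, emotion) pairs with placeholder author names rather than values from the dataset:

```python
from collections import Counter

# Made-up (author, emotion) pairs; the real values come from the workflow.
rows = [
    ("Author A", "Joy"),
    ("Author A", "Fear"),
    ("Author B", "Joy"),
    ("Author B", "Joy"),
    ("Author B", "Surprise"),
]

by_author = Counter(author for author, _ in rows)     # tweets per author
by_emotion = Counter(emotion for _, emotion in rows)  # tweets per emotion
by_pair = Counter(rows)                               # emotion split by author

print(by_author)
print(by_emotion)
print(by_pair[("Author B", "Joy")])
```

Counting the pairs as well as the single columns is what allows the emotions to be compared between authors.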

Using Data Visualization Tool

CONCLUSION

Finally, the emotions in the tweets can be observed.

As per the Author Data Visualization, it can be observed that people have tweeted more about Ms Hillary Clinton.


Author Distribution View

Next, from the Emotion Data Visualisation:

Emotion Data Visualisation Tool

It can be observed that people have tweeted more dreadful comments about Ms Hillary Clinton.

On that basis, Donald Trump appears to have a better chance of winning the 2016 election (according to the tweets).

Contacts

In case you have any questions or any suggestions on what my next article should be about, please leave a comment below or mail me at aryanbajaj104@gmail.com.

If you want to keep updated with my latest articles and projects, keep visiting the website ^_^.
