Sentiment Analysis of Tweets

June 10, 2020.

Two days back I got curious about the Twitter API. I had worked with a few APIs (using R) in the past but had never had the chance to use Twitter data. Twitter is also a rich source of "what people are talking about". I searched and found a very easy-to-use R package called rtweet. Its simplicity blew my mind.

I dug into it and found several useful functions like search_tweets(), stream_tweets() and get_timeline(). Of course, there are many more functions; have a look at the package's reference list.

But why stop there? The tidytext package makes unigram (single-word) sentiment analysis very easy. I decided to look for the "positive" and "negative" words used on Twitter.
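As a minimal sketch of what unigram sentiment analysis does, the example below splits a few made-up tweets into single words and looks each word up in a lexicon. The tweets and the four-word `toy_lexicon` are assumptions for illustration; the real analysis uses the "bing" lexicon instead.

```r
library(dplyr)
library(tidytext)

# Made-up example tweets (not real data)
tweets <- tibble(
  id   = 1:3,
  text = c("What a cruel and sad incident",
           "Happy to see people care",
           "Sad news, but good response")
)

# Hypothetical mini-lexicon standing in for get_sentiments("bing")
toy_lexicon <- tibble(
  word      = c("cruel", "sad", "happy", "good"),
  sentiment = c("negative", "negative", "positive", "positive")
)

word_sentiment <- tweets %>%
  unnest_tokens(word, text) %>%             # one lowercase word per row
  inner_join(toy_lexicon, by = "word") %>%  # keep only words the lexicon knows
  count(sentiment, word, sort = TRUE)       # frequency of each matched word

print(word_sentiment)
```

Each word is scored on its own, so context is lost: "not good" would still count "good" as positive.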

To start with, I tracked Kerala's elephant murder: an incident in Kerala where an elephant died, allegedly after firecrackers exploded in its mouth. The incident grabbed national and international attention, bringing organisations like PETA to the forefront.

I first searched for the last 10,000 tweets on the topic, did some cleaning and finally analysed them for sentiment.
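The cleaning is mostly about stripping links from the tweet text. One pitfall worth showing in a small base R sketch: a greedy pattern such as `http.*` deletes the link and everything after it, while a pattern that stops at whitespace removes only the URL. The tweet string here is made up for illustration.

```r
# Made-up tweet text with a link in the middle (not real data)
tweet <- "Justice for the elephant https://t.co/abc123 please share"

# Greedy pattern: removes the link AND everything after it
greedy <- gsub("http.*", "", tweet)

# Narrower pattern: removes only the URL, up to the next whitespace
narrow <- gsub("https?://\\S+", "", tweet)

# Collapse the double space left behind and trim the ends
narrow <- trimws(gsub("\\s+", " ", narrow))

print(greedy)
print(narrow)
```

With the greedy pattern the words "please share" are lost, which matters whenever a link is not the last thing in a tweet.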

The R code used to query Twitter's API and generate the plots is presented before each plot.

R Implementation

library(rtweet)   # Twitter API client
library(ggplot2)  # plotting
library(dplyr)    # pipe operator and tibble handling
library(tidytext) # text mining
library(textdata) # lexicon downloader (needed for lexicons such as AFINN)

# search recent English tweets containing both words (spaces act as AND)
rt = search_tweets("Kerala Elephant lang:en", n = 10000, include_rts = FALSE)

# strip links; match each URL only up to the next whitespace so the text after it is kept
rt$updated_text = gsub("https?://\\S+", "", rt$text)

# split into one word per row; unnest_tokens() also lowercases and strips punctuation
rt2 <- rt %>%
  dplyr::select(updated_text) %>%
  unnest_tokens(word, updated_text)

# remove stop words
data("stop_words")
# nrow(rt2)
rt2 = anti_join(rt2, stop_words, by = "word")
# nrow(rt2)

# now, I'll attach each word to its sentiment using the "bing" lexicon
rt3 = rt2 %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()

# plot the 20 most frequent negative and positive words
rt3 %>%
  group_by(sentiment) %>%
  top_n(20, n) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(title = "Tweets containing Kerala and Elephant", y = NULL, x = NULL) +
  coord_flip() +
  theme_classic()

So basically, hardly anyone asked why the person did what they did, or whether it was even deliberate. Everyone just made a smiley face. A sad smiley face.

Why stop there?

I thought of analysing the tweets by two world leaders: Narendra Modi (our PM) and Donald Trump (US President).

R Implementation

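# note: Twitter's standard API returns at most about 3,200 recent tweets per timeline
rt_trump = get_timeline("realDonaldTrump", n = 10000, include_rts = FALSE)
rt_modi = get_timeline("narendramodi", n = 10000, include_rts = FALSE)

# the remaining code is the same as before, applied to each timeline in turn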
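Rather than copying the earlier steps once per account, they could be wrapped in a small function. The sketch below is one way to do that, assuming each timeline is a data frame with a `text` column; `count_sentiment_words()` is a hypothetical helper, and the two tiny timelines are made-up stand-ins for the get_timeline() results.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: takes a data frame with a `text` column and
# returns word counts by sentiment, labelled with the account name.
count_sentiment_words <- function(tweets, who) {
  tweets %>%
    select(text) %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = "word") %>%
    inner_join(get_sentiments("bing"), by = "word") %>%
    count(word, sentiment, sort = TRUE) %>%
    mutate(leader = who)
}

# Made-up stand-ins for the get_timeline() results (not real tweets)
timeline_a <- tibble(text = c("Proud of our brave doctors",
                              "Happy to help the poor"))
timeline_b <- tibble(text = c("Fake news again",
                              "Another crisis and more fake stories"))

both <- bind_rows(
  count_sentiment_words(timeline_a, "Leader A"),
  count_sentiment_words(timeline_b, "Leader B")
)

print(both)
```

Keeping a `leader` column in one combined table makes it straightforward to facet the plot or to compare the two accounts side by side.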

Clearly, Modi uses many more "positive" words than Trump. Many of Modi's "negative" words (poor, needy, etc.) are also probably used in positive and hopeful sentences, which a word-by-word analysis cannot see. Trump's list features his characteristic "fake" (as in "fake news"). Both use words associated with the pandemic: virus, crisis, panic, attack, etc.

Here's a proportion comparison of the overall sentiment.
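A proportion like this can be computed from a table of sentiment counts by dividing each count by its account's total. The sketch below assumes a table with `leader`, `sentiment` and `n` columns; the counts are placeholders for illustration only, not the results of this analysis.

```r
library(dplyr)

# Placeholder counts for illustration only (not the post's results)
counts <- tibble(
  leader    = c("Leader A", "Leader A", "Leader B", "Leader B"),
  sentiment = c("positive", "negative", "positive", "negative"),
  n         = c(8, 2, 3, 7)
)

proportions <- counts %>%
  group_by(leader) %>%
  mutate(share = n / sum(n)) %>%  # share of each sentiment within a leader
  ungroup()

print(proportions)
```

One way to plot the result would be a stacked bar per leader, for example with `geom_col(position = "fill")` in ggplot2.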

There's a marked difference between how the two leaders, Narendra Modi and Donald Trump, tweet: around 75% positive for Modi against around 40% for Trump. Of course, I could go on comparing more, but I exceeded Twitter's rate limit and have to wait another 15 minutes. Plus, my aim of getting a basic understanding of how to use the API and unigram sentiment analysis was achieved.

Have a great day!