Basic Sentiment Analysis in R
One option for performing sentiment analysis in R is to follow what I call Breen's approach,
named after Jeffrey Breen's seminal, illuminating slides on Twitter sentiment analysis with R.
The general idea is to calculate a sentiment score for each tweet so we can know
how positive or negative the posted message is.
There are different ways to calculate such scores,
and you can even create your own formula.
We'll use a very simple yet useful approach to define our score formula:
Score = Number of positive words - Number of negative words
If Score > 0, this means that the sentence has an overall 'positive opinion'
If Score < 0, this means that the sentence has an overall 'negative opinion'
If Score = 0, then the sentence is considered to be a 'neutral opinion'
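As a minimal sketch of this formula in base R, assuming a tiny made-up word list: the names `pos_words`, `neg_words`, `simple_score` and `label_score` are hypothetical and exist only for this illustration, and a real analysis would use a full opinion lexicon instead of three words per list.

```r
# Toy word lists, invented for this illustration only
pos_words <- c("good", "great", "love")
neg_words <- c("bad", "awful", "hate")

# Score = number of positive words - number of negative words
simple_score <- function(sentence, pos, neg) {
  # lower-case the text and split it on anything that is not a letter
  words <- unlist(strsplit(tolower(sentence), "[^a-z]+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Translate a numeric score into one of the three opinion labels
label_score <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

s <- simple_score("I love this great wine", pos_words, neg_words)
label_score(s)
```

Here the sentence contains two words from the positive list and none from the negative list, so it is labelled as a positive opinion.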
In order to count the number of positive and negative words, we need a very important ingredient:
an opinion lexicon in English, which fortunately is provided by Hu and Liu and can be accessed from: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews."
Proceedings of the ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle,
Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing
and Comparing Opinions on the Web." Proceedings of the 14th
International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
You can download the text files containing the positive and negative words:
Another important ingredient, shared by Jeff Breen, is a very handy function for calculating sentiment scores: check Breen's GitHub repo on sentiment analysis for more details.
Example: Mood and Drinking
Let me show you a simple example of some of the things we can do with sentiment analysis.
Research Question: What's the mood associated with tweets containing some kind of drink? More specifically, what's the mood associated with drinks such as wine, beer, coffee and soda?
Step 1: Load necessary packages
Step 2: Define function score.sentiment
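The function itself lives in Breen's repo. What follows is not his code but a base-R sketch of what such a function might look like, assuming it takes a character vector of sentences plus two word vectors and returns one score per sentence. The cleaning steps and the argument names `pos.words` and `neg.words` are assumptions made for illustration.

```r
# Base-R sketch of a scoring function (an approximation, not Breen's original)
score.sentiment <- function(sentences, pos.words, neg.words) {
  scores <- vapply(sentences, function(sentence) {
    # remove punctuation, control characters and digits, then lower-case
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("[[:digit:]]+", "", sentence)
    sentence <- tolower(sentence)
    # split the cleaned text into words on whitespace
    words <- unlist(strsplit(sentence, "\\s+"))
    # positive matches minus negative matches
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(text = sentences, score = scores, stringsAsFactors = FALSE)
}

# Usage with toy word lists (invented for this example)
pos <- c("good", "great", "love")
neg <- c("bad", "awful", "hate")
result <- score.sentiment(c("Great coffee, love it!", "This soda is awful"), pos, neg)
result$score
```

Returning a data frame keeps each score next to its text, which makes it easy to add further columns later, such as the drink each tweet mentions.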
Step 3: We need to import the files containing the positive and negative words
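One possible way to read the word lists is with `scan()`, assuming each lexicon file holds one word per line preceded by comment lines that start with `;`. So that it runs on its own, the sketch below first writes two small stand-in files; in practice `pos_file` and `neg_file` (hypothetical names) would point to the downloaded text files.

```r
# Write two small stand-in files that mimic the assumed lexicon layout:
# comment lines starting with ";" followed by one word per line
pos_file <- tempfile(fileext = ".txt")
neg_file <- tempfile(fileext = ".txt")
writeLines(c("; toy positive list", "good", "great"), pos_file)
writeLines(c("; toy negative list", "bad", "awful"), neg_file)

# Read each file into a character vector, skipping the ";" comment lines
pos <- scan(pos_file, what = "character", comment.char = ";", quiet = TRUE)
neg <- scan(neg_file, what = "character", comment.char = ";", quiet = TRUE)

pos
neg
```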
Step 4: Let's harvest tweets talking about wine, beer, coffee, and soda
Step 5: Apply score.sentiment and calculate more results
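To give an idea of the kind of extra results one might compute once every tweet has a score, here is a sketch on made-up numbers. The data frame, the column names and the cutoffs of 2 and -2 for "very positive" and "very negative" are all assumptions for illustration, not the values obtained in this analysis.

```r
# Made-up scores for illustration only; these are not the post's results
scores <- data.frame(
  drink = c("wine", "wine", "beer", "beer", "coffee", "coffee", "soda", "soda"),
  score = c(3, 1, 2, -1, 0, 2, -2, -3)
)

# Flag strongly opinionated tweets (the cutoffs of 2 and -2 are an assumption)
scores$very.pos <- as.numeric(scores$score >= 2)
scores$very.neg <- as.numeric(scores$score <= -2)

# Per-drink summaries: mean score and counts of very positive/negative tweets
mean_score   <- tapply(scores$score, scores$drink, mean)
num_very_pos <- tapply(scores$very.pos, scores$drink, sum)
num_very_neg <- tapply(scores$very.neg, scores$drink, sum)

mean_score
```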
Step 6: Get a boxplot
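A boxplot of scores by drink can be drawn with base R's `boxplot()` and a formula. The sketch below uses an invented data frame (hypothetical columns `drink` and `score`) so that it runs on its own; the numbers are placeholders, not the scores from the harvested tweets.

```r
# Invented scores, for illustration only
scores <- data.frame(
  drink = c("wine", "wine", "beer", "beer", "coffee", "coffee", "soda", "soda"),
  score = c(3, 1, 2, -1, 0, 2, -2, -3)
)

# One box per drink; boxplot() also returns its summary statistics invisibly
bp <- boxplot(score ~ drink, data = scores,
              xlab = "drink", ylab = "sentiment score",
              main = "Sentiment scores by drink (toy data)")

# The groups appear in alphabetical order of the drink names
bp$names
```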
Step 7: Make some barplots
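A barplot of a per-drink summary can be sketched with base R's `barplot()`. The named vector `mean_score` below is hypothetical and filled with placeholder values; in a real run it would come from the scored tweets.

```r
# Placeholder per-drink averages, invented for this illustration
mean_score <- c(wine = 2, beer = 0.5, coffee = 1, soda = -2.5)

# barplot() draws one bar per element and returns the bar midpoints
bar_pos <- barplot(mean_score,
                   ylab = "average sentiment score",
                   main = "Average score by drink (toy data)")

# Add a horizontal reference line at the neutral score of 0
abline(h = 0)
```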
As you can tell, wine gets the highest sentiment score, while soda gets the lowest one.
If we examine the very positive scores, we'll see that wine receives the highest values.
Conversely, if we check the very negative scores, soda is the one that has the worst score.
© Gaston Sanchez - 2012