Sentiment analysis in R involves processing textual data to determine its emotional tone (positive, negative, or neutral). Here's a step-by-step guide to performing sentiment analysis in R, including libraries and code examples:
You'll need libraries for text processing, sentiment scoring, and visualization:
# Install required packages
install.packages(c("tidyverse", "tidytext", "textdata", "ggplot2"))
install.packages("wordcloud") # Optional, for visualization
install.packages("syuzhet") # For advanced sentiment analysis
library(tidyverse)
library(tidytext)
library(textdata)
library(ggplot2)
library(wordcloud)
library(syuzhet)
Start with a text dataset, such as customer reviews, social media posts, or any textual content.
Example Text Data:
text_data <- data.frame(
  id = 1:5,
  text = c(
    "I love this product! It's amazing.",
    "The service was terrible and disappointing.",
    "This is the best experience I've ever had.",
    "I am not happy with the quality of the item.",
    "Neutral feelings about this purchase."
  )
)
Tokenization splits text into individual words or tokens for analysis.
# Tokenize text: one row per word, lowercased, punctuation stripped
tokens <- text_data %>%
  unnest_tokens(word, text)
Use sentiment lexicons like Bing, AFINN, or NRC to assign sentiment scores to words.
Loading the Bing Lexicon:
# Get the Bing lexicon (binary positive/negative labels)
bing <- get_sentiments("bing")

# Join tokens with the lexicon and score each text.
# Note: inner_join() drops texts whose words match nothing in the lexicon.
sentiment_analysis <- tokens %>%
  inner_join(bing, by = "word") %>%
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment_score = positive - negative)
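As an alternative to Bing's binary labels, the AFINN lexicon (also available via get_sentiments()) assigns each word an integer value from -5 to +5, so a per-text score is just a sum. A minimal sketch, reusing the tokens object from above:

```r
# AFINN scores words on an integer scale from -5 (negative) to +5 (positive)
afinn <- get_sentiments("afinn")

afinn_scores <- tokens %>%
  inner_join(afinn, by = "word") %>%   # AFINN's score column is named "value"
  group_by(id) %>%
  summarise(sentiment_score = sum(value))
```

Because AFINN is graded rather than binary, "amazing" (+4) contributes more than a mildly positive word, which can separate strongly and weakly positive texts.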
Sentiment Bar Chart:
ggplot(sentiment_analysis, aes(x = factor(id), y = sentiment_score, fill = sentiment_score > 0)) +
  geom_col() +
  labs(title = "Sentiment Analysis by Text ID",
       x = "Text ID",
       y = "Sentiment Score") +
  scale_fill_manual(values = c("FALSE" = "red", "TRUE" = "green"), guide = "none")
Word Cloud:
tokens %>%
  inner_join(bing, by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  # Note: wordcloud() assigns colors by frequency bin, not by sentiment;
  # for a cloud split by sentiment, see wordcloud::comparison.cloud()
  with(wordcloud(word, n, max.words = 100, colors = c("red", "green")))
The syuzhet package provides methods for extracting sentiment and analyzing the emotional arcs of narratives.
Example:
# Get sentiments using Syuzhet
syuzhet_scores <- get_nrc_sentiment(text_data$text)
# Add scores to the original data
text_data <- cbind(text_data, syuzhet_scores)
# Visualize the emotion scores
emotion_data <- colSums(syuzhet_scores[, 1:8]) # First 8 columns are the NRC emotions
barplot(emotion_data,
        main = "Emotion Distribution",
        col = rainbow(8),
        las = 2,
        ylab = "Count")
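The emotional-arc side of syuzhet comes from get_sentiment(), which returns one valence score per element of a character vector, so you can plot how sentiment moves across a narrative. A minimal sketch, reusing text_data from above (with a longer document you would first split it into sentences with get_sentences()):

```r
# Score each text's valence with syuzhet's default method
valence <- get_sentiment(text_data$text, method = "syuzhet")

# Plot the trajectory: one point per text, connected as an arc
plot(valence, type = "l",
     main = "Narrative Sentiment Arc",
     xlab = "Text position", ylab = "Valence")
```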
You can create your own sentiment lexicon or rules if the pre-built ones don't fit your data.
Example:
# Custom lexicon: map domain-specific words to sentiment labels
custom_lexicon <- data.frame(
  word = c("love", "amazing", "terrible", "disappointing"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Join custom lexicon with tokens and score as before
custom_sentiment_analysis <- tokens %>%
  inner_join(custom_lexicon, by = "word") %>%
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment_score = positive - negative)
Save sentiment analysis results for reporting or further use:
write.csv(sentiment_analysis, "sentiment_results.csv", row.names = FALSE)
Handle Negation: Account for words like "not" or "never," which can flip sentiment.
Fine-tune Lexicons: Modify lexicons to better match your dataset.
Incorporate Machine Learning: Use libraries like caret or text2vec to build models for sentiment classification.
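For the negation tip above, one common tidytext approach is to tokenize into bigrams and flip the score of any word preceded by a negator. A minimal sketch using the AFINN lexicon; the negator list is illustrative, and only each word's immediate predecessor is considered, so this is a starting point rather than a full negation handler:

```r
# Words that flip the sentiment of the word that follows them (illustrative list)
negators <- c("not", "never", "no", "without")

negated_scores <- text_data %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  inner_join(get_sentiments("afinn"), by = c("word2" = "word")) %>%
  # Negate the AFINN value when the preceding word is a negator
  mutate(value = ifelse(word1 %in% negators, -value, value)) %>%
  group_by(id) %>%
  summarise(sentiment_score = sum(value))
```

With the example data, "not happy" in text 4 now contributes a negative score instead of the misleading positive one that plain unigram matching on "happy" produces.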