The stylo package for R (along with other forms of stylometry) has had a wide-reaching impact in literary and textual studies. Below are some examples of colleagues engaged in such studies:
Stylometry, the Types of Voice in Writing (5 mins)
Visualizing Literature: Trees, Maps and Networks (the talk is 50 mins, but the first 10 minutes should suffice to get the general idea) - this video is unfortunately now private.
Sentiment analysis, sometimes called opinion mining, attempts to extract affective or subjective information from texts. The kind we will look at here is a fairly simple one: the automated extraction, classification and interpretation of sentiment from texts using some techniques in R. It is one of the ways we might say that we can “read like a computer.” Sentiment can also be derived from image or even biometric data.
We will look at sentiment using a lexicon: a hand-curated list of words that are considered to be negative or positive. The tidytext package that we have used comes pre-packaged with three different lexicons, described briefly here.
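To make the idea of a lexicon concrete, here is a minimal sketch in base R. The six-word `toy_lexicon` and the helper names `tokenize` and `score_sentiment` are invented for illustration and are not part of tidytext; a real lexicon is far larger, and in tidytext one would typically load it with `get_sentiments()`.

```r
# Toy lexicon for illustration only; real lexicons contain thousands of words.
toy_lexicon <- data.frame(
  word = c("good", "happy", "love", "bad", "sad", "terrible"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Lowercase the text and split it into word tokens.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[tokens != ""]
}

# Look each token up in the lexicon, then count positive and negative matches.
score_sentiment <- function(text, lexicon) {
  tokens <- tokenize(text)
  matched <- lexicon$sentiment[match(tokens, lexicon$word)]  # NA if not listed
  positive <- sum(matched == "positive", na.rm = TRUE)
  negative <- sum(matched == "negative", na.rm = TRUE)
  c(positive = positive, negative = negative, net = positive - negative)
}

score_sentiment("I love this happy story, but the ending was sad.", toy_lexicon)
```

Here "love" and "happy" count as positive and "sad" as negative, so the net score is 1. Words missing from the lexicon are simply ignored, which is one reason this approach is considered a simple one: it cannot see negation ("not happy") or irony.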
Want to know if the languages you know have a sentiment lexicon? Check out this dataset at Kaggle: Sentiment Analysis in 81 Languages
Data Lit (sentiment analysis) (starting 1:00) (4 mins)
How to See Sentiment on Twitter (5 mins)
We will look at examples of sentiment analysis of books from Project Gutenberg and an attempt at visualization of that sentiment.
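One common way to visualize sentiment in a book is to cut the text into consecutive chunks and compute a net score for each. The sketch below does this in base R; the `toy_scores` vector, the `sentiment_by_chunk` helper and the sample text are assumptions made up for illustration, not the contents of the course notebook.

```r
# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
toy_scores <- c(good = 1, happy = 1, love = 1, bad = -1, sad = -1, terrible = -1)

# Net sentiment for each consecutive chunk of `chunk_size` words.
sentiment_by_chunk <- function(text, scores, chunk_size = 5) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  word_scores <- unname(scores[tokens])    # NA for words not in the lexicon
  word_scores[is.na(word_scores)] <- 0     # unlisted words count as neutral
  chunk_id <- ceiling(seq_along(tokens) / chunk_size)
  as.vector(tapply(word_scores, chunk_id, sum))
}

sample_text <- paste(
  "It was a good and happy day.",
  "Then the bad news came, sad and terrible.",
  "In the end love won."
)
trajectory <- sentiment_by_chunk(sample_text, toy_scores, chunk_size = 5)
trajectory
```

Calling `plot(trajectory, type = "b", xlab = "Chunk", ylab = "Net sentiment")` then draws the rise and fall of sentiment across the text. For a full novel the chunks would be much larger, for example a fixed number of lines or a chapter.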
Notebook for sentiment analysis (to be linked soon)
Trying the movement interviews, sci fi texts, student writing
If you would like to do a blog #3, you can take quick writing #9, elaborate on it, and write it up as a full blog posting (750-1000 words).
To do this, use the notebook in RStudio Cloud "Stylo with GutenbergR": build a small list of texts (under 10) that "make sense" together as a corpus, and try the various kinds of analysis.
To be continued...