R

Image credit: snippet from Prof. Hall's R code to create the figure in the Episode Length writeup.

Given the precedent for using R to study literature (Jockers 2014) and our in-house expertise, R was the primary programming language for text analysis on this project. In practice, however, we used it mainly for metadata creation, data formatting for the index, and data visualization, rather than for its more typical literary applications: keyword-in-context (KWIC) searches, stylometry, and topic modeling.

Within R, we created visualizations with the ggplot2 package, which builds charts in layers from a data frame and supports many chart types. We relied on its stacked bar chart to show the relative density of ethnicity and gender groups throughout the poem. We also used Plotly Chart Studio to make interactive, HTML-embeddable versions of the graphs. The interactivity lets users inspect the specific values for each group by canto.
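As a rough illustration of this kind of chart, the sketch below builds a proportional stacked bar chart by canto. It is not the project's actual code: the `mentions` data frame, its column names (`canto`, `group`, `n`), and all of its values are invented for the example. The interactive step uses `ggplotly()` from the plotly R package as a local stand-in, which is not necessarily the Chart Studio workflow described above.

```r
library(ggplot2)

# Hypothetical data: counts of character mentions per canto and group.
# These names and numbers are placeholders, not project data.
mentions <- data.frame(
  canto = factor(rep(1:3, each = 3)),
  group = rep(c("Group A", "Group B", "Group C"), times = 3),
  n     = c(12, 5, 3, 8, 8, 4, 2, 10, 8)
)

# position = "fill" rescales each bar to a total of 1, so the segments
# show each group's relative share within a canto rather than raw counts.
p <- ggplot(mentions, aes(x = canto, y = n, fill = group)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Canto", y = "Share of mentions", fill = "Group")

# Optional interactive version, built only if the plotly package is installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(p)
  # To write a standalone HTML file (assumes htmlwidgets is installed):
  # htmlwidgets::saveWidget(interactive_plot, "identities_by_canto.html")
}
```

Printing `p` draws the static chart. Hovering over a segment of `interactive_plot` shows the underlying values for that group and canto.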

For a more specific example with explanation, see Character Identities.