In this project, we use text analysis techniques, such as rolling stylometry and sentiment analysis, to see whether we can identify the segments of Rooftop Rhythms (RR) in which Dorian speaks. We began by using the 'stylo' package in R to produce this preliminary visual. This "temporal" visual represents the RR episode from April 30, 2021, split into blocks of 500 words. Each red segment marks a block in which our model predicted that Dorian was speaking, and each green segment marks a block in which it predicted the contrary. The dotted lines tagged "Dorian" mark the actual points in the episode where Dorian begins speaking.
Some of the parameters used, for reference:
slice.size = 500
slice.overlap = 0
training.set.sampling = "normal.sampling"
mfw (Most Frequent Words) = 50
culling = 0
milestone.labels = "Dorian"
classification.method = "svm"

To produce this visual, we listened to four complete RR episodes (FA19 - 20191122, SP20 2020417, FA20 20201023, SP21 20210430) and identified exactly where Dorian speaks. We marked the beginning of each "Dorian segment" with a short tag, the string "xmilestone", which appears in the visual as a dotted line with the word "Dorian" on top. We then split each episode into two "bucket" text files, one named "Dorian" and one named "notDorian". These two files form the training sample for the model that produced the visual above. Finally, we trained the model on this sample and tested what it had learned on the RR episode.
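To make the slicing and tagging concrete, here is a minimal Python sketch of the idea. It is not how 'stylo' works internally and it is not the code we ran (that was done in R). The function name `slice_with_milestones` and the toy text are made up for illustration, and keeping a shorter final slice is our assumption.

```python
MILESTONE = "xmilestone"  # tag placed at the start of each Dorian segment


def slice_with_milestones(text, slice_size=500):
    """Split text into non-overlapping word slices (slice.overlap = 0)
    and report which slice each milestone tag falls in."""
    words = []
    milestone_positions = []
    for token in text.split():
        if token == MILESTONE:
            # The tag points at the next real word; the tag itself is dropped.
            milestone_positions.append(len(words))
        else:
            words.append(token)
    # Assumption: a shorter final slice is kept rather than discarded.
    slices = [words[i:i + slice_size] for i in range(0, len(words), slice_size)]
    milestone_slices = [pos // slice_size for pos in milestone_positions]
    return slices, milestone_slices


# Toy transcript with a slice size of 3 words instead of 500.
toy = "a b c xmilestone d e f g"
slices, marks = slice_with_milestones(toy, slice_size=3)
print(slices)  # three slices: a b c / d e f / g
print(marks)   # the milestone falls in slice index 1
```

With the real parameters, each slice would hold 500 words, and each milestone index would say which block of the visual a dotted "Dorian" line sits in.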
We used around 12,000 Dorian words and around 2,500 non-Dorian words to train the predictive stylo model. Despite the relatively small size of the non-Dorian sample, we see very promising results! The majority of the Dorian dotted lines fall within the red, Dorian-predicted segments of the visual. This suggests that the model, perhaps with a larger training sample, could be of great use for our analysis. As next steps, we will expand our dataset and test how accurate the model becomes. From there, we can dive in and employ other techniques to gauge what makes a Dorian segment so glaringly... Dorian.
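One way we might put a number on "the majority of the dotted lines" is to count how many milestones land in a slice the model labeled as Dorian. The Python sketch below shows that calculation under stated assumptions: `milestone_hit_rate` is a hypothetical helper, and the labels and indices are toy values, not results from our episodes.

```python
def milestone_hit_rate(predicted_labels, milestone_slices, target="Dorian"):
    """Fraction of milestones that fall in a slice predicted as `target`."""
    if not milestone_slices:
        raise ValueError("no milestones to evaluate")
    hits = sum(1 for idx in milestone_slices if predicted_labels[idx] == target)
    return hits / len(milestone_slices)


# Toy values for illustration only, NOT results from the project.
predicted = ["notDorian", "Dorian", "Dorian", "notDorian"]  # one label per slice
marks = [1, 2, 3]  # slice index of each "Dorian" dotted line
rate = milestone_hit_rate(predicted, marks)
print(f"{rate:.2f} of milestones fall in Dorian-predicted slices")
```

Because the milestones only mark where Dorian starts speaking, this measure says nothing about slices wrongly predicted as Dorian, so it would complement a fuller accuracy check rather than replace it.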