Working on this project over the past four weeks has been especially gratifying because my partner, Arshiya, and I were able, after several modifications to our plan, to produce a comprehensive analysis that drew on many of the techniques and concepts covered in class. The idea was born when Arshiya and I had a simultaneous, almost telepathic epiphany while looking at the sentiment analysis plots in class and noticing the consistent structure of the episodes. We knew that, for the most part, Rooftop Rhythms was not rigidly structured, so our "aha!" moment was realizing that Dorian might be the one bringing structure to the show.
Because we wanted to analyze Dorian and his speech patterns, none of the techniques we had studied in depth suited our purposes. So, thanks to Professor David's guidance, we turned to rolling stylometry as our principal technique and planned to incorporate sentiment analysis, our idea's chief inspiration. However, Professor David drew our attention to something we had carelessly overlooked: Dorian is not only an MC but also a poet. Moreover, he had courteously distributed his poetry coloring book to our class. So we decided to scan pages of his book and extract the text using OCR, which gave us a rich mixture of raw material to work with.
At this point, we had become so invested in Dorian's speech that it felt fitting to focus primarily on the stylistic elements of his conversational and poetic voice rather than broaden the analysis to address the structure of the episodes. Because preparing the data for rolling stylometry involved active listening and reading to separate Dorian's speech from the rest of the transcriptions, we engaged with several forms of reading: distant, close, and speed. In this way, we not only reflected on how far we had come in the course but also watched course concepts reinforce one another in real time.
During the analysis, I worked mainly with the rolling stylometry package, while Arshiya handled the OCR and performed distant reading using AntConc. One of the most difficult challenges I faced was getting the model to work: we had initially wanted to use two methods from the package, .classify() and .delta(), but only the former worked. Another challenge was building the corpus. Because we had to be meticulous about marking exactly when Dorian was speaking in order to insert the milestone tags into the text, we spent a total of four to six hours listening to the audio and reading the transcriptions, which was very cumbersome. On top of that, we had to contend with the rawness of both the audio transcriptions and the OCR output.
One of the highlights of this process for me personally was presenting our findings to Dorian; after all, ours was the only project that focused specifically on him. His detailed feedback, his suggestions for possible practical applications, and his thought-provoking questions showed us how much he cared not only about our project but about all the other projects in the class. Interacting with him made the project personal for me, and from that class on, Arshiya and I had twice the motivation to produce an expansive final project that could give Dorian insight he might put to practical use. Another highlight was seeing how much one can learn through project-based learning: we had taken a technique that was not fully covered in class and made it our primary method of analysis!
All in all, the process was demanding and fulfilling, providing an invigorating conclusion to a semester spent trying to truly understand what it means to read like a computer.
Ready for grading!
Date: 16th December 2021