How much can text analysis unearth by scanning the surface?
Having had little meaningful contact with either the world of Harry Potter or fan-fiction in general, I was reasonably nervous about analyzing Harry Potter fan-fiction. However, I had enough of a conception of both to formulate a hypothesis about what I might find. For instance, I knew that the essence of Harry Potter’s world-building lies in the allure of wizardry and magic, and that its recurrent motifs involve death and darkness. I also knew that fan-fiction is an outlet for fandoms to explore alternate plot paths that the canon work did not take. With these preliminary assumptions in mind, I chose an angle for my analysis. After exploring the corpora, how many of my assumptions would be verified? To what extent would I be able to glean the ways the fandom has diverged from the canon work? More interestingly, could my analysis pick up on the most common tropes of the fan-fiction genre?
With this last question specifically in mind, before starting the analysis I looked into which tropes occur most frequently in fan-fiction. The common tropes that turned up were enemies-to-lovers/friends-to-lovers, canon divergence, and an emphasis on romantic subplots. With these findings, I began the text analysis phase hoping to see these patterns emerge.
I started by using the Word List function of AntConc, a text-mining program for precise searching and screening of a body of text, to generate a list of the most frequent words. Disregarding function words such as articles (“a”, “the”, etc.), I focused on the character names that sprang up. Not surprisingly, “harry” was the most frequent name in both corpora. However, scrolling further down the fan-fiction list, its peculiar focus on subordinate characters such as “draco” and “snape” becomes especially apparent. This seems to suggest that Ron and Hermione, who together with Harry form the Holy Trinity of the canon, do not lie at the heart of the fan-fiction's plot lines.
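For readers without AntConc, the word-list step can be approximated in a few lines of Python. This is only a minimal sketch, not how AntConc itself works: the stop list, the tokenizing pattern, and the sample sentence are all illustrative assumptions rather than anything taken from the corpora.

```python
import re
from collections import Counter

# Hypothetical stop list of function words; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "was", "he", "she", "it"}

def word_list(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Lowercase everything and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common()

# Made-up sample text, used only to show the shape of the output.
sample = "Harry looked at Draco. Draco sneered, and Harry sighed. Snape watched Harry."
print(word_list(sample)[:3])
```

Running the same function over each corpus and comparing the top-ranked names would give a rough equivalent of the two word lists discussed above.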
Another notable distinction between the word lists of the two corpora is the occurrence of names in the fan-fiction corpus that do not exist in the original books. Take the name “brenda”, for example: using AntConc's Cluster/N-gram tool, which lets us search for the frequency of a word or a pattern of words, we see that the phrase “harry and brenda” appears 797 times in the corpus.
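The idea behind a cluster search can be sketched as counting every n-word sequence that contains the search term. The function name `clusters`, the choice of three-word sequences, and the sample text below are assumptions made for illustration; AntConc's own tool offers more options than this.

```python
import re
from collections import Counter

def clusters(text, term, n=3):
    """Count every n-word sequence that contains the search term."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text (sentence breaks are ignored).
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams if term in g)

# Made-up sample text, not drawn from the fan-fiction corpus.
sample = "Harry and Brenda ran. Later, harry and brenda hid. Harry and Ron waited."
for phrase, count in clusters(sample, "brenda").most_common(3):
    print(count, phrase)
```

A name that keeps turning up next to canon characters in such clusters, yet is absent from the canon word list, is a strong candidate for an original character.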
Using the Concordance Plot tool, which visualizes search hits in a barcode format and shows where they fall within each file of the corpus, we see that “brenda” is mentioned around 11,000 times, densely and throughout, in DramaGirl’s “His Twin Sister”, leading us to conclude that “brenda” is an original character.
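A barcode-style plot of this kind can be imitated in plain text by marking where in a file each hit falls. The sketch below is an assumption about the general idea, not a reproduction of AntConc's plot; the function names, the `width` parameter, and the sample sentence are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def barcode(text, term, width=40):
    """Draw a one-line plot: '|' where the term occurs, '.' elsewhere."""
    tokens = tokenize(text)
    bars = ["."] * width
    for i, token in enumerate(tokens):
        if token == term:
            # Scale the token index to a column of the plot (integer math only).
            bars[i * width // len(tokens)] = "|"
    return "".join(bars)

# Made-up sample; a real run would read one story file at a time.
sample = "brenda smiled at harry while brenda laughed"
print(barcode(sample, "brenda", width=7))
```

A file in which the bars fill most of the line, as with “brenda” in “His Twin Sister”, points to a character who is present throughout the story rather than mentioned in passing.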
Aside from divergence in characters, we can use our text analysis tools to look for divergence in major themes. One theme that seems central to the books is “death”, a word that occurs frequently in them. Experimenting with the File View tool, a magnifying-glass-like function that takes us from distant reading to a closer form of inspection, we can find this line in AidanChase’s renditions of three of the original books, “Order of the Phoenix”, “The Philosopher's Stone”, and “The Prisoner of Azkaban”:
File view of AidanChase's “The Philosopher's Stone” with “dies” as the search term. The word “dies” appears in the line “Alternate Universe - Everyone Lives/Nobody Dies Canon Rewrite”, which is a clear demonstration of canon divergence.
From focusing on subordinate characters to introducing new characters and major plot changes that can massively derail the original events of the books, this corpus displays clear elements of divergence from the canon work.
The change in a relationship dynamic from hate to love has always had a special allure. Knowing the rivalry between Harry and Draco in the books, I wondered whether the enemies-to-lovers trope could be the reason behind Draco's prominence in the fan-fiction corpus. Although “harry and ron” accumulates more concordance hits (84 compared to 69 for “harry and draco”), more interesting results appear when we use the Concordance tool, which reveals the context of each hit:
As hypothesized, there appears to be a romantic relationship between Harry and Draco in GutiérrezDeLaTorre’s stories, which can be confirmed by taking a closer look at the text:
File view of GutiérrezDeLaTorre.txt
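A concordance view of this kind is usually called keyword-in-context (KWIC): each hit is shown with a fixed amount of text on either side. The following is a rough sketch of that idea under stated assumptions; the function name `kwic`, the character-based `window`, and the sample sentence are illustrative and not taken from the corpus.

```python
import re

def kwic(text, phrase, window=30):
    """Return one line per hit, showing `window` characters of context on each side."""
    lines = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - window):match.start()]
        right = text[match.end():match.end() + window]
        # Right-align the left context so the search phrase lines up in a column.
        lines.append(f"{left:>{window}} [{match.group(0)}] {right}")
    return lines

# Made-up sentence, used only to illustrate the output format.
sample = "At dinner Harry and Draco shared a look."
for line in kwic(sample, "harry and draco", window=10):
    print(line)
```

Reading down the aligned column of hits is what makes recurring contexts, such as romantic ones, stand out without opening each story in full.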
To test my assumption about the relative emphasis on romantic subplots in fan-fiction, I compared the prominence of romance vocabulary in the two corpora. According to the Word List tool, the word “love” makes up around 0.01% of the canon work, compared to 0.033% of the fan-fiction corpus, roughly tripling in the latter. The word “kiss” shows a similar pattern (0.0035% and 0.0093%, respectively). Although the calculations are basic, they do support the supposition that romance plays an important role in fan-fiction.
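The percentages above are relative frequencies: the number of times a word occurs divided by the total number of words in the corpus. A minimal sketch of that calculation follows, with the function name, the tokenizing pattern, and the sample sentence all being illustrative assumptions.

```python
import re

def relative_frequency(text, word):
    """Return the share of all tokens that equal `word`, as a percentage."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100 * tokens.count(word) / len(tokens)

# Made-up sample; the essay's figures come from AntConc's Word List, not this code.
sample = "love is love and kiss"
print(relative_frequency(sample, "love"))
```

Dividing the fan-fiction percentage by the canon percentage gives the size of the gap: using the figures reported above, 0.033 / 0.01 is about 3.3 for “love” and 0.0093 / 0.0035 is about 2.7 for “kiss”.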
A series of follow-up questions arises from this analysis: if I was able to recognize genre patterns in this small corpus, can text analysis be used to identify genre elements in larger corpora? What about other literary elements, such as voice and authorial style? With more advanced technology, how will text analysis shape the future of studying and digesting literature? Even at the level of the analysis conducted here, one must admit that it is nothing short of amazing what one can do with text-mining software and a corpus: how many patterns one can extract about the works and the genre to which they belong, with minimal close contact with the source material.
Date: 20th September 2021