This study examines the language used in two distinct weight loss text corpuses - one made up of turn-of-the-century books, the other of current magazine weight loss verticals - using two different modes of analysis: HTRC Analytics and Voyant Tools.
The eight texts in the HathiTrust "Turn of the century weight loss" collection were first analyzed with HTRC Analytics' Token Count and Tag Cloud Creator. This algorithm identifies the words (which it refers to as "tokens") that occur most frequently in a corpus, as well as the number of times they occur. The complete token count for this particular collection can be viewed as a spreadsheet. The top ten most frequent tokens can be found below:
Another output of this tool is a visualization of the most frequent terms in a corpus as a "tag cloud," otherwise known as a word cloud, where the size of each word corresponds to the number of times it appears in the corpus. The word cloud for the "Turn of the century weight loss" collection can be viewed below:
The texts in the HathiTrust "Turn of the century weight loss" collection were next analyzed with HTRC Analytics' InPho Topic Model Explorer algorithm. This algorithm identifies topics in a set of documents using Latent Dirichlet Allocation (LDA), which views documents as bags of words and assumes that "the way a document was generated was by picking a set of topics and then for each topic picking a set of words" (Doll). Thus, the product of this algorithm's analysis is a series of topics, each containing a set of words that the algorithm has grouped together. A complete list of the topic sets created by the InPho Topic Model Explorer can be viewed as a PDF, while a random selection of ten can be found below:
This tool also produces an interactive visual of the topics and their proximity to each other, which is accessible as an HTML page hosted by HTRC Analytics.
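To make the generative assumption behind LDA more concrete, the following is a minimal sketch in Python of the "pick a topic, then pick a word" process described above. It is not the InPho Topic Model Explorer's code: the topic names, vocabularies, probabilities, and the function `generate_document` are all invented for illustration, and the document's topic mixture is fixed by hand rather than drawn from a Dirichlet distribution as it would be in LDA proper. The actual algorithm works in the opposite direction, inferring likely topics from the words it observes.

```python
import random

# Hypothetical topics, invented for illustration only.
# Each topic maps words to the probability of picking that word.
TOPICS = {
    "diet": {"food": 0.4, "fat": 0.3, "sugar": 0.3},
    "exercise": {"walk": 0.5, "muscle": 0.3, "breath": 0.2},
}

def generate_document(topic_mix, length, rng):
    """Generate a bag of words under the LDA-style assumption:
    for each word slot, pick a topic from the document's topic mixture,
    then pick a word from that topic's word distribution."""
    topic_names = list(topic_mix)
    topic_weights = [topic_mix[name] for name in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()), k=1)[0]
        words.append(word)
    return words

# A document that is assumed to be mostly about diet, partly about exercise.
rng = random.Random(0)
doc = generate_document({"diet": 0.7, "exercise": 0.3}, length=12, rng=rng)
print(doc)
```

Because word order plays no role in this process, the output is a "bag of words" in the sense used above: only which words appear, and how often, carries information.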
The weight loss verticals of both Women's Health and Men's Health were analyzed as one corpus by Voyant Tools. The interface then provides results from a variety of default tools, only a few of which will be examined here. Summary and Cirrus will be presented together, as their combined results provide a neat one-to-one comparison with HTRC Analytics' single tool, the Token Count and Tag Cloud Creator.
Summary provides a variety of information, only some of which has been deemed useful for this study by virtue of allowing us to more directly compare the two corpuses. The "list of most frequent words in the corpus," for example, directly mirrors HTRC's token count. A complete list of most frequent words can be viewed as a spreadsheet, while the top ten most frequent words in the weight loss verticals of Women's Health and Men's Health can be seen below:
From the Summary, we also know the exact size of this corpus, which can otherwise be hard to know, especially when a corpus consists of scrolling webpages. This corpus contains 1,471 total words and 557 unique word forms.
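The figures reported by both the Token Count and Tag Cloud Creator and Voyant's Summary (total words, unique word forms, and most frequent words) come down to counting tokens. The sketch below shows one way such counts could be computed in Python. It is an illustration, not the implementation used by either tool: the tokenizing rule, the short stop word list, the sample sentence, and the function names `tokenize` and `summarize` are all assumptions made for this example, and real tools may split or normalize words differently.

```python
import re
from collections import Counter

# Hypothetical stop word list for illustration; not the list used in this study.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}

def tokenize(text):
    """Lowercase the text and split it into word tokens
    (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(text, top_n=10, stop_words=STOP_WORDS):
    """Return the total word count, the number of unique word forms,
    and the most frequent words once stop words are removed."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    content_counts = Counter(
        {word: n for word, n in counts.items() if word not in stop_words}
    )
    return {
        "total_words": len(tokens),
        "unique_word_forms": len(counts),
        "most_frequent": content_counts.most_common(top_n),
    }

# Invented sample text, standing in for a corpus.
sample = "Lose weight fast. The best way to lose weight is to eat less and move more."
summary = summarize(sample, top_n=3)
print(summary)
```

In this sketch the stop words still count toward the total and unique figures and are only excluded from the frequency ranking. That is a design choice made for the example; whether a given tool does the same should be checked against its documentation.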
Cirrus is a tool that produces a word cloud containing the most frequent words in the corpus (minus the stop words indicated on the Methodology page). Just as in the tag cloud produced by HTRC Analytics, the size of each word corresponds to its prevalence in the text. Voyant Tools allows you to choose how many terms appear in your cloud; the cloud for this study contains 205 terms, as this was the closest setting to 200 - the number that HTRC automatically includes in the tag clouds it produces - and would thus allow for an easier one-to-one comparison. The word cloud produced by Cirrus for this corpus can be viewed below:
Voyant's Trends tool takes a different tack than most of the other available tools - instead of considering the corpus as a whole, it compares the corpus's primary components. Thus the following line graph compares the use of terms - suggested in this instance by Voyant - in (1) the weight loss vertical of Men's Health with (2) the weight loss vertical of Women's Health:
To further capture the nuanced differences between the two texts in this corpus, comparisons were then conducted in the Trends tool using related terms:
The line graphs for both of these sets of terms are below, where again, (1) is the weight loss vertical of Men's Health and (2) is the weight loss vertical of Women's Health: