This study examines the language used in two distinct corpora of weight loss texts, one of turn-of-the-century books and the other of current magazine weight loss verticals, using two different modes of analysis: HTRC Analytics and Voyant Tools.
The first corpus consists of eight weight loss texts published between 1889 and 1921. These texts were chosen because they appeared in searches of the HathiTrust (a database of digitized texts from research libraries) for "reduce," "fat," and "diet" within the timeframe of 1885-1925. This timeframe was chosen because the literature suggests that weight loss efforts began in earnest in the 1880s and had become a true industry by the 1920s (see Background). The literature also informed the choice of search terms: the phrase "weight loss" itself appeared to be rarely used in this period, while "reduce," "fat," and "diet" came up frequently. Many of the texts returned by these searches did not in fact deal with elective human weight loss, and these were screened out. The eight remaining texts were then compiled into a collection entitled "Turn of the century weight loss."
HTRC (HathiTrust Research Center) Analytics was chosen to analyze this collection because it is the most straightforward way to analyze most HathiTrust materials. Collections created in HathiTrust can be imported as worksets, which can then be analyzed by any of four distinct algorithms: Extracted Features Download Helper, InPho Topic Model Explorer, Named Entity Recognizer, and Token Count and Tag Cloud Creator.
Because the goal of this project is to examine the language used to write about weight loss, the InPho Topic Model Explorer and the Token Count and Tag Cloud Creator were judged likely to produce the most useful data, and the workset was analyzed with these two algorithms. For the Topic Model Explorer, the default of 200 iterations and topic numbers of "20 40 60 80" were selected. For the Token Count and Tag Cloud Creator, the default stop word list and the default maximum of 200 displayed tokens were selected. Additionally, all tokens were set to be lowercased before counting began. The output of both algorithms can be found in the Results section of this site.
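As a rough illustration of what the token-counting step involves, the Python sketch below lowercases a text, drops stop words, and keeps the most frequent remaining tokens up to a cap of 200. It is not HTRC's implementation: the tokenizer regex and the short `STOPWORDS` set are assumptions standing in for HTRC's own tokenizer and default stop word list, and `count_tokens` is a hypothetical helper.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stop word list; HTRC's default list is longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "be"}


def count_tokens(text, stopwords=STOPWORDS, max_tokens=200, lowercase=True):
    """Return up to max_tokens (token, count) pairs, most frequent first."""
    if lowercase:
        # Lowercase before counting so "Diet" and "diet" are tallied together.
        text = text.lower()
    # Assumed tokenizer: any run of letters or apostrophes is one token.
    tokens = re.findall(r"[A-Za-z']+", text)
    # Count only tokens that are not on the stop word list.
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(max_tokens)
```

For example, `count_tokens("Fat is reduced by diet. The diet must be strict; reduce the fat.")` counts "fat" and "diet" twice each and omits "the" and "is". Lowercasing first matters here: without it, "The" at the start of a sentence would slip past a lowercase stop word list.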
The second corpus consists of two webpages: the weight loss verticals for Women's Health and for Men's Health. These webpages were chosen because of the mainstream appeal and reputability of the publications (see the Women's Health media kit and Men's Health media kit for statistics) as well as the large collection of article titles that appear under each vertical, providing a rich diversity of topics under the theme of weight loss. While such popular material may appear to differ significantly from the research library texts digitized by HathiTrust, the literature examined for project background suggests that the latter would have been quite mainstream when published, with some of the chosen HathiTrust texts even mentioned by name in such broad cultural studies as The Hundred Year Diet: America's Voracious Appetite for Losing Weight.
Voyant Tools was chosen to analyze this corpus because of its accessible interface, the ease of importing webpages into it as analyzable documents, and the wide variety of analysis tools it provides. Within Voyant's default skin (i.e., the array of tools first presented to the user), Cirrus, Trends, and the Summary were judged to yield the most pertinent data for this project. These tools were initially run simultaneously with only the stopword list that Voyant applies automatically: a collection of letters, symbols, and common function words such as "the" and "a."
It quickly became clear, however, that many of the words appearing in the Cirrus word cloud, the Trends line graph, and the Summary overview were not pertinent to this study. These included the names of regular article contributors, date abbreviations, the publisher's name, and common website navigation language. The following stopwords were therefore added to Voyant's default list: mar, apr, advertisement, subscribe, hearst, newsletter, ad, stacey, leasca, mike, briana, brenza, marty, darling, jameson, jenna, below, continue, reading, photo, search.
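The effect of extending the default list can be sketched in Python. In the sketch below, `DEFAULT_STOPWORDS` is a small hypothetical stand-in for Voyant's much longer built-in list, while `CUSTOM_STOPWORDS` is the list given above. `relative_frequencies` is a hypothetical helper that approximates the per-document frequencies Trends plots. Voyant's own tokenization and normalization may differ.

```python
import re
from collections import Counter

# Hypothetical stand-in for Voyant's default stopword list (the real one is longer).
DEFAULT_STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "our", "you", "is"}

# Project-specific additions: contributor names, dates, publisher, navigation text.
CUSTOM_STOPWORDS = {
    "mar", "apr", "advertisement", "subscribe", "hearst", "newsletter", "ad",
    "stacey", "leasca", "mike", "briana", "brenza", "marty", "darling",
    "jameson", "jenna", "below", "continue", "reading", "photo", "search",
}

# The list actually applied is the union of the two.
STOPWORDS = DEFAULT_STOPWORDS | CUSTOM_STOPWORDS


def term_counts(text, stopwords=STOPWORDS):
    """Lowercase, tokenize on letter runs, and count non-stopword tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


def relative_frequencies(docs, terms, stopwords=STOPWORDS):
    """For each document, each term's count divided by the number of kept tokens."""
    result = {}
    for name, text in docs.items():
        counts = term_counts(text, stopwords)
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
        result[name] = {term: counts[term] / total for term in terms}
    return result
```

Because navigation words such as "continue," "reading," and "subscribe" are removed before the totals are computed, they no longer dilute the share of substantive terms like "weight" and "loss" in each vertical.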
Tools that do not appear in Voyant's default skin, such as Bubblelines, Collocate, and Knots, were also explored, but testing showed them to be unnecessary for this particular study. These tools are discussed further on the Future Possibilities page of this website.