Access to the corpus: Please complete this waiver and email it to tulay.orucu.dixon@emory.edu. By signing this waiver, you agree to use the corpus only for non-commercial purposes.
Suggested citation: Dixon, T. (2022). Rules of academic writing: A synchronic and diachronic corpus analysis across the disciplines (Publication No. 29321169) [Doctoral dissertation, Northern Arizona University]. ProQuest Dissertations & Theses Global.
The Historical Academic Writing Corpus (HAWC) is a corpus of 17,085 journal articles published in 1950 and 2020. It represents reputable journal article writing in four disciplines: biology, history, psychology, and mechanical engineering. To identify reputable journals, experts from the selected disciplines were consulted. These consultations resulted in the identification of 8-10 journals per discipline, with priority given to generalist journals over specialized journals. Because HAWC contains all article types published in the selected journals (e.g., book reviews, IMRD articles, reviews, editorials, protocols), it achieves situational representativeness (Biber, 1993). HAWC’s situational representativeness, coupled with its large size, distinguishes it from previous corpora of academic writing.
Baby-HAWC-1 controls for article type, a variable found to be an important predictor of linguistic variation in academic writing. For psychology, biology, and mechanical engineering, only research articles with an IMRD format were included; that is, articles with introduction, method, results, and discussion sections. Please note that (i) some articles combined the results and discussion sections while others kept them separate, and (ii) instead of the generic section title “introduction,” some articles used descriptive section titles that reflected the content of the section. For history, the IMRD format was not an option because the research paradigm in history tends to be qualitative in nature. Having spent a considerable amount of time organizing the articles in HAWC, I noted that history journals had a limited range of article types, including reviews (of books, museums, shows, etc.), notes and suggestions, notes and documents, memoranda, short notices, letters to the editor, editorials, and qualitative articles. History journals labeled all article types, either at the beginning of the article or in the headers or footers, except for the longer qualitative articles. These longer qualitative articles are the ones included in Baby-HAWC-1.
Baby-HAWC-2 contains 300 articles in their published version together with their post-print versions (i.e., accepted drafts of articles before they have been typeset and copyedited by the journal). This sub-corpus allows for an analysis of linguistic changes during editorial processes, changes that we currently know little about. The conversion from .pdf to .txt has yet to be completed because mass conversion may not be ideal for research targeting the small changes that happen during copyediting.
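As an illustration of the kind of comparison Baby-HAWC-2 is meant to support, the following is a minimal sketch that lists word-level differences between a post-print and its published version, using Python's standard difflib module. The function name token_changes and the two example sentences are hypothetical and invented for this sketch; they are not drawn from the corpus and do not describe any method used in building it. The sketch also assumes that both versions are already available as plain text.

```python
import difflib


def token_changes(postprint: str, published: str):
    """Return (operation, postprint_tokens, published_tokens) for each edit.

    Tokens are whitespace-separated words; this is a simplifying assumption,
    and real copyediting analysis would likely need a proper tokenizer.
    """
    before = postprint.split()
    after = published.split()
    # autojunk=False keeps frequent words from being treated as noise.
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replacements, insertions, deletions
            changes.append((tag, before[i1:i2], after[j1:j2]))
    return changes


# Invented example sentences, not taken from the corpus.
postprint = "The data shows that participants which responded quickly were more accurate."
published = "The data show that participants who responded quickly were more accurate."

for change in token_changes(postprint, published):
    print(change)
```

Each tuple records the type of edit and the affected words in each version. A comparison of this kind depends on accurate text extraction, which is one reason careful .pdf to .txt conversion matters for this sub-corpus.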