The foundation of our work is turning the raw data of anthology tables of contents into computationally tractable data contained in a relational database designed and maintained by our collaborators, Erik Fredner and J.D. Porter. (For a full description of the database and data entry process, see Fredner and Porter's 2024 PMLA article reporting their work on the Norton Anthology of American Literature linked here and under our resources tab.)
All lab members entered data from multiple general anthologies of African American literature, and in 2024-25 we completed data entry for all thirty-nine extant general anthologies published from 1929 through the 4th edition of the Norton Anthology of African American Literature, which was published in spring 2025.
As of May 2025, 23 of the 39 have been reviewed and regularized to ensure maximum accuracy of our data.
Clusters of Lab collaborators continued to develop their SQL skills for querying the database--along with ideas about what kinds of questions this data might allow us to ask and help us answer. Several of the reports in our Working Papers section rely on these initial rounds of queries. Querying will be a main focus of our work in fall 2025.
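To give a sense of what such a query can look like, here is a minimal sketch in Python using the built-in sqlite3 module. The schema (the anthology, author, and selection tables), the column names, and the placeholder rows are all assumptions made for illustration; they are not the actual structure or contents of the database designed by Fredner and Porter.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real database
# is not reproduced here, and every row below is a placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anthology (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE selection (
        anthology_id INTEGER REFERENCES anthology(id),
        author_id INTEGER REFERENCES author(id),
        work_title TEXT
    );
    INSERT INTO anthology VALUES (1, 'Anthology A'), (2, 'Anthology B');
    INSERT INTO author VALUES (1, 'Author X'), (2, 'Author Y');
    INSERT INTO selection VALUES
        (1, 1, 'Work 1'), (2, 1, 'Work 2'), (1, 1, 'Work 3'), (2, 2, 'Work 4');
""")


def anthologies_per_author(connection):
    """Count how many distinct anthologies include each author."""
    return connection.execute("""
        SELECT author.name, COUNT(DISTINCT selection.anthology_id) AS n
        FROM selection
        JOIN author ON author.id = selection.author_id
        GROUP BY author.id, author.name
        ORDER BY n DESC, author.name
    """).fetchall()


for name, n in anthologies_per_author(conn):
    print(name, n)
```

A query of this shape could support questions such as which authors recur across the most anthologies.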
Hand-entering data is a laborious process. Lab members with a primary interest in data science continued their 2023-24 work on machine reading tables of contents, aiming for a flexible process that could work consistently on a variety of ToCs.
The goal is to be able to consistently extract author names, work titles, and page numbers along with providing a confidence interval on the extracted results, so that our first steps of data entry could be made more efficient. (Human review will remain a part of our process, even if machine reading becomes possible.)
Last year we focused on developing a consistent classification scheme for extracting the data from processed OCR files, which proved too inflexible to deal with the variance in ToC formats. This year we built on that knowledge to draw in AI/LLM tools. Veda focused on using ChatGPT to process full ToCs presented in various ways (as images, plain text, etc.). Al focused on using ChatGPT in a multi-step process that avoided some issues with processing capacity. Marina developed a method for comparing the results of Veda and Al's work with our hand-entered data to determine which methods were most accurate.
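One way such a comparison could be set up is sketched below in Python. This is not Marina's actual method, which is not described here; it is a minimal illustration that assumes each ToC entry is an (author, title, page) record and scores machine output against hand-entered records by precision and recall. The function names and the placeholder records are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace so that
    trivial formatting differences do not count as errors."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def score(extracted, hand_entered):
    """Return (precision, recall) of extracted records against hand-entered ones.

    Each record is an (author, title, page) tuple; a record counts as a
    match only if all three fields agree after normalization.
    """
    def as_set(records):
        return {(normalize(a), normalize(t), int(p)) for a, t, p in records}

    got, truth = as_set(extracted), as_set(hand_entered)
    matched = got & truth
    precision = len(matched) / len(got) if got else 0.0
    recall = len(matched) / len(truth) if truth else 0.0
    return precision, recall


# Placeholder records, not real anthology data.
hand = [("Author X", "Work One", 12), ("Author Y", "Work Two", 40)]
machine = [("author x", "Work One.", "12"), ("Author Y", "Work Two", 41)]
print(score(machine, hand))
```

Exact matching on all three fields is a deliberately strict choice; a looser comparison (for example, scoring author, title, and page separately) would show which kind of field a given method tends to get wrong.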
We ended the year at what we think was the threshold of a breakthrough and will take this work up first thing in Fall 2025. If we are correct that we are at a breakthrough, then we will begin work on new, unentered anthologies (probably the main competitor to the Norton Anthology of American Literature, the Heath Anthology of American Literature, published in seven editions from 1990 to 2014).
We (led by Ava) sent our snap poll to 50 university English departments--the main public university in each state--in the fall of 2024. Response rates were not what we had hoped, though we did get some data, which provides the basis for a preliminary report in our white papers section.
We intend to send another round of our poll just to Virginia universities in Fall 2025, in hopes that this more targeted and more local group will yield a higher response rate.
2023-24 Report
We have developed an anonymous online snap poll to gauge participants' familiarity with a select group of American authors. Our poll asks participants to express their familiarity with an author on a 3-point scale: 1 = have read; 2 = have heard of but not read; 3 = have never heard of or read.
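As a rough illustration of how responses on this 3-point scale might be summarized, the Python sketch below computes, for each author, the share of respondents giving each answer. The function name, the input format (a list of author and rating pairs), and the placeholder responses are assumptions for illustration, not a description of our actual analysis.

```python
from collections import Counter

# The three answer options, keyed by their scale value.
SCALE = {
    1: "have read",
    2: "have heard of but not read",
    3: "have never heard of or read",
}


def summarize(responses):
    """Given (author, rating) pairs, return {author: {label: share}}."""
    counts = {}
    for author, rating in responses:
        counts.setdefault(author, Counter())[rating] += 1
    summary = {}
    for author, tally in counts.items():
        total = sum(tally.values())
        summary[author] = {SCALE[k]: tally[k] / total for k in SCALE}
    return summary


# Placeholder responses, not real poll data.
responses = [("Author X", 1), ("Author X", 1), ("Author X", 2), ("Author X", 3)]
print(summarize(responses))
```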
Here is the rationale for our poll:
In "Counting on The Norton Anthology of American Literature," a data-informed analysis of the NAAL, our collaborators Erik Fredner and J.D. Porter find four salient things:
The NAAL has diversified its representation of authorial identities to include more women and non-white authors.
It has achieved that diversification by more than doubling the number of anthologized authors, even as the physical anthology itself has remained roughly the same size, and anthologizes roughly the same number of works.
Thus, being selected as an author in the NAAL is now less valuable than it was in the early, more tightly selective and focused NAALs (each author's selection is likely smaller and less likely to command reader attention/focus because of the greater number of authors in the anthology).
103 authors, who are predominantly white and male, have been selected for all ten editions of the NAAL, from 1979 to 2022.
Based on their findings and their understanding of the function of anthologies (to select by several expressed criteria of value; to guide limited teacherly and readerly attention and time), Fredner and Porter argue:
NAAL editors should move from what Fredner and Porter term the strategy of authorial growth to a strategy of redistribution.
This more focused NAAL should, at the same time, commit to diverse authorial representation--which is not at odds with selecting for literary excellence; rather, it counters previous, negligent at best, unjust at worst, selection principles.
A sensible place to begin thinking about authors to DE-select for future NAALs would be the pool of 103 authors who have always been represented. Do all warrant continued inclusion?
Our poll intends to provide some suggestive data for future NAAL editors, as well as editors of other anthologies of American literature, to consider. The "canon" is not a popularity or familiarity contest; it emerges from (and constantly shifts with) a complex mixture of expert judgment, appeal, availability, etc. Consequently, our data will not be dispositive. Rather, we hypothesize, it will indicate which of these 103 authors are being taught to and read most widely by U.S. high school and college students--and which have stuck in their memories. This may provide useful information for anthology editors to consider as part of their selection judgments and for literary scholars as they review and revise the history of American literature(s).
We piloted our poll in late Fall 2023 via distribution to William & Mary English classes. Based on that pilot study, we determined that such a poll would likely yield useful, interpretable information and made refinements in our poll design. Our poll has been approved by W&M's Human Subjects committee, and we will circulate the poll to a representative array of 100 U.S. English departments in early Fall 2024 and release results via this website.