Team: Rebecca Wingfield, Arcadia Falcone, Linda Lam, Javier de la Rosa, Scott Bailey
The project aims to enrich the metadata for two large collections of single-volume novels (approximately 1,600 volumes) to make them more discoverable for scholars. The first collection, gathered by Wingfield's predecessor, Annette Keough, is the Jarndyce Collection (1,101 titles). The collections have been catalogued and are currently discoverable in Searchworks only as collections, not as individual titles. In other words, they are served to patrons as a mass of undifferentiated texts. Because the authors are non-canonical, the volumes lack detailed cataloguing information. They are not in Google Books and are not otherwise readily available.
There is great interest in the history and theory of the novel in the Stanford English Department. These are collections of novels that few have read, so they offer new texts that no one else has worked with. This new material will contribute to research agendas that seek to expand beyond the canonical authors and delve into under-collected, scarcely held novels: "the great unread" in English literature. The Stanford Literary Lab was very interested in having these collections digitized so that it could extend its work on bias and large-scale dynamics in literature. There is still quite a bit to learn about the novel.
We will look at what HathiTrust offers as hooks into texts (https://analytics.hathitrust.org/ and https://analytics.hathitrust.org/datasets#ef), for example: list of place names with index, classifier, bag of words, word vectors.
Jarndyce collection (1,101 titles): https://searchworks.stanford.edu/view/jt466yc7169
The Jarndyce collection is subsumed in a larger collection (local subject = "Single-Volume Nineteenth-Century Novels"; 1,674 titles): https://searchworks.stanford.edu/catalog?q=%22Single-Volume+Nineteenth-Century+Novels.%22&search_field=subject_terms
(Some may still be in copyright)
To make these novels more easily discoverable to scholars, richer facet data is needed. This could include dates that help determine the period in which a novel is set, place names that indicate where it is set, and indicators of genre and other topics that might provoke new avenues of research.
Note: for pricing purposes, a "document" is 1,000 characters, including whitespace. Breaking text into chunks of 1,000 characters, or multiples of 1,000, therefore gives the best value. For example, 1,100 characters is priced as 2 documents, twice the cost of 1,000 characters.
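The pricing arithmetic in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the DOCUMENT_SIZE constant are hypothetical, and the sketch assumes the service counts characters the same way Python's len() does (one per Unicode code point), which should be checked against the service's own documentation.

```python
import math

# Assumption taken from the note above: one billable "document" is
# 1,000 characters, whitespace included.
DOCUMENT_SIZE = 1000


def billable_documents(text: str, size: int = DOCUMENT_SIZE) -> int:
    """Return how many documents a text would be priced as (hypothetical helper)."""
    # Any partial document is rounded up to a whole one.
    return math.ceil(len(text) / size)


def split_into_documents(text: str, size: int = DOCUMENT_SIZE) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    # Only the final chunk can be shorter than `size`, so no more than
    # one partially filled document is paid for per text.
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    sample = "x" * 1100
    print(billable_documents(sample))                      # 1,100 characters
    print([len(c) for c in split_into_documents(sample)])  # chunk lengths
```

Splitting on raw character counts can cut a word or sentence in half, so a real pipeline would probably want to back up to the nearest whitespace or sentence boundary before each cut, at the cost of slightly underfilled documents.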