The AEOLIAN Network "Blowin' in the Wind"

9 June 2021 • Glen Worthey, Associate Director for Research Support Services, HathiTrust Research Center

Toward the end of April, 2021, Dr. Paul Gooding of the University of Glasgow introduced readers of the Digital Preservation Coalition blog to the AEOLIAN Network (“Artificial intelligence for cultural organizations”), an expansive collaboration among digital humanities researchers and cultural heritage workers in the United States, the United Kingdom, and Ireland.

In this inaugural post for the Artificial Intelligence for Libraries, Archives, and Museums (AI4LAM) community site, I present further thoughts on the goals and activities of the AEOLIAN project in the hopes that our like-minded professional communities will find common cause in promoting and interrogating new computational methods in support of deeply humanistic goals.

Taking inspiration from Paul’s unapologetically punning use of a great song title, “In the AI Tonight,” I’ve decided to follow suit by pursuing the musical AEOLIAN theme and guiding metaphor: the Aeolian harp is an ancient instrument played solely by wind and producing ethereal, haunting, inhuman melodies that are nonetheless a source of inspiration and scientific study. So, too, I hope that our project’s investigations into applications of artificial intelligence in the cultural heritage sector will both inspire and haunt.

My own song-title-inspired post refers to current public discourse about artificial intelligence, which is full of both questions and answers that swirl about us in contradictory, unsettled, but still fascinating fashion. The AEOLIAN project hopes to avoid both the utopian and the cataclysmic modes of much current debate around AI (as in the highly reductive and speculative question, “will AI save us or destroy us?”). Rather, we seek to engage critically and practically with questions of current and potential applications of artificial intelligence and machine learning in the service of cultural heritage.

The AEOLIAN project focuses primarily on various kinds of digital cultural heritage collections with restricted or difficult access, whether due to privacy concerns or copyright restrictions; it likewise addresses the human difficulties related to the sheer volume of cultural heritage data (which, barring catastrophic data loss, is always on the increase). In particular, we’re interested in exploring the potential of artificial intelligence methods to address these issues. Our three main objectives are: to make digital collections more accessible; to analyze these collections using innovative AI research methods; and to identify synergies and collaborative avenues between US and UK cultural organizations engaged in AI-enhanced research and access methods.

Let’s dig into a few specific scenarios in which these methods might alleviate the new challenges inherent in digital cultural heritage collections, focusing first on culturally significant email archives. Although these may be relatively few in number in the world’s archival collections now, such collections will inevitably grow in number and importance as time goes on, and dealing with them is already a genuine problem for both archivists and researchers. It’s safe to say that at least a portion of all email archives is private or semi-private: for very good reasons, you won’t find them out on the open web the way you’d find digitized archival photo collections, for example. But having machine-mediated access -- the more “intelligent” and flexible, the better -- puts the human researcher at a slight remove from confidential information, while also abstracting and de-personalizing that information in ways that can be useful for research and still respectful of privacy.

Another important area of existing, real-world AI-enhanced and -inflected methods is in massive digitized library collections, including especially those containing in-copyright materials. It’s not so much that these are private as that unmediated human access to them is highly restricted by law. To take one prominent example, the HathiTrust collection, the most comprehensive academic digital library ever assembled, contains nearly 17.5 million volumes: of these, only about 6.8 million (~39%) are considered to be safely in the public domain. That means that roughly 61% of this crucial cultural heritage collection is essentially closed to public view. At the same time, the sheer scale of this material puts it far beyond normal human means of comprehension or analysis. While AI and machine learning have certainly not solved either of these deeply embedded difficulties, my group at the HathiTrust Research Center has focused nearly all of its efforts on machine learning and related technologies to create tools, data, and research methods that not only mitigate the effects of scale and data restrictions, but more importantly offer new ways of reading and understanding our cultural heritage.

It is in that spirit that AEOLIAN brings together humanists, curators, computer scientists, archivists, librarians, and others to consider and enable the transformation of digital archival access and use with the help of machine learning and AI. Over the course of two years, we’ll be organizing six online workshops, the first of which, “Employing Machine Learning and Artificial Intelligence in Cultural Institutions,” hosted online by Loughborough University and Dublin City University, will take place on July 7, 2021. Attendance is limited, and applications are due on June 18.

The project will also document five case studies from US and UK cultural institutions; through all of these activities, we’ll strengthen and grow our already substantial international network of scholars and practitioners working with digital collections. At the conclusion of the grant period, the AEOLIAN team will produce a major interdisciplinary report on the uses of AI at cultural institutions, along with a set of agenda-setting scholarly publications.

The AEOLIAN team is led on the US side by my institution, the HathiTrust Research Center (HTRC), which is hosted jointly at the University of Illinois School of Information Sciences and the Luddy School of Informatics at Indiana University Bloomington. On the UK side, the project is hosted at Loughborough University and led by Dr. Lise Jaillant, with major partners at Durham University, the University of Glasgow, and Dublin City University. The entire multinational effort is generously funded jointly by the U.S. National Endowment for the Humanities and the U.K. Arts and Humanities Research Council, as one of eight recipients of their “New Directions for Digital Scholarship in Cultural Institutions” grants.

But even this large list of project leaders only scratches the surface of the AEOLIAN collaboration! Major project partners at Stanford University, Auburn University, and The Frick Collection (in the US), and at the National Libraries of Scotland and Wales, and the Wellcome Trust (in the UK), will each contribute a case study or hosted workshop. And our host of additional dedicated project partners includes Harvard’s Houghton Library, the History of Parliament Trust, Yale University Library’s Digital Preservation team and Music Library, Indiana University Libraries, University of North Carolina at Chapel Hill Libraries, and the Educopia Institute. All of these outstanding partners are contributing effort in support of the nascent AEOLIAN Network.

As both the popular press and pop culture have been demonstrating for a long time now, the implications of AI are many, varied, and in large measure yet to be deeply understood; in my opinion, they are also too often greatly over-hyped, and too obsessively fretted over. In spite of these largely speculative reactions to AI, we in the AEOLIAN network believe strongly that artificial intelligence methods may have a tremendous amount of practical and necessary potential in our work preserving, presenting, and engaging with the fruits of human intelligence.

Perhaps the greatest goal of the AEOLIAN project is to assemble people and institutions who work with, and think about, these methods every day; to talk through their implications in our shared cultural heritage sphere; and not only to discuss the practical, technical aspects of these methods, but more importantly to grapple with their scholarly and ethical aspects, and to work toward consensus on best practices, guidelines, and even provocations to sustain this work as it progresses.

The answers to many questions about what AI can do for us are blowing in the wind, and like the ancient instrument it’s named for, the AEOLIAN network hopes to capture and make music with some of them. Please join us.