Fantastic Futures 2024 Papers
4 October 2024 • Ingrid Mason
Abbey Potter. Innovating Responsibly: Pathways for AI Transformation in GLAMs
This workshop will introduce participants to the LC Labs AI Planning Framework and key activities to evaluate ways libraries could use AI responsibly. Participants will collaborate in small groups to test out or generate use cases, document the potential risks and benefits of AI in libraries, assess data readiness, explore testing scenarios, and evaluate outcomes.
The LC Labs’ Artificial Intelligence Planning Framework (AIPF) helps staff assess the potential risks, benefits and readiness of AI use cases and provides a roadmap for understanding, experimenting with, and implementing AI at the Library. It is based on several years of experimentation, demonstrations, and research into how machine learning (ML) and AI could be applied responsibly at the Library. The AIPF produces quality benchmarks, documentation, and evidence that support (or argue against) the business justification for enhancing products or services with AI tools or approaches. The AI planning framework and activities will demonstrate how to protect organizations, staff and users while encouraging innovation.
Structured workshop: 2 hours, Abbey Potter, Library of Congress & Mike Trizna, Smithsonian
Barbara Shubinski. Gender in a (Supposedly) Non-Gendered Historical Space: Generative AI, Sentiment Analysis, and Graph Databases in the Rockefeller Foundation Collections of the 1930s
The Rockefeller Archive Center (RAC) initiated a research project aimed at the digitization and subsequent analysis of approximately 12,000 pages of typed, paper, analog records from the Rockefeller Foundation (RF) collection. This endeavor resulted in the construction of a Neo4j knowledge graph database, which currently encompasses over 80,000 nodes representing various entities, including individuals’ names, organizations, topics, technologies, grant amounts, and funding awardees. The knowledge graph facilitates the visualization of more than a million relationships among these nodes, including conversations with funders, places of employment, topics of study, and student-teacher relationships, thereby uncovering patterns and connections that are not readily discernible in linear data representations.
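For readers curious what querying such a graph looks like in practice, here is a minimal sketch using the official neo4j Python driver. The node labels, relationship type, and property names are illustrative assumptions, not the RAC's actual schema.

```python
# Minimal sketch: querying a Neo4j knowledge graph like the one described above.
# The labels Person/Organization and the relationship CORRESPONDED_WITH are
# hypothetical stand-ins for the RAC's actual schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Person)-[r:CORRESPONDED_WITH]->(o:Organization)
WHERE o.name = $org
RETURN p.name AS person, count(r) AS letters
ORDER BY letters DESC LIMIT 10
"""

with driver.session() as session:
    # who corresponded most with a given organization?
    for record in session.run(CYPHER, org="Rockefeller Foundation"):
        print(record["person"], record["letters"])

driver.close()
```

Relationship queries of this shape are what make the non-linear patterns the abstract mentions (conversations, employment, student-teacher ties) tractable at the scale of a million edges.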
The project has had several ambitious objectives. Primarily, we have sought to explore the feasibility of extracting structured data from analog records. This project aimed to decenter the narratives of foundation program officers, potentially revealing the obscured voices of external influencers, and bringing to light not only what did happen during this time period, but what did not happen. Additionally, the project has aimed to visualize a fuller range of historical trajectories than merely the “main story” of scientific discovery during this period; most importantly, we aimed to gain a more profound understanding of the dynamics of grantmaking, network-building, critical tipping points, and the evolution of fields within the natural sciences during the 1930s, a period marked by significant advances in atomic physics and molecular biology, both of which were heavily supported by the RF.
A notable but somewhat surprising aspect of this project has been its uncovering of gendered assumptions within a corpus that ostensibly ignored gender (almost, but not quite, exclusively: most scientists and grantees in this period were white European males). Initial challenges included resolving named entities and distinguishing between conversations about an individual versus those with an individual. The introduction of GPT-4 provided a solution, offering cleaner and more comprehensive data processing than earlier natural-language and rules-based approaches. Sentiment analysis conducted with GPT-4 then enabled our research team to identify biases and institutional hierarchies, highlighting how comments about the few female scientists contrasted with those about their male counterparts. This analysis raised important questions about subjectivity, privacy, and the ethical use of personal information contained in archival records.
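As a concrete illustration, a sentiment-classification call of the kind described might look like the following sketch with the OpenAI Python client. The prompt wording and example passage are hypothetical, not the RAC team's actual prompt.

```python
# Minimal sketch of GPT-based sentiment analysis over an archival passage.
# The system prompt and passage are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

passage = "Dr. X is a careful worker, though perhaps lacking in originality."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Classify the sentiment of remarks about scientists in "
                    "1930s grant officer diaries as positive, neutral, or "
                    "negative, and name the qualities praised or criticised."},
        {"role": "user", "content": passage},
    ],
)
print(response.choices[0].message.content)
```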
The RAC, as a major repository of philanthropic records, holds collections from over 40 foundations and numerous non-profit organizations. This project reflects the RAC's commitment to democratizing access to these records, enabling complex quantitative research, and engaging more deeply with the communities represented in its holdings. Despite the inherent biases in historical records created by powerful institutions, the RAC's efforts aim to incorporate a broader range of voices and perspectives, showcasing both well-documented stories and lesser-known narratives. This project has also helped us investigate the appropriate role, and the required capacity, of an archive like ours as a provider of data derived from paper-based, analog historical records.
In conclusion, this learning and demonstration project underscores the transformative potential of advanced digital tools like GPT-4 in historical research and data analysis. By integrating cutting-edge technologies with traditional archival practices, we are trying to envision new avenues for understanding and interrogating archival materials, hopefully broadening our audiences and cultivating researchers across a broader range of methodologies and disciplines, ultimately contributing to a more nuanced and inclusive historical narrative.
Lightning talk: 5 mins, Barbara Shubinski, Rockefeller Archive Center
Benjamin Lee. Reanimating and Reinterpreting the Archive with AI: Unifying Scholarship and Practice
This presentation brings together different disciplinary perspectives to explore how AI helps us to reanimate and reinterpret digital collections held by GLAM institutions. With two case studies, the presenters will demonstrate how academic projects not only further scholarly inquiry but also advance GLAM practice itself. Andrew Dean (Assistant Professor, Writing & Literature, Deakin University) and Benjamin Lee (Assistant Professor, University of Washington Information School) will speak about their collaboration to reimplement Nobel Prize-winning novelist J.M. Coetzee’s AI-generated poetry – one of the earliest examples of generative language produced by a literary figure. With the preservation of Coetzee’s FORTRAN code at the Harry Ransom Center at UT Austin, it is possible to recover the generative algorithms that Coetzee wrote during his time as a computer programmer in the mid-1960s, before his rise as a novelist. Significantly, the re-implementation of this code makes it possible not only to properly understand his computer-generated poetry but also to reanimate the algorithms themselves, thereby enabling the generation of new Coetzee poems. In his presentation, Dean will talk about this ongoing work, its implications within literary studies, and how this work serves as a roadmap for GLAM institutions attempting to provide access to born-digital code files.

Katherine Bode (Professor, Literary and Textual Studies, Australian National University) will discuss a project exploring the potential of word embedding techniques for archival research. Contrary to the prevailing belief that machine-learning or Artificial Intelligence (AI) technologies discover or uncover patterns in their training datasets, this project posits them as participants in the evolving knowledge apparatus of the historical record. They do not uncover so much as become part of relations that constitute the possibility of historical knowledge; and like traditional infrastructural and institutional arrangements, they are infused by history, politics, and power. We explore this issue with a case study devoted to investigating Irishness in an expansive archive of digitised Australian newspaper fiction. Formed from complex intersections of sectarianism, race, geography, migration and diaspora, literary traditions, and religions, Irishness in Australian literature is not a findable thing so much as an array of intersecting cultural forms and practices. By moving between the textual technologies (the newspaper novels) we’re accustomed to reading and writing with and those we’re not (a word embedding of a large digital archive), we consider the values that might - and should - guide archival research in changing techno-textual conditions.
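For readers unfamiliar with word embeddings, a minimal sketch with gensim's Word2Vec shows the basic mechanics the second case study relies on. The corpus and query terms here are placeholder assumptions; the actual project embeds a large archive of digitised newspaper fiction.

```python
# Minimal sketch of word-embedding-based archival exploration with gensim.
# The tiny "corpus" below is a placeholder for tokenised newspaper fiction.
from gensim.models import Word2Vec

sentences = [
    ["irish", "catholic", "migrant", "sydney"],
    ["irish", "diaspora", "famine", "settler"],
    ["english", "protestant", "squatter", "station"],
]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# the terms the model places nearest to "irish" hint at the cultural
# relations encoded in (and co-constructed with) the archive
for term, score in model.wv.most_similar("irish", topn=5):
    print(f"{term}\t{score:.3f}")
```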
Presentation: 15 mins, Benjamin Lee, University of Washington, Kath Bode, Australian National University & Andrew Dean, Deakin University
Daniel Lewis. Practical Applications of AI/ML Today: Decoding the Past, Shaping the Future
In this immersive 2-hour workshop, participants will explore the cutting-edge applications of Artificial Intelligence (AI) and Machine Learning (ML) in the fields of history and cultural heritage. The session will focus on practical, hands-on activities to decode historical data and shape innovative solutions for future challenges. Participants will gain insights into how AI/ML can be leveraged to uncover hidden patterns in historical records, enhance digital archiving processes, and create engaging user experiences in GLAM (Galleries, Libraries, Archives, Museums) institutions. Whether you're a tech enthusiast or a heritage professional, this workshop will provide the tools and knowledge to harness the power of AI/ML in your projects.
Structured workshop or hack: 2 hours, Daniel Lewis, State Library of Victoria
Emily Pugh. AI-Generated Metadata and the Culture of Image Search
GLAM patrons both /are/ and /are not/ prepared for the kinds of image searching that AI-generated metadata will facilitate. This is one takeaway from a four-year Getty Research Institute (GRI) project entitled PhotoTech, which focused on using AI for both art-historical research and metadata generation in relation to digital image collections. Based on the outcomes of this project, this presentation will provide an overview of the current approaches to image searching as they exist in the field of cultural heritage, explicating the disconnects between these approaches and those required by an environment of AI-enhanced search. The fifteen-minute presentation will consist of two parts. I will begin by sharing the results of a qualitative survey on image searching practices we conducted as part of PhotoTech, which received over 500 responses from professionals in cultural heritage and art and architectural history. The survey exposed the extent to which the current search practices of such professionals are tied to traditional approaches to cataloguing, relying for example on standardized subject headings and vocabularies. However, based on responses to a series of questions about computationally generated metadata, the survey also revealed that these users are by no means resistant to searches powered by AI-generated metadata.
Next, I will explain how the results of the survey became useful in relation to the research activities of PhotoTech, and specifically our exploration of a collection of 30,000 digitized images of photographs of cultural heritage sites in Europe and South Asia. In analyzing these images, we focused on a basic, well-established computer vision (CV) task: figure detection. Starting from the simple question of whether photographs depicted or did not depict people, we applied the output of the CV algorithm to research how photographic aesthetics of archaeological and architectural photography developed in the context of colonialism and nationalism. We initially focused on figure detection to support research questions; however, noting hesitancy among the survey respondents in regard to tasking computers with complex, interpretive tasks, we also considered how the number of figures might provide an alternate approach to collections discovery. Could a focus on figures provide a way to subvert conventional standards and categories of archival description that assume a European positionality (e.g., labels based on Christian iconography like “Annunciation” or descriptions like “native village”) or that make determinations about the age or gender of the subjects depicted? The insights of our application of AI to both researching and searching in the context of PhotoTech will, I hope, provide a clearer picture of how we might begin to facilitate users’ uptake of new ways of image searching and, as part of this, leverage AI to make collections discovery more representative and inclusive of diverse populations of subjects and scholars.
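A minimal sketch of figure detection with a pretrained torchvision detector follows; this is a generic stand-in for the task, not the GRI's actual model or pipeline.

```python
# Minimal sketch: count human figures in a photograph with a pretrained
# detector. Model choice and the 0.8 threshold are illustrative assumptions.
import torch
from PIL import Image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

image = Image.open("photograph.jpg").convert("RGB")
with torch.no_grad():
    output = model([to_tensor(image)])[0]

# COCO class 1 is "person"; count confident detections
n_figures = sum(
    1 for label, score in zip(output["labels"], output["scores"])
    if label.item() == 1 and score.item() > 0.8)
print(f"figures detected: {n_figures}")
```

A per-image figure count like `n_figures` is exactly the kind of simple, non-interpretive metadata the abstract proposes as an alternative access point.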
Presentation: 15 mins, Emily Pugh, Getty Research Institute
Emmanuelle Bermès. Computer vision in the museum: perspectives at the MAD Paris
Since 2023, the Musée des Arts Décoratifs in Paris (MAD) has been experimenting with computer vision in order to process large paper collections (drawings, photographs, prints): one million items, a major part of which still lack descriptive metadata. The museum has partnered with the Ecole des chartes (ENC) and the national library of France (BnF) in order to bring together artificial intelligence expertise and to prototype the automated annotation of these images. The first experiment targeted the Royère collection, a designer collection of more than 15,000 interior architecture and furniture drawings. In this project, we used the YOLO (You Only Look Once) model to perform object detection using a specific ontology. We aim to retrieve iconic design objects in Jean Royère’s collection, such as the famous egg chair, as they are represented both in technical design drawings and interior settings.

One of the first issues to arise from the initial attempts at object detection in the Royère collection is the lack of adapted ground truth for fine-tuning. Existing object detection models such as YOLO were not designed to work on drawing collections, and the Royère use-case requires fine-grained accuracy in order to perform object search on significant forms. Creating the ground truth for fine-tuning is a time-consuming task that can be cumbersome. In order to facilitate this step, we’re investigating different strategies to automate ground truth generation, including perspective distortion, multimodal AI analysis and the use of the Segment Anything Model (SAM). We also plan to extend the project with new collections, including a massive collection of more than 400,000 architecture photographs by Paul Henrot.

In this presentation, we will share the results of our investigations and the quality assessment realized with the different approaches we have tested at the MAD Paris. We will also share the perspectives of the project in terms of the impact of AI on the work of museum curators, including annotation storage, validation and reuse. Moving from experimentation to integration of AI in cultural institutions is definitely a major issue, and one we hope to help tackle by investigating the processes related to AI and computer vision in relation to usage perspectives and curators’ tasks. Ultimately, the project will investigate a profound transformation of the museum’s curation process. Instead of starting with collection description before digitization and enrichment with AI, we would like to design and experiment with a pipeline starting with digitization, then automated annotation with AI, and finally manual curation of the extracted metadata. This complete inversion of curation steps is potentially a major change brought by computer vision in the museum, a phenomenon we are eager to observe and share with the community. It also implies the development of specific skills and AI literacy among the museum staff, and fuels the reflection on integrating AI in data engineering curricula at the Ecole des chartes.
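To make the fine-tuning step concrete, here is a minimal sketch with the ultralytics YOLO package. The dataset config "royere.yaml", the checkpoint, and the class names are hypothetical stand-ins for the MAD's annotated ground truth.

```python
# Minimal sketch of fine-tuning and running a YOLO detector on drawings.
# "royere.yaml" and the egg_chair class are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained checkpoint as a starting point

# fine-tune on drawing-specific ground truth (images + bounding-box labels)
model.train(data="royere.yaml", epochs=50, imgsz=640)

# detect design objects (e.g. an "egg_chair" class) in a new drawing
results = model("royere_drawing_0001.jpg")
for box in results[0].boxes:
    cls = results[0].names[int(box.cls)]
    print(cls, float(box.conf))
```

The abstract's core difficulty sits upstream of this snippet: producing enough labelled boxes on drawings (rather than photographs) for `model.train` to be worthwhile, which is what the SAM and multimodal strategies aim to automate.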
Authors: Emmanuelle Bermès (ENC), presenter, Marion Charpier (MAD), Jean-Philippe Moreux (BnF), Natacha Grim (ENC)
Presentation: 15 mins, Emmanuelle Bermès, Ecole nationale des chartes / PSL
Francis Crimmins. Evaluation of techniques that improve findability of historic images in a large and diverse corpus using AI vision models and embeddings
The NLA has a large number of images available via its online catalogue. The vast majority of these do not have any detailed textual description, so keyword searches have to rely on matches against the title and author fields. We have implemented a prototype system which uses an AI vision model to generate a detailed textual description of a representative sample of 5,000 images from the catalogue. In a sense this can be considered a “transcription” of each image, in the same way transcriptions can be automatically generated for audio data. This allows us to provide a standard keyword search over these descriptions, giving improved precision and recall when searching for pictures. A spoken version of these descriptions can also be generated using a Text-to-Speech model. This could be used to provide better accessibility for some users, e.g. they could listen to the description of the item. (We have only done this for a small number of descriptions, as a proof of concept.)

We have also generated numeric embeddings (vectors) for each of these descriptions, and used these to provide a "similar images" feature by performing a k-nearest-neighbour search across the vectors. As well as a traditional list display, the similar images can also be displayed in a grid. Clicking on an image automatically reorders the grid, so that the “seed” image sits in the top left-hand position, with the next most similar beside it, and so on, in left-to-right reading order. As you get further away from the seed image the image content starts to diverge, and by clicking on other images it is possible to browse quite diverse "neighbourhoods" within the dataset. This supports a "search and browse" capability for interacting with the collection.

The numeric embeddings can also be clustered, with a representative image (the closest point to the mean embedding) used for each cluster. This implementation generates a static set of clusters, allowing an overview of image groupings in the collection to be displayed without needing any initial text query. No dynamic clustering of search results is currently supported using this approach. If an image is clicked on, a k-nearest-neighbour query is run as described earlier. The initial set of images displayed for that request should also be within the representative image’s cluster, but eventually images from outside the grouping will start to be displayed. The cluster display allows a user to get a sense of what type of images are available in the collection being searched. For example, it can be used by a curator to identify clusters of images they may want to exclude.

Finally, we have taken the images and text descriptions and used OpenAI CLIP to generate image embeddings, and also used the text embedding capability of CLIP to generate text embeddings for both the NLA catalogue metadata for the image and the OpenAI description. The overall number of images processed is 201k (although only the original 5,000 images have the OpenAI text description). We plan to present: (1) A comparison of image descriptions generated by proprietary (e.g. OpenAI gpt-4-vision) and open-source (e.g. LLaVA) models. (2) A comparison of the effectiveness of embeddings generated by open-source (many available) and proprietary (OpenAI) text embeddings from these generated image descriptions against the open-source CLIP native image embedding for: text-based image searching; image similarity searching; automatic image clustering.
(3) A comparison of text/keyword, image embedding and "hybrid" (combination) searches.
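A minimal sketch of the embedding, nearest-neighbour, and cluster-representative machinery described above follows. The embedding model and the toy descriptions are illustrative assumptions, not the NLA's production stack.

```python
# Minimal sketch: embed descriptions, find similar items via k-NN, and pick
# each cluster's representative as the item closest to the cluster mean.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sentence_transformers import SentenceTransformer

descriptions = ["A sepia photograph of a country railway station...",
                "A crowd gathered at a harbour-side wharf...",
                "A studio portrait of a woman in 1920s dress..."]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(descriptions, normalize_embeddings=True)

# "similar images": k-nearest neighbours of a seed description
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vectors)
_, idx = knn.kneighbors(vectors[:1])
print("neighbours of item 0:", idx[0])

# static clusters with a representative image per cluster
kmeans = KMeans(n_clusters=2, n_init="auto").fit(vectors)
for c, centre in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == c)[0]
    rep = members[np.linalg.norm(vectors[members] - centre, axis=1).argmin()]
    print(f"cluster {c}: representative item {rep}")
```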
Presentation: 30 mins, Francis Crimmins, National Library of Australia
Grant Heinrich. The first 140 days: how we’re teaching an American transcription engine to speak Australian
It’s hard to escape the US influence on machine learning, AI and particularly machine transcription. The vast collections of words and speech controlled by tech companies such as Microsoft, Google and Facebook give US companies the edge in developing new, more accurate transcription engines. Today, English-language transcription is relatively accurate at tracking American speech and American accents, but confront it with the challenge of Australian vernacular speech and it’s like the engine suddenly has a roo loose in its top paddock – what it translates doesn’t always make sense. That’s why this year, the Australian National Film and Sound Archive (NFSA) set itself a challenge. Having launched a new transcription pipeline in May 2024 with a clean copy of an American transcription engine - OpenAI’s Whisper - we set out to see if we could turn an American transcription engine Aussie in the 140 days before the Fantastic Futures Conference. In this session, we’ll discuss issues such as: how to influence huge data models with relatively small collections of training data; a novel approach to understanding accuracy; our first steps into crowdsourcing corrections to transcripts via our new Bowerbird UX; managing security around sensitive material that can't be transcribed; and how we're managing entities across the NFSA collection.
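One lightweight way to nudge Whisper toward local vocabulary at inference time is its initial_prompt parameter, sketched below. This is an assumption offered for illustration, not necessarily the NFSA's method, which centres on fine-tuning with local training data.

```python
# Minimal sketch: bias Whisper's decoder toward Australian spellings and
# vernacular by seeding it with an initial prompt. File name and prompt
# contents are illustrative assumptions.
import whisper

model = whisper.load_model("large-v2")

result = model.transcribe(
    "nfsa_interview.wav",
    language="en",
    # seed the decoder with names and vernacular it should prefer
    initial_prompt="Wagga Wagga, Woolloomooloo, arvo, ute, fair dinkum, "
                   "Vegemite, Holden, outback",
)
print(result["text"])
```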
Presentation: 30 mins, Grant Heinrich, National Film and Sound Archive of Australia
Honiana Love. Join us on a journey to the future
Ngā Taonga Sound & Vision is Aotearoa New Zealand’s audiovisual archive with a collection spanning more than 100 years of rich history. We hold a unique position as the only dedicated audiovisual archive capturing the history of New Zealand, told in sound and moving image. There are over 800,000 items in our care, dating back to 1895, and include film and television, sound and radio recordings. Ngā Taonga is a small organisation, of less than 120 kaimahi, operating as an independent charitable trust, governed by a Board of Trustees and funded predominantly by Manatū Taonga, the Ministry for Culture and Heritage; the Lottery Grants Board; and Te Māngai Pāho. Over the last five years, the organisation has transformed from having an at-risk and largely inaccessible collection stored on physical formats to an archive that now has a significant proportion of its collection digitally preserved – with strong pathways for sharing collection material. We are, in essence, a digital archive that supports analogue preservation. Our rapid change was only possible because of our small size and our willingness to innovate. While we have limited financial resources, we can be agile and develop different ways of responding to the changing needs of the sector, supported by our skilled kaimahi. This presentation will offer an overview of the innovations we’ve adopted, to date, including our fundamental commitment to mātauranga Māori (a concept that essentially translates to ‘traditional Māori knowledge’). We care for the largest body of historical recordings of te reo Māori (‘the Māori language’) and mātauranga Māori in the world; taonga (‘treasured possessions’) which are unique to Aotearoa New Zealand. We are committed to being a kaupapa-centred organisation, a concept where we recognise and deeply embrace that, in our work, there is more than one world view. We have created a relationship framework for working with kaitiaki (those people who have a connection to items in our collection as caretakers or guardians), recognising the importance of their role in approving the use of items they have a connection with. We have mātauranga Māori specialist roles across the organisation to ensure that we have that we have a level of expertise to be able to engage with taonga Māori in a genuine Māori way when required. We developed an unprecedented programme (‘Rokirokitia’ – ‘to preserve or care for’) of community-based magnetic media digitisation which saw us working with iwi/Māori all over Aotearoa New Zealand. We supplied portable purpose-built digitisation kits and training so that communities could rescue near-obsolete media and store it locally in digital formats. We are running a digital preservation project (called Utaina) for at-risk audiovisual material at a scale not previously undertaken in Aotearoa. In 2023 it was the largest digitisation project in the world and represents a new approach to digital preservation. We have worked with the developer (Te Hiku Media) to test Artificial Intelligence transcription of te reo Māori to support our commitment to create catalogue entries in the language of the taonga we hold. We do not own any of the material in our collections and the nuances of upholding our commitment to respecting the rights of numerous different groups places us in a unique position in the sector - copyright holders (those who hold rights in the content of the material), depositors (those who own the physical item deposited with us), and kaitiaki (those holding cultural ownership rights). 
We have developed a preservation prioritisation framework that considers access to have the same weighting as preservation needs and cultural/national significance. Now we turn our energies to becoming a digital-first organisation and the challenge of integrating technology into our organisation within an environment where technology is evolving exponentially and shows no signs of slowing. We face the challenge so many others are also dealing with. How do we avoid the Betamax trap, investing in technology that has a limited lifespan before being discontinued? How do we manage our limited funds to make the best investments for the future? What does adoption of emerging technology look like for small organisations or not-for-profits? What does a collaborative investment in technology look like – what could we offer partners to engage with us? How do we juggle the investment in staff needed to engage with nascent technology when our funding barely allows us to invest in the technology itself? How can we engage with communities to support their move to digital? What are the barriers to championing an indigenous language in a world where English remains the language of the internet? How do we navigate the opportunities and challenges of our position at the crossroads of the corporate world, the GLAM sector and government? Can we use modern technology to review our historic records and update with modern approaches to access?
Presentation: 30 mins, Honiana Love, Ngā Taonga Sound & Vision
James Smithies. Large Language Models and Transnational Research: Introducing the AI as Infrastructure (AIINFRA) Project
The problem of evaluating generative AI tools in the GLAM and research sectors is complex and pressing. While Large Language Models (LLMs) appear to have significant value for some aspects of archival management and the delivery of advanced search functionality, their deployment as expert chatbots or research assistants is fraught with difficulty. Several commercial services already exist, and have obvious value for generic workflows, but there are no obvious or accepted ways to evaluate their quality as research tools. Deploying them in production in a GLAM or research context raises myriad potential issues, from ‘hallucination’ of people, places, and facts to cost and security.

The AI as Infrastructure (AIINFRA) project is developing resources to resolve that situation, including a test framework and prototype AI research tool, with a focus on transnational collaboration, Slow AI, and Indigenous design principles. Project partners include the Australian National University, King’s College London, The National Library of Australia, the UK National Archives, Australian Parliamentary Library, the UK History of Parliament Project, and Aotearoa New Zealand’s Te Tari Taiwhenua | Department of Internal Affairs. Indigenous consultancy is provided by Taiuru & Associates (Aotearoa New Zealand) and the Scaffolding Cultural Creativity Project (Australia). The presentation, delivered by multiple members of the project team, will provide an overview of the project and: describe its core collaboration and design principles; detail its primary workflows; present the requirements for its central prototype AI research tool; and offer a glimpse into its preliminary research findings.

Several aspects of the project will be of interest to the AI4LAM community. Particular attention has been placed on limiting AIINFRA’s scope, to allow for adequate attention to technical and cultural testing, for example, and to accommodate the complexities attendant on a transnational project of this kind. The project is restricting its activity to the analysis of transnational Australian, Aotearoa New Zealand, and United Kingdom history in 1901 (the year of Australian federation and the death of Queen Victoria), using Hansard content from the respective countries as source input. Biases in the Hansard documents raise cultural and ethical issues, including those related to the White Australia policy, land sales in Aotearoa New Zealand, and attitudes towards colonialism. By using Hansard as source material, the project is also forced to contend with the implications of deploying generative AI over documents that are key to the democratic process.

The restricted scope of the project also reflects a commitment to the development of software test maturity in the digital HASS and GLAM spaces. The assumption is that the design and deployment of AI in HASS and GLAM contexts will rely more heavily on testing than was the case with previous technologies, raising a range of issues that cannot be resolved through the straightforward application of commercial methods. To facilitate this, a holistic testing framework will be developed that can be applied to multiple AI platforms, along with the prototype tool designed to model the ‘perfect’ open-source approach to such technologies.
The project has adopted a holistic approach to technology assessment, design, and development, considering issues such as environmental impact, scholarly transparency and reproducibility, support for Responsible AI, FAIR and CARE data principles, and Indigenous knowledge protocols alongside problems of long-form factuality and cost per prompt, and these aspects will all be reflected in the test framework. The goal is to provide insights tailored to users and institutions in the HASS and GLAM sectors, able to be applied to any research service powered by Large Language Models (LLMs) now and into the future.
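To give a flavour of what one axis of such a test framework might look like, here is a minimal sketch: scoring an LLM-backed research tool against a curated set of factual questions derived from 1901 Hansard. The questions, canonical answers, and the ask() function are hypothetical; AIINFRA's actual framework is far broader (factuality is only one dimension alongside cost, cultural safety, and reproducibility).

```python
# Minimal sketch of a factuality test suite for an LLM research tool.
# Cases and the ask() hook are placeholder assumptions.
from dataclasses import dataclass

@dataclass
class FactualCase:
    question: str
    expected: str  # canonical answer agreed by domain experts

CASES = [
    FactualCase("In what year was the first Commonwealth Parliament opened?",
                "1901"),
    FactualCase("Who was Australia's first Prime Minister?",
                "Edmund Barton"),
]

def ask(question: str) -> str:
    """Placeholder for a call to the LLM research tool under test."""
    raise NotImplementedError

def run_suite() -> float:
    hits = sum(case.expected.lower() in ask(case.question).lower()
               for case in CASES)
    # long-form factuality would need richer scoring than substring match
    return hits / len(CASES)
```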
Authors: James Smithies, Barbara McGillivray, John Moore, Martin Spychal, Karaitiana Taiuru.
Presentation: 15 mins, James Smithies, The Australian National University & Karaitiana Taiuru, Taiuru & Associates
Javier de la Rosa. The Mímir Project: Evaluating the Impact of Copyrighted Materials on Generative Large Language Models for Norwegian Languages
The Mímir Project, an initiative by the Norwegian government, aims to assess the significance and influence of copyrighted materials in the development and performance of generative large language models (LLMs) specifically tailored for the Norwegian languages. This collaborative effort involves three leading institutions from different regions of the country, each contributing unique expertise in language technology, corpora curation, model training, copyright law, and computational linguistics. The ultimate goal of the project is to gather empirical evidence that will guide the formulation of a compensation scheme for authors whose works are utilized by these advanced AI systems, ensuring that intellectual property rights are respected and adequately compensated.

The advent of LLMs has revolutionized natural language processing (NLP), enabling machines to generate, understand, and interact using human languages with unprecedented accuracy and fluency. However, these models require vast amounts of textual data to train, often sourced from a wide array of digital repositories that include copyrighted materials. This raises critical legal and ethical questions regarding the use of such content without explicit permission from authors, as well as the economic implications for the creative industry. The Mímir Project addresses these concerns by systematically evaluating the role of copyrighted texts in enhancing the capabilities of LLMs for Norwegian.

The project's methodology involves a comprehensive analysis that spans several stages. Initially, a diverse corpus of Norwegian language data is assembled, incorporating both copyrighted and non-copyrighted materials, plus materials commonly found on the Internet. This corpus serves as the foundation for training various LLMs, each with different configurations and access levels to copyrighted content. By comparing the performance of these models across a range of linguistic tasks, such as text generation, translation, and sentiment analysis, the project seeks to quantify the specific contributions of copyrighted materials to overall model efficacy.

To ensure robustness and reliability, the evaluation framework focuses on both foundational model generation and linguistically-inspired metrics. Quantitative measures include traditional NLP benchmarks like perplexity, BLEU scores, and ROUGE scores, which provide objective assessments of model accuracy and fluency. Linguistic analysis, on the other hand, involves assessing the coherence, creativity, and contextual relevance of the generated outputs. This dual approach allows for a nuanced understanding of how copyrighted materials impact the performance and utility of LLMs.

One of the central hypotheses of the Mímir Project is that copyrighted materials, due to their high quality and curated nature, significantly enhance the linguistic richness and diversity captured by LLMs. By systematically isolating the effects of these materials, the project aims to identify specific areas where their inclusion leads to substantial improvements. More specifically, many ablations are conducted on different aspects of the materials, such as their source (books vs newspapers), factuality (fiction vs nonfiction), or source language (translated vs original works in Norwegian). This insight is crucial for developing a balanced and fair compensation framework that recognizes the value contributed by authors while fostering innovation in AI and NLP.
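For readers unfamiliar with the benchmarks named above, here is a minimal sketch using the Hugging Face evaluate library. The model name and texts are placeholders; the project trains and compares its own Norwegian models with and without copyrighted data.

```python
# Minimal sketch of perplexity and BLEU computation with Hugging Face's
# `evaluate` library. "gpt2" and the sample texts are placeholder assumptions.
import evaluate

# perplexity of a (placeholder) model on held-out Norwegian text
perplexity = evaluate.load("perplexity", module_type="metric")
ppl = perplexity.compute(model_id="gpt2",
                         predictions=["Det var en gang en konge."])
print("mean perplexity:", ppl["mean_perplexity"])

# BLEU for a translation-style task
bleu = evaluate.load("bleu")
score = bleu.compute(predictions=["the cat sat on the mat"],
                     references=[["the cat sat on the mat"]])
print("BLEU:", score["bleu"])
```

Lower perplexity on held-out text is the signal one would expect if high-quality copyrighted material genuinely enriches the training mix, which is what the project's ablations are designed to isolate.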
Moreover, the Mímir Project explores the broader socio-economic implications of its findings. By engaging with stakeholders from the publishing industry, legal experts, and policymakers, the project facilitates a multi-faceted dialogue on how best to harmonize the interests of content creators and technology developers. This dialogue is essential for crafting policies that not only safeguard intellectual property rights but also encourage the ethical and sustainable development of AI technologies.

The outcomes of the Mímir Project are expected to have significant ramifications for both national and international contexts. For Norway, the project provides a critical evidence base that can inform national copyright policies and support the creation of a compensation scheme tailored to the digital age. Internationally, the findings contribute to the ongoing discourse on AI ethics and copyright law, offering a model that other countries can adapt and implement.

In conclusion, the Mímir Project represents a pioneering effort to empirically assess the impact of copyrighted materials on generative LLMs for Norwegian languages. By bridging the gap between technology and intellectual property rights, the project aims to foster a more equitable and innovative future for AI development. The insights gained from this initiative will not only enhance our understanding of LLMs but also pave the way for more sustainable and fair practices in the use of copyrighted content in AI training.
Presentation: 30 mins, Javier de la Rosa, National Library of Norway
Jeff Williams. AI Ambassadors: how to demystify AI and encourage experimentation
AI is here, and the media often emphasizes how it is boosting efficiency across the board. However, beyond using ChatGPT for writing and generative AI tools for creating images, many struggle to integrate AI effectively into their daily work. Why is adopting it so challenging? This presentation discusses ACMI’s AI Ambassadors program, established to enhance staff understanding of AI through practical experimentation, removing one of the barriers to adoption: understanding how AI works. Inspired by The Web3 Innovation Lab for Arts and Culture by WAC Lab (https://wac-lab.xyz/) and the learning-by-doing approach discussed in 'Not another lab – embedding innovation in organisations,' a panel at the Future of Arts, Culture & Technology Symposium 2024, ACMI’s AI Ambassadors program emphasizes learning through hands-on projects. Led by ACMI's Technology Team and primarily involving non-technical staff, the program, launched in January 2024, has focused on demystifying machine learning, exploring generative AI fundamentals, debating ethics and usage, and providing assignments to expand on these topics through action. Building on this knowledge, AI Ambassadors are testing in-house AI tools like a RAG-based internal chatbot and a fine-tuned chatbot with collection knowledge, using these experimental products to evaluate their effectiveness and capabilities. This presentation by Jeff Williams, ACMI’s Head of Technology and architect of the AI Ambassadors program, will outline the program, topics discussed, hands-on experiments attempted, lessons learned, measures of effectiveness, and next steps of this effort to deepen understanding of AI and encourage broader discussion of how machine learning can be responsibly utilized in ACMI's program, product, and operations.

ACMI's AI Ambassadors program is central to a three-tier strategy designed to enhance staff-wide AI education. The program includes: (1) AI Ambassador: Conducted fortnightly, this intensive involves regular discussions and experimental learning aimed at a broad range of staff. It serves to deepen ACMI’s overall AI comprehension and to identify new opportunities. (2) AI Lunch and Learn: Held bimonthly, these casual sessions are open to all staff and board members. They broaden the scope of discussions from the AI Ambassador sessions, enabling ambassadors to share their AI insights and further the organization’s overall AI knowledge. (3) Machine Learning Knowledge Base: This is an organization-wide, openly accessible documentation of ongoing machine learning projects. It allows all staff to explore the work being carried out by ACMI’s Creative Technologists and Technology teams.

The AI Ambassadors program complements the work underway at ACMI to develop machine learning tools to enhance access to its collections. These include: video description collection search; audio transcription collection search; an in-gallery embedding collection explorer; text-to-speech for chatbots and museum labels; and an embeddings-based collection object recommender. Delegates will leave this presentation with practical insights on how to demystify AI and foster an environment of innovative experimentation, enabling them to integrate AI more effectively into their own organizations and drive meaningful change.
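For readers new to the pattern, a minimal sketch of a RAG-style chatbot like the in-house tool mentioned above follows: retrieve the most relevant collection records by embedding similarity, then stuff them into the prompt. The models, records, and prompt are placeholder assumptions, not ACMI's implementation.

```python
# Minimal sketch of retrieval-augmented generation (RAG) over collection
# records. Record texts, models, and prompt wording are illustrative.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

records = ["Object 12345: 16mm film, 'Melbourne street scenes', 1956.",
           "Object 67890: Arcade cabinet, 'Daytona USA', Sega, 1994."]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = embedder.encode(records, normalize_embeddings=True)

def answer(question: str, k: int = 1) -> str:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(index @ q)[::-1][:k]        # cosine ranking
    context = "\n".join(records[i] for i in top)
    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system",
                   "content": f"Answer using only this context:\n{context}"},
                  {"role": "user", "content": question}])
    return reply.choices[0].message.content
```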
Presentation: 15 mins, Jeff Williams, ACMI
Jon Dunn. Whisper applied to digitized historical audiovisual materials
Beginning in 2015, Indiana University (IU) undertook a massive project, known as the IU Media Digitization and Preservation Initiative (MDPI, https://mdpi.iu.edu), to digitize rare and unique audio, video, and film assets from across IU’s collections. The project resulted in over 350,000 digitized items to be preserved, including collections from the IU Libraries Moving Image Archive, Cook Music Library, Black Film Center & Archive, Archives of Traditional Music, and many other campus and university units. The MDPI digitized corpus contains significant diversity in terms of physical media formats, quality of audio and video in the original recording, content genre, language, and amount of existing metadata.

From 2017 to 2023, with support from the Mellon Foundation, IU Libraries collaborated with AVP and other partners to develop the Audiovisual Metadata Platform (AMP, https://go.iu.edu), an experimental open source software system for building and executing workflows that combine AI/ML and human steps to help create metadata for AV resources, which was demonstrated at Fantastic Futures 2023. We are beginning to collaborate with collection managers at IU to use AMP with portions of the MDPI corpus. In the meantime, accessibility for users with disabilities has become increasingly important to libraries and universities in order to broaden access and ensure compliance with government regulations and institutional policy. With accessibility needs in mind, IU Libraries is exploring the feasibility of using the open source Whisper automated speech recognition tool from OpenAI (https://openai.com/research/whisper), in conjunction with IU’s HPE Cray EX supercomputer, known as Big Red 200 (https://kb.iu.edu/d/brcc), to generate captions or transcripts for this enormous collection of digitized materials.

Testing is being performed using small, medium, and large (v2 and v3) Whisper ML models against a selection of content that samples this diversity of physical media, audio quality, and content format. Outputs produced by Whisper will be compared to ground truth for each media item, created through use of a professional human transcription service. Content to be tested includes files digitized from wax cylinders, Betacam videotapes, open reel audio tapes, audiocassettes, motion picture film, VHS, and more. Content will include both audio-only and video materials ranging from field recordings and home video to lecture capture, oral histories, educational films, student events, and broadcast radio and TV programs. Because of its distinct nature, we intend to set aside recordings that primarily feature music in order to focus on spoken word materials.

In this presentation, we will demonstrate trends in scores such as word error rate and word information preserved across Whisper models and types of source content, highlight interesting examples of Whisper successes and failures, share our experiences with processing files using a shared research-oriented high-performance computing environment, and discuss our future plans for using automated speech recognition output to support accessibility and discovery. We will also discuss our experiences with Whisper in comparison to commercially-available automated speech recognition models. This presentation will build on the growing body of work testing how well and in what configurations Whisper can be used effectively to create captions and transcripts for audiovisual materials, particularly for digitized historical materials.
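The evaluation loop described above can be sketched in a few lines with openai-whisper and jiwer. File names and the model list are placeholders for IU's actual test matrix.

```python
# Minimal sketch: compare Whisper output against human ground truth using
# word error rate (WER). Paths and model sizes are illustrative assumptions.
import whisper
from jiwer import wer

ground_truth = open("item_1234_human_transcript.txt").read()

for size in ("small", "medium", "large-v2", "large-v3"):
    model = whisper.load_model(size)
    hypothesis = model.transcribe("item_1234.wav")["text"]
    print(f"{size}: WER = {wer(ground_truth, hypothesis):.3f}")
```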
Authors: Emily Lynema, Jon W. Dunn, Brian Wheeler, and Jon Cameron, Indiana University Libraries
Presentation: 15 mins, Jon Dunn, Indiana University Libraries
Joshua Ng. Archives in the cloud: Exploring machine learning to transform Archives New Zealand’s digital services for agencies
From February to July 2022, Archives New Zealand spearheaded an innovative proof of concept (PoC) project to explore the potential of machine learning (ML) and hyperscale cloud computing in auto-classifying digital public records and surfacing information of interest to Māori. This initiative was funded by the Digital Government Partnership Innovation Fund and aimed to address critical challenges in managing the ever-growing volume of digital information produced by public offices, such as government agencies and Crown entities. Some of these digital data hold long-term significance for Aotearoa New Zealand and will eventually be preserved as part of the nation’s archival heritage under the Public Records Act 2005.

Traditionally, systems designed for handling paper records have struggled to keep pace with the digital era's data tsunami. Manual appraisal, sorting, and disposal of digital records are no longer feasible due to the sheer volume. To prevent inevitable gaps in the government’s archival memory, Archives New Zealand sought to evaluate whether modern technological tools could streamline these processes. The PoC specifically aimed to: (1) assess the feasibility of using ML tools to streamline the appraisal process by auto-classifying records according to established disposal authorities; and (2) determine whether these tools could identify and highlight material of significance to Māori communities.

Collaborating with key stakeholders—including the Ministry of Justice, Ministry for Primary Industries, Microsoft, AWS, and information management experts—Archives New Zealand adopted an agile and iterative approach. The project addressed critical issues such as data residency, security, and defining key outcomes for this experimental phase. During the limited project timeframe, both Microsoft and AWS developed solutions using their advanced ML tools that demonstrated promising capabilities in auto-classifying records and detecting Māori-specific subject headings. Although the initial results were promising, further training and refinement of these models are necessary to enhance their accuracy and relevance. Future efforts will involve ongoing development and collaboration with Māori to ensure culturally appropriate and precise identification of records.

The PoC underscored the vast potential of ML and cloud computing technologies to revolutionize digital record management. Moving forward, Archives New Zealand aims to build on this success by advancing auto-classification techniques under general disposal authorities. This approach is expected to significantly enhance the efficiency of information managers and agencies across the government, enabling them to handle digital records more effectively.

For the continued success and scalability of such initiatives, several key considerations need to be addressed: (1) securing appropriate resources and expertise; (2) ongoing collaboration with Māori to ensure the cultural integrity and relevance of surfaced records; (3) aligning processes with the Algorithm Charter for transparency and accountability; and (4) developing a cohesive all-of-government ontology to standardize disposal authorities and support large-scale implementation. In conclusion, this PoC has laid a strong foundation for integrating ML and cloud technologies into archival practices.
By continuing to refine and expand these approaches, Archives New Zealand aims to address the challenges of digital information management, ensuring that the nation’s archival memory is comprehensive, accessible, and reflective of all communities, including Māori.
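To make the auto-classification task concrete, here is a minimal sketch with a generic scikit-learn text classifier. The records and disposal-class labels are hypothetical; the PoC itself used Microsoft and AWS cloud ML services rather than this pipeline.

```python
# Minimal sketch of classifying records against disposal classes with a
# supervised text classifier. Texts and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Minutes of the fisheries quota working group, March 2014.",
         "Routine leave application form, administrative staff.",
         "Briefing to the Minister on Treaty settlement negotiations."]
labels = ["retain-archive", "destroy-after-7-years", "retain-archive"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Memo on customary fishing rights consultation hui."]))
```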
Presentation: 15 mins, Joshua Ng, Archives New Zealand
Kara Kennedy. AI Literacy for Librarians in Low-Resource, Multicultural Communities
Emerging generative AI tools such as ChatGPT offer some of the greatest challenges but also some of the greatest benefits to people served by libraries housed within universities and polytechnic institutions. On the one hand, the tools can be used to shortcut learning, and students are at risk of getting in trouble for cheating. The tools can be used by teaching staff to shortcut grading, leading to potential issues with data privacy breaches and biased results. Despite so-called AI detection tools having low accuracy rates, they are still being used and are more likely to flag ESOL students. This has an outsized effect on multicultural and migrant communities, who already face less access to devices and the internet due to the digital divide.

On the other hand, the tools can be used to help students overcome socio-economic and language barriers by enabling them to have on-demand access to virtual tutors and style their communication more formally. The tools can be used by teaching staff to create more engaging lessons, more accurate rubrics, and more customisable resources. To be able to take advantage of these powerful new tools safely and responsibly, though, people need to develop AI literacy, the newest part of digital and information literacy.

Librarians remain trusted sources of information literacy for staff, students, and members of the public. They assist with finding good sources, avoiding copyright violations, and referencing appropriately. Librarians are also often on the front line of assisting people with technology because they have a physical presence that is more accessible than a phone hotline or online helpdesk. But helping others with AI technology requires more knowledge and skills than showing someone how to access an electronic journal or use the printer. AI tools have a wide range of capabilities, and the effectiveness of their outputs is still heavily influenced by the prompts used. There are significant issues around biases, hallucinations, copyright, and ethical considerations. AI does not rely on databases but on predictive, mathematical models. It produces original content that has no author and no copyright. It poses a major challenge to fundamental questions about information sources and data that librarians need to be prepared for in their engagement with the public.

With an aging librarian workforce, a continuing lack of representation from marginalised communities, and limited professional development budgets, it becomes challenging to know what staff need to learn to best serve their increasingly diverse communities. The institutional libraries servicing Auckland have a customer base that is incredibly multicultural and multilingual, with many students and staff identifying as Māori, Pasifika, Chinese, or Indian, and having strong connections to other countries and non-English languages. They have a wide variety of socioeconomic backgrounds and technological proficiencies, and the pandemic lockdowns revealed how many lack access to devices and internet at home. For some, then, the library is their main connection to the digital world and will also become their gateway to AI tools. It is a trusted source of knowledge they will turn to in order to learn about new technology and how to use it responsibly and safely.
It is clear that the role of librarian now needs to include AI literacy to enable staff to keep up with the shifts in how information is retrieved, relied upon, and sourced, and the threat that AI poses to people’s relationship with textual, audio, and visual information. This presentation will offer a vision of what AI literacy looks like for librarians and their professional colleagues working in a low-resource educational environment servicing a high number of low-socioeconomic, multicultural customers.
Lightning talk: 5 mins, Kara Kennedy, Manukau Institute of Technology
Karen Thompson. Applying AI to accelerate data transcription from digital objects
Museums and herbaria provide data that document the diversity, distribution, and evolutionary history of life on Earth. These collections — essential for cutting-edge research into biodiversity, conservation, and biogeography, and increasingly a resource for understanding the character of ‘place’ — face many of the same challenges that archives contend with. In particular, they share the need for digitisation of their collections.
The creation of digitally accessible surrogates of collection objects enables institutions to optimise access to and value of their holdings, as expressed by the FAIR principles (ensuring data are Findable, Accessible, Interoperable, and Reusable; Wilkinson et al. 2016). The increased potential for discovery of collection resources is particularly valuable for small collections, which may receive fewer in-person research visits than larger institutions. These resources also lower the barriers — financial, institutional, academic, sociological (Hedrick et al. 2020) — to participation in research, community use of the materials, interdisciplinary involvement, and the inclusion of a diversity of voices.
In building the metadata that exists alongside these digital collections, essential to their findability, the transcription of text-based primary data on the specimen labels is a significant bottleneck. Advanced computer vision and text recognition methods using AI hold the potential to substantively reduce this hurdle.
A transdisciplinary team at the University of Melbourne has developed a tool that reduces the manual labour involved in text-based label transcription. Named ‘Hespi’ (https://rbturnbull.github.io/hespi/), it effectively identifies the location of the primary data label and of the data fields within that label on a digital image of a mounted specimen page, then applies text recognition to provide a text transcription. The team have also determined how other collections can most effectively fine-tune the AI models to specifically suit their collection needs (Thompson et al. 2023).
The potential application of this tool is not limited to herbaria: rather, it holds potential to be applied to any collection objects with labels or text-based data for which digital images are available.
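The two-stage structure of such a pipeline (detect the label region, then recognise its text) can be sketched as follows. The detector weights and Tesseract OCR here are generic stand-ins, not Hespi's own fine-tuned detection and recognition models.

```python
# Minimal sketch of a Hespi-style pipeline: detect label regions on a
# specimen sheet, crop them, and run text recognition on each crop.
# "label_detector.pt" is a hypothetical fine-tuned checkpoint.
from PIL import Image
from ultralytics import YOLO
import pytesseract

detector = YOLO("label_detector.pt")

sheet = Image.open("specimen_sheet.jpg")
for box in detector(sheet)[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    label_crop = sheet.crop((x1, y1, x2, y2))
    print(pytesseract.image_to_string(label_crop))
```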
Hedrick BP, Heberling JM, Meineke EK, Turner KG, Grassa CJ, Park DS, Kennedy J, Clarke JA, Cook JA, Blackburn DC, Edwards SV, Davis CC (2020) Digitization and the Future of Natural History Collections. BioScience 70 (3): 243‑251. https://doi.org/10.1093/biosci/biz163.
Thompson, K. M., Turnbull, R., Fitzgerald, E., & Birch, J. L. (2023). Identification of herbarium specimen sheet components from high-resolution images using deep learning. Ecology and Evolution, 13, e10395. https://doi.org/10.1002/ece3.10395.
Wilkinson M, Dumontier M, Aalbersberg IJ, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018. https://doi.org/10.1038/sdata.2016.18
Authors: Robert Turnbull, Emily Fitzgerald, Karen Thompson, Jo Birch
Lightning talk: 5 mins, Karen Thompson, University of Melbourne
Kate Follington. C19th Handwriting and Model Making with Transkribus
Government archives from the 19th century have traditionally been difficult to transcribe due to the variety of cursive handwriting found within the records. The scale of records held within state collections adds further complexity, because a digital team must consider which records, paired with an AI tool, will best improve public access to popular records. Public Record Office Victoria has embarked on a journey of experimentation using the European tool Transkribus to build different machine learning models, alongside public models, to work out which transcription approach will yield the best return on investment in time and outcome. The questions we are asking include: one-hand models or multi-hand models? Can we have ten hands in one model and still see clean transcription? Should we start with indexes rather than correspondence records? How long did it take to build the model to get decent results, and what error rate can we tolerate? This presentation will share the results of this experimentation process using the Ned Kelly series along with other popular record series.
Lightning talk: 5 mins, Kate Follington, Public Record Office Victoria, State Archives Victoria
Katrina Grant. From Small Data to Big Knowledge Networks: Preparing Research Practices and Publishing Infrastructure for the AI-era
Research in the humanities has always had a data problem or, more accurately, a data-sharing problem. While collection data has opened up and is now linking up, research data often remains hidden behind prose, with outputs still heavily skewed towards monographs, journals and other traditional forms of publishing. These final outcomes are also often behind paywalls and are challenging to access for those working in the broader GLAM sector, let alone various publics. This has created a disconnect between the rich data being collected and created by researchers (photos, video, notes, observations, maps, models etc.) and what can actually be accessed, reused, aggregated and linked.

This problem is set to become more acute as tools and methods that embrace big data, such as AI and machine learning, begin to inform curatorial practice and the broader cultural sector. As researchers based within an institute dedicated to producing, publishing and sharing research on art history and visual culture, we have realised that much of the rich, verifiable material and nuanced information on collections and visual culture is not ready to be used in data sets for things like an LLM or a text-to-image model. As the sector prepares to embrace the potential of specialised AI and ML tools and methods, there is a pressing need to connect the complex ideas and diverse data being produced by researchers in disciplines such as art history, anthropology and media studies with our cultural collections. Further, as the sector begins to critique models being produced by large commercial organisations, the many small, slowly accrued datasets produced by researchers, and transparent frameworks around them, might begin to provide alternative training sets to those scraped from internet feeds.

One way to address this challenge and opportunity is to change the way we practice and publish academic research on collections and cultural heritage. As part of a new multi-year project, the Visual Understanding Initiative, a group of publishers, art historians and digital humanists at the Power Institute for Art and Visual Culture at the University of Sydney are developing new approaches to research that allow scholars, artists, community knowledge holders, and museum partners to produce rich data sets, annotated objects and data visualisations, and to communicate them beyond the codex as part of an open publishing infrastructure based on interconnected nodes. Although this project is at its inception, we hope it will allow humanities researchers and the cultural sector to develop practices for data collection and interoperability that will make it easier to connect with the future development of AI tools and methods. We also hope that, in the tradition of critical humanities thinking, it will have the capacity to encourage us collectively to explore the possibilities and critique the limitations of AI in the ways our sector does best: through transparent, critical engagement with a vast range of cultural materials.
This paper will highlight key projects from the Visual Understanding Initiative’s three main areas of focus: Critical Looking – how digital infrastructure can encourage slow looking and support small data set development with the Image Annotation Workbench; Indigenous Ways of Seeing – how the Digital Keeping Places project is working on data sovereignty and cultural safety; Visualising Ideas – how Power Publications’ Drupal-based platform will encourage more fluid connections between generative research practices and the publishing of research, ideas and data.
Presentation: 15 mins, Katrina Grant & Marni Williams, Power Institute, The University of Sydney
Laura McGuiness. Just One More Access Point: LLM Assistance with Authority Control
For decades, librarians and archivists have created authority records to disambiguate author names. Frequently created using Machine Readable Cataloging (MARC21), authority records are highly structured data that authorize a preferred name format and then cross-reference variant or related names to the correct author. These authority records ensure standardized access points for users, creating a linking framework for related or identical names in a database. Los Alamos National Laboratory (LANL) is a United States national laboratory responsible for ensuring a safe, secure, and effective nuclear stockpile. LANL librarians have been creating and maintaining authority records manually within an Integrated Library System (ILS). While many records are linked to LCNAMES, the Library of Congress name authority vocabulary, most name authorities were manually created. Like many other manual processes, however, this did not scale well and was time-consuming. LANL’s National Security Research Center (NSRC) is the Lab’s classified library that houses tens of millions of materials related to the development, testing, and production of nuclear weapons. With the NSRC’s recent digitization of legacy archival materials dating back to the Manhattan Project, additional authority control has become a necessity. Despite the work of past librarians and the modern use of persistent identifiers such as ORCID iDs, older publications (including born-analog and newly digitized publications) lack any mechanism of name disambiguation. Name ambiguity decreases the number of standardized access points for a resource, reducing the reliability of a database's search results. Because a researcher’s work at LANL is often critical to ensuring national security, it is imperative that all possible access points are available for use. Accurate authority records are especially important for traditionally underrepresented groups, who have been disproportionately affected by name ambiguity in databases. This includes individuals who are more likely to change their name over time, names that have been transliterated from a foreign language into English or remain in non-Latin script, and double-barreled surnames (e.g. Maria Goeppert Mayer). Name disambiguation contributes to reliable bibliometrics and helps ensure scientific discoveries are correctly attributed. In a field where gender and racial minorities often experience lower authorship attributions, authority control gives recognition to scientists whose contributions have been overlooked in scientific databases. While the benefits of name authority records are many, there are multiple ethical concerns to consider in this project, surrounding both AI use and authority control. For the former, we are mindful of ensuring transparency of the training set, outputs, and decision-making processes. Our team will be documenting the criteria chosen for inclusion or exclusion of certain data sets to provide a clearer understanding of the LLM’s learning processes. This includes ethical concerns with the training data, such as the incorporation of Resource Description & Access (RDA) Rule 9.7 or the MARC21 375 field, which require assigning gender to authority records. Criticized by scholars as responsible for “outing” gender identities through authority records, this is one piece of training data that we may consider excluding. While LANL librarians did not abide by these cataloging rules, the possible use of external authority records may introduce this concept.
To mitigate these concerns, our project team will be establishing internal accountability measures that will assess potential impact and biases. Currently, LANL has several large institutional databases lacking name authority records. This lack of authority control contributes to information noise, producing overwhelming and ambiguous search results. Recently, we have begun investigating Machine Learning (ML) approaches to support the creation of valid name authority records, utilizing Large Language Models (LLMs) to create accurate LANL name authority records. We will begin with the 17,033 records currently in LANL’s ILS, Alma, to train Llama2 on the proper format of a record and what components it must contain. We will then use an existing local database to generate the authority records. Once the LLM is able to properly produce a record, we will feed it the contents of the local database and evaluate the generated authority records. Author disambiguation is inherent in the process of authority control. Names are generally not unique, frequently change over a person’s lifetime, and are prone to multiple permutations in spelling and format. When using an LLM, there are key pieces of information in a database that would aid in matching, including the author’s publication titles, dates active, and recurring co-authors. A combination of string matching via metadata filtering and vector or graph searching will aid in disambiguation by clustering linking features such as semantically related phrases or other document elements. The project team is also considering templatizing and populating a prompt that will ensure consistency in the generated response’s structure and semantics. The long-term goal of this project is to implement automatic authority control within LANL databases. As additional legacy materials are digitized and cataloged, the LLM will continuously locate new author names and either create an authority record or map to an existing one. These authority records will then be implemented in a semantic platform that will assist our cataloging and search processes across multiple databases. Ultimately, this will both improve cataloging efficiency and ensure findability of data critical to nuclear deterrence and national security.
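As a rough illustration of the templated prompt and first-pass string matching described above, the sketch below uses only the Python standard library; the field layout, helper names, and sample headings are assumptions for illustration, not LANL's actual pipeline.

```python
import difflib

# Hypothetical prompt template for a MARC21-style name authority record;
# the exact fields and wording are illustrative assumptions.
AUTHORITY_PROMPT = """You are a cataloguing assistant. Using the evidence below,
draft a MARC21 name authority record with a 100 field (authorized heading),
400 fields (variant forms), and 670 fields (sources consulted).

Author name as published: {name}
Publication titles: {titles}
Years active: {years}
Frequent co-authors: {coauthors}
"""

def build_prompt(name, titles, years, coauthors):
    """Populate the template so every generated record has the same shape."""
    return AUTHORITY_PROMPT.format(name=name, titles="; ".join(titles),
                                   years=years, coauthors="; ".join(coauthors))

def candidate_matches(name, authorized_headings, cutoff=0.75):
    """First-pass string matching against existing authorized headings;
    real disambiguation would add vector or graph search over titles
    and co-author networks, as described above."""
    return difflib.get_close_matches(name, authorized_headings, n=5, cutoff=cutoff)

existing = ["Goeppert Mayer, Maria, 1906-1972", "Mayer, Joseph E."]
print(candidate_matches("Goeppert-Mayer, Maria", existing))
```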
Lightning talk: 5 mins, Laura McGuiness, Los Alamos National Laboratory
Lauren Booker. Responsibility for the care and protection of Indigenous Knowledges in AI
This panel will introduce the 2024 University of Glasgow-led project “iREAL: Inclusive Requirements Elicitation for AI in Libraries to support respectful management of Indigenous knowledges”. As a scoping project, iREAL aims to develop a model for responsible Artificial Intelligence (AI) systems development in libraries seeking to include knowledge from Indigenous communities, specifically drawing on experiences of Aboriginal and Torres Strait Islander communities in Australia. The project addresses the issue that libraries and archives, nationally and internationally, hold collections from many Indigenous communities, with varied approaches to collection access, engagement, control and governance. These different approaches need to be understood and addressed by collecting institutions, researchers, AI developers and GLAM professionals so that they can develop and implement appropriate requirements for the responsible care and protection of Indigenous knowledges and data sources from Indigenous communities. Contributors from the University of Technology Sydney and the Digital Preservation Coalition will discuss methods and approaches being utilised in the project to consider the interventions required to support the protection of Indigenous rights and care of materials and their application in AI. Speakers will share insights on the importance of centering care using Indigenous research methodologies and methods, principles of the Right to Know and the Right of Reply, and assertions of Indigenous rights being expressed in the Indigenous Data Sovereignty movement. The speakers will reflect on the importance of Indigenous people being stewards of data sources held in libraries, archives and museums and discuss the critical need for Indigenous data stewardship or approaches to be embedded in digital preservation frameworks and disseminated both within and external to the GLAM sector globally. Overall, the panel will guide a discussion on ethical considerations needed to address bias in data held in libraries, archives and museums; the need for Indigenous participation in data governance; and the importance of wide sector dialogue in shaping the deployment of Indigenous data within AI systems.
Panel: 45 mins, Lauren Booker, Kirsten Thorpe, Jason De Santolo (UTS) & Robin Wright (DPC); panel facilitator: A/Prof Kirsten Thorpe (UTS)
Lizabeth Johnson. Enhanced stewardship and data sovereignty through the implementation of an ontology-enhanced large language model (LLM)
Many archivists have experienced the conundrum of how to manage collections and make them discoverable to users. It is far from uncommon for an archive to hold collections that have never been properly indexed because of the lack of staff or resources, whether physical or computational. A further issue for some archives is the existence of legacy software systems that were used to catalog collections, but which are no longer supported and cannot communicate with more modern software. Now add in the complicating factor that users rely on the documents in your collection to be discoverable for the purpose of meeting national security needs. This is a nested set of problems that Los Alamos National Laboratory’s (LANL) National Security Research Center (NSRC) is attempting to address by adapting a large language model (LLM) that can be used to make documents discoverable to staff who need them for their research. The NSRC’s archive holds multiple collections dating from 1943 to the present, and the documents, photos, and films held by the archive number in the tens of millions. However, only a fraction of these items is easily accessible to LANL staff, while the rest are in physical collections or are only known to archival and library staff who have access to the antiquated software systems that continue to house the items. This creates a roadblock in terms of the NSRC’s stewardship of its materials because, while it maintains them, it cannot make them easily discoverable to researchers. In some cases, the byzantine nature of the collections can frustrate even the most dedicated librarian. This difficulty regarding stewardship has also led to concerns over data sovereignty at LANL. To date, several divisions have chosen to maintain their own collections, separate from the NSRC, because of the issue of accessibility and management of that material. Essentially, scientists in those divisions want to ensure that they will be able to continue to have access to their data without fearing that it might become hard to find in the labyrinthine NSRC physical and digital collection space. This has created situations where researchers have reached out to the NSRC for documents that exist outside of the NSRC and are therefore unavailable to those researchers unless they have a personal connection to the author(s) of the documents in question. Within a scientific setting, this can have the unfortunate effect of forcing researchers to repeat work that has already been done, particularly in situations where the work was unsuccessful and the researchers would have benefitted from that knowledge in order to modify their own approach or simply not pursue that approach in the first place. While some progress has been made in terms of the NSRC’s outreach to these other divisions in an effort to encourage them to make their data more accessible, the problem of discoverability remains. In order to address these issues with stewardship and data sovereignty, members of the NSRC’s staff, along with staff from other divisions at LANL, are working on a project to adapt an LLM to make the NSRC’s collections more accessible. We plan to train the LLM on our collections and enhance its accuracy through the use of an ontology and retrieval augmented generation (RAG). Ontologies are an important tool used with LLMs to create a concrete vocabulary and clear relationships between entities, and because of the specialized nature of the collections in the NSRC, our ontology will be developed in-house. 
When integrated into a RAG approach, the ontology will enhance the accuracy of the LLM and reduce hallucinations, which are a significant concern for any organization attempting to adapt generative AI but even more significant within a national security setting. Once we have such a system in place, it is our hope that this will help to create a cultural shift in institutional practice, such that other divisions that have maintained their own collections because of concerns over data sovereignty and discoverability will add their data to the LLM as well. That, in turn, will make the material accessible to other researchers who need it for their work. Our goal is to have a working prototype of this system in place within three years so that LANL staff can more effectively meet the national security needs of the United States. For much of its history, LANL’s staff has been faced with the task of generating cutting edge scientific and engineering knowledge. However, stewardship of that knowledge and data sovereignty were afterthoughts. Now, the NSRC has the task of undertaking stewardship of decades of knowledge and making it discoverable as well as preserving data sovereignty. My presentation will focus on how this LLM tool will help the NSRC meet this longstanding challenge.
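A minimal sketch of how an ontology can ground retrieval before a prompt reaches the model is shown below; the ontology entries, documents, and keyword retrieval are invented placeholders (a production system would use a vector index and the NSRC's in-house ontology), but the shape of the RAG flow is the same.

```python
# Toy ontology mapping surface terms to canonical concepts; all entries,
# documents, and IDs below are invented placeholders.
ONTOLOGY = {
    "hydrodynamic test": "hydrotest",
    "hydro test": "hydrotest",
    "shot": "test_event",
}

DOCUMENTS = {
    "doc-001": "Summary of hydro test series, 1958 ...",
    "doc-002": "Quarterly report on detonation physics ...",
}

def normalise(text: str) -> str:
    """Rewrite surface terms to canonical ontology concepts."""
    for term, concept in ONTOLOGY.items():
        text = text.replace(term, concept)
    return text

def retrieve(query: str, k: int = 3):
    """Toy keyword retrieval over ontology-normalised text."""
    q_tokens = set(normalise(query.lower()).split())
    scored = sorted(DOCUMENTS.items(),
                    key=lambda kv: -len(q_tokens & set(normalise(kv[1].lower()).split())))
    return scored[:k]

def build_grounded_prompt(query: str) -> str:
    """Constrain the model to retrieved sources to reduce hallucination."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return (f"Answer using ONLY the sources below; cite document IDs.\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

print(build_grounded_prompt("reports about hydrodynamic test results"))
```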
Lightning talk: 5 mins, Lizabeth Johnson, Los Alamos National Laboratory
Mia Ridge. Closing the loop: integrating enriched metadata into collections platforms
The AI4LAM community calls and annual Fantastic Futures conferences present a rich picture of the many ways that people are using AI and machine learning to enhance collections records. But what happens to the data generated by these projects? Ingesting enhanced data appropriately and responsibly into collections management and discovery platforms (hereafter 'CMS') is a key factor in operationalising AI to improve collections documentation and discovery - but it's often difficult in practice. We present original research into the extent to which projects can integrate enriched data into collections platforms. We report on the factors in the success or failure of data integration shared by over 60 projects. Our research is inspired by the struggle many crowdsourcing and participatory projects have in integrating data created or enriched by online volunteers into CMS (Ridge, Ferriter, and Blickhan 2023). Early signs indicate that many projects using machine learning / AI tools to enhance or create data will face the same issues. For example, Anna Kasprzik (2023) described issues transferring applied research in automated subject indexing to productive services, and Matthew Krc and Anna Oates Schlaack (2023) shared the complexities of building and sustaining an automated workflow within a library. Enriched metadata is 'data about a cultural heritage object ... that augment, contextualise or rectify the authoritative data made available by cultural heritage institutions' (Europeana Initiative 2023). Enrichments include identifying and linking entities to vocabularies, creating or correcting transcriptions or metadata, and annotating items with additional information. Ideally, the origin of these enrichments would be visible to internal and external users of CMS (and even more ideally, crowdsourcing volunteers would be credited for their contributions). In early 2023 we designed a survey to understand the barriers and successes for projects incorporating enriched data into CMS, and the types of data, tools and processes used. Our survey was available at https://forms.gle/JgArpbL6VNM6W3Vk9 from March 20 to April 30 (3 responses came in after the official cut-off date but are included in our analysis). We designed the survey to be as inclusive as possible, encouraging past project members and those with long-completed or in-progress projects to respond, and allowing more than one person to respond per project (to allow for the partial nature of information sharing in large or complex projects). After piloting with previous collaborators and in response to early feedback, our survey invitation said 'We're interested in the experiences of anyone who's worked on crowdsourcing or machine learning projects to enrich collections data. … The survey is particularly designed for people working in collecting institutions (libraries, archives, museums, etc) with their own catalogues, but we also welcome responses from projects that create or enrich data through e.g. research or community projects working with data from GLAMs, or 'roundtripping' records to return enhanced data to a catalogue'. As the survey closed not long before this Call for Proposals, we are still analysing the results, but we are able to share some early data below. Our analysis will be complete before the FF24 conference. Overall, we received 63 responses. 20 responses were from the UK, 13 from the USA, 4 from multiple European countries, 3 each from China, Sweden and the Netherlands. 
2 or fewer responses were received from Australia, Belgium, Denmark, Finland, France, Hong Kong, Ireland, Italy, Spain and Switzerland. Mindful of FF24's multicultural communities lens, we devoted much of our outreach effort to reaching potential participants outside the usual Anglo-American and English-language circles. However, we are conscious that we did not receive any responses from Latin America, South Asia or Africa. We hope that a future, funded project would enable collaboration with groups in those areas, with a survey translated and localised for each region to encourage wider participation (and we would be particularly interested in discussing this with conference participants). Approximately half the projects are in languages other than English, but all responses were provided in English. 43% of responses were about crowdsourcing projects; 24% about machine learning / AI projects, and 22% were about projects that combined crowdsourcing and machine learning. The largest share of responses came from libraries (35%), followed by museums and archives (both 9.5%); other projects were based in universities, non-profit organisations and combined services. More large organisations responded than small ones - 33% of responses came from organisations with more than 500 paid employees; 30% with 100 - 499 employees; 18% with 5 - 19 employees and 11% with 1 - 4 employees. Many respondents were able to ingest enriched data to some extent - 20% could ingest both new and updated records; 8% could only ingest new records, another 8% were partially able to ingest records (for a range of reasons) and 6% could only ingest updated records. 22% of respondents were not able to ingest enriched data, and 21% were still planning or hoping to complete ingest. Barriers to ingesting enriched data include lack of technical skills, the restrictions of formats such as MARC, and the inability to ingest 'third party' metadata or transcriptions. Issues reported included gatekeeping and institutional politics, lack of staff time (e.g. for technical processes, quality control and data cleaning), erroneous or incomplete data not meeting required standards, and data replication problems. Responses about factors important for successful ingest often began with the word 'agreement', including collaboration between departments, organisations, and with volunteers, on topics such as conditions for data re-use, specifications and standards, and the distribution of work between teams. Initial analysis suggests that the use of APIs, standard formats, data standards and controlled vocabularies contribute to success by reducing the overhead of creating a pipeline of import/exports across platforms and tools. At least 63% of projects had some manual or automated quality assurance processes for enriched data. 13% had no process and some projects are still deciding on their processes. 87% of respondents provided more information on what 'data quality' meant for their project; analysis is ongoing. 29% of projects using machine learning (17 respondents) reported that corrections to the data help improve the model. The ability of systems and workflows to display information on crowdsourcing- or machine learning-enhanced records to staff or the public was very mixed; analysis is ongoing. When we began this research, we thought that the affordances of the CMS used by GLAMs would be a significant factor in the successful integration of enriched data into these CMS.
However, our results so far indicate that skills, resources, and inter-personal and institutional relationships are also significant.
Bibliography:
Europeana Initiative. 2023. ‘Enrichments Policy for the Common European Data Space for Cultural Heritage’. Europeana Initiative. https://pro.europeana.eu/post/enrichments-policy-for-the-common-european-data-space-for-cultural-heritage.
Kasprzik, Anna. 2023. ‘Automating Subject Indexing at ZBW: Making Research Results Stick in Practice’. LIBER Quarterly: The Journal of the Association of European Research Libraries 33 (1). https://doi.org/10.53377/lq.13579.
Krc, Matthew, and Anna Oates Schlaack. 2023. ‘Pipeline or Pipe Dream: Building a Scaled Automated Metadata Creation and Ingest Workflow Using Web Scraping Tools’. The Code4Lib Journal, no. 58 (December). https://journal.code4lib.org/articles/17932.
Ridge, Mia, Meghan Ferriter, and Samantha Blickhan. 2023. ‘Recommendations, Challenges and Opportunities for the Future of Crowdsourcing in Cultural Heritage: A White Paper’.
Presentation: 15 mins, Mia Ridge, British Library
Miguel Escobar Varela. AI-Assisted Analysis of Malay-Language Periodicals in Singapore
This presentation will offer an overview of the early stages of a collaborative project that brings historians, digital humanists and computer scientists together to explore the historical significance and cultural connections of Malay-language periodicals published in Singapore from the late 19th to mid-20th century. The study will utilize a unique methodology that combines optical character recognition (OCR) and large language models (LLMs) to convert newspapers and magazines written in Jawi (the Arabic script adapted to Malay phonemes) to Rumi (Malay in Roman characters), enabling the processing and analysis of a vast corpus of historical periodicals at an unprecedented scale. The project's innovative approach involves the use of AI techniques for transcription, segmentation, and document-flow identification, allowing for the efficient and accurate processing of the historical materials. By leveraging LLMs fine-tuned on historical Malay-language corpora, the research team aims to develop a range of specialized models for specific tasks, such as narrative modelling, nuanced named entity recognition (NER), and theme identification. These smaller, task-specific models are expected to outperform state-of-the-art commercial solutions in terms of accuracy when dealing with culturally and historically specific content, while also having a reduced carbon footprint due to their lower computational resource requirements. The AI-assisted analysis of the Malay-language periodicals will facilitate comprehensive research on a wide array of topics, including natural disasters, economic history, and the study of cultural performances as documented in these historical sources. By examining these themes through a comparative lens, the project seeks to shed light on the interconnectedness of Singapore's history with the broader Malay-speaking world during the late colonial period. The project's significance lies in its potential to unlock valuable information from historical sources that have previously been inaccessible or difficult to analyse due to the sheer volume of the material and declining levels of familiarity with the Jawi script amongst Malay speakers in Singapore. By harnessing cutting-edge AI technologies, the research team aims to contribute to a deeper understanding of Singapore's rich cultural heritage and its place within the region during a pivotal period in its history. Moreover, the project serves as a demonstration of the potential for AI-assisted analysis in the field of historical research, particularly when dealing with non-Western languages and scripts. The methodology developed through this study could be adapted and applied to other historical contexts and languages, opening up new avenues for scholarly inquiry and cross-cultural understanding. We also seek to contribute to the burgeoning open-source community currently experimenting with best practices for fine-tuning LLMs for important analytical tasks in the humanities. This research project represents a significant step forward in the study of Singapore's history and its cultural connections to the Malay-speaking world. By combining innovative AI technologies with a comparative perspective, the project aims to provide new insights into the historical significance of Malay-language periodicals and their role in shaping Singapore's cultural heritage.
The project's findings are expected to contribute to a more comprehensive and nuanced understanding of Singapore's past, while also showcasing the potential of AI-assisted analysis in unlocking the wealth of information contained within historical sources.
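For readers curious what the fine-tuning step might look like in practice, the sketch below packages Jawi-Rumi pairs as JSONL instruction-tuning examples; the record layout is a common convention assumed here, not the project's actual schema, and the strings are placeholders rather than real corpus text.

```python
import json

# Sketch: package OCR'd Jawi lines and their Rumi transliterations as
# JSONL instruction-tuning examples. The instruction/input/output layout
# is an assumed convention; the strings are placeholders.
pairs = [
    ("<OCR output of a Jawi newspaper line>", "<its Rumi transliteration>"),
    ("<another Jawi line>", "<another Rumi line>"),
]

with open("jawi_rumi_train.jsonl", "w", encoding="utf-8") as f:
    for jawi, rumi in pairs:
        record = {
            "instruction": "Transliterate this Jawi text into Rumi (romanised Malay).",
            "input": jawi,
            "output": rumi,
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```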
Presentation: 15 mins, Miguel Escobar Varela, National University of Singapore
Morgan Strong. Converging AI to access digital content in Art Museums
The Queensland Art Gallery | Gallery of Modern Art (QAGOMA) has embarked on an ambitious project: leveraging sophisticated AI mechanisms to make our digital content easily accessible on personal devices. Our technology, entirely built on open-source and freely available resources, employs computer vision to recognise artworks with high accuracy (>98%) and rapidity (averaging 200ms) across thousands of pieces. This negates the need for markers such as QR codes, label keys, or other intricate technological tools. In this presentation, we'll guide attendees through our holistic journey, from the technological foundation to the pilot phase, taking insights from lessons learned and the incremental museum-wide rollout. Central to our Web App's functionality is its AI computer vision recognition feature. By pointing a mobile device at an artwork—be it 2D or 3D—the artwork is identified within a few hundred milliseconds to return related digital content, such as descriptions, colour and shape analyses, interactive components, and interactive questions. We'll delve into the inception, in-house development, and museum-specific customisations of this tool. Key presentation components:
1. The underlying AI and technology: at the core of the innovation is a robust AI inference model. We'll explore what we selected and the trade-offs of the models we used, in non-technical terms.
2. Addressing onboarding challenges: we'll explore the strategies we employed to get visitors using the tech and onboarding to the app.
3. Serverless and cost-efficient operations on AWS: we will take a high-level walkthrough of how we optimised our backend processes, running the PWA (progressive web app) as a serverless application on AWS and keeping operational costs very low.
4. Balancing model training with data volume: quality versus quantity – we'll discuss our approach to training our inference models, highlighting the balance we struck between the data volume needed for accuracy and the time required to retrain our model whenever there is a rotation or exhibition fit-out.
5. Synchronising with back-of-house processes: we've taken measures to ensure our platform complements and works in tandem with our existing back-of-house processes. We're now at a point where we can train and add a new exhibition, from CMS export to live AI recognition, in only a few hours. Additionally, we have built the process so we are not double-entering data, and we'll show how digitisation and back-of-house operations continually improve the product offering.
6. Reflections on the rollout: we ran the app as a live pilot from April to August 2023, in a very controlled way, gathering invaluable firsthand feedback and insights. Since the pilot we have been rolling it out into various galleries, learning what works and what has not gone to plan. I'll share the highlights, challenges, failures and success stories, and how these have shaped the rollout.
7. The UX and design: the decisions we made for the design and functionality of the app.
8. Live demonstration! We have also recorded some usage in the Gallery to showcase how it works in a live environment.
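A generic sketch of marker-free artwork matching of the kind described above: a pretrained vision backbone embeds each indexed work, and a visitor's photo is matched by cosine similarity. This is an illustrative reconstruction using off-the-shelf components, not QAGOMA's in-house model; the threshold, paths, and IDs are placeholders.

```python
import torch
from PIL import Image
from torchvision.models import ResNet50_Weights, resnet50

# Pretrained backbone with the classifier head removed, so each image
# yields a 2048-d feature vector. QAGOMA's actual model differs.
weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()
backbone = resnet50(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()

def embed(path: str) -> torch.Tensor:
    """L2-normalised embedding, so dot product equals cosine similarity."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return torch.nn.functional.normalize(backbone(img), dim=1)

# Index of on-display works; paths and IDs are invented placeholders.
gallery = {work_id: embed(f"images/{work_id}.jpg")
           for work_id in ["qag-001", "qag-002", "goma-003"]}

def identify(photo_path: str, threshold: float = 0.8):
    """Return the best-matching artwork ID, or None below the threshold."""
    q = embed(photo_path)
    best_id, best_sim = max(((wid, float(q @ emb.T)) for wid, emb in gallery.items()),
                            key=lambda pair: pair[1])
    return best_id if best_sim >= threshold else None
```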
Presentation: 15 mins, Morgan Strong, QAGOMA
Neil Fitzgerald. Digitisation was only the start
For around the last thirty years the British Library has digitised its collections. Historically this work was undertaken not from core grant-in-aid funding, but via external sources, including research, philanthropic, and commercial funding. It started with boutique digitisation of its Treasures: unique, often visually impressive manuscripts, with online presentation as the end goal. Encouraged by the prevailing government ethos of the day, the Library’s 2001 New Strategic Directions plan included an enabling strategy of ‘using partnerships to achieve more by working together than they could do on their own’, which drove digitisation of materials for other purposes, including to service the growing commercial family history market. Later, public/private partnerships drove mass digitisation initiatives, such as the Microsoft Digitisation Project, whose output underpinned much of the early BL Labs work. What lessons learned can we apply to the current AI/machine learning situation? Institutional adoption of new technologies and methods can be disjointed. There are many reasons for this, including the organisational and practice change required, which in the UK has been compounded by the tightening fiscal environment over many years. Even when evidence for action has been justified and accepted, implementation requires sustained effort. For fields such as AI/ML that are going through such rapid change, this can create higher perceived organisational risks that need to be managed in an appropriate way. We can draw on other relevant organisational experiences where we have built up knowledge over time to inform our approach. The British Library’s Digital Research Team was established in 2012 in response to a strategic review of the Library’s then Scholarship & Collections Department, with one of the key aims to better support digital scholarship. The British Library’s collections and scholarship are international. We would not be able to operate effectively without our global connections across academia, other GLAM organisations, and participation in consortia and communities of interest, across a range of mutually beneficial shared interests, and with users. One example of this is in the field of automatic transcription, initially through optical character recognition (OCR) and more recently handwritten text recognition (HTR). The Library’s interest in this area started following two large newspaper digitisation projects, funded by JISC, undertaken in the early to mid-2000s. These showed the importance of usable OCR for text and data mining research methods, and led directly to the pan-European IMPACT Project, whose aim was to make digitisation ‘Better, Faster & Cheaper’, especially for the type of historical materials found in memory institutions, and to demonstrate that the approaches developed had the potential to apply to other less supported scripts/languages. We have continued to engage with different OCR/HTR initiatives, especially for low-resource languages present in the Library’s collections. One strategy to accomplish this has been OCR/HTR competitions that invite researchers to tackle the challenge of generating accurate and scalable transcription of historical texts. As a collection-holding institution and Independent Research Organisation, we are interested both in advancing the state of the art and in translating this research into operational and production workflows.
I will explore potential approaches available to overcome barriers to applying state of the art research, including collaboration via open frameworks and mutuality - an alternative path to reliance on purely commercial options that enables everyone to contribute to, and benefit from, the value chain. Consideration will be given to the implications of these approaches and the necessity for collection holding institutions to play an active role in shaping AI/ML across this landscape for wider benefit.
Presentation: 30 mins, Neil Fitzgerald, The British Library
Peter Broadwell. Applying Advances in Person Detection and Action Recognition AI to Enhance Indexing and Discovery of Video Cultural Heritage Collections
The deep-learning neural architecture known as the transformer has driven a recent technological sea change, heightening the potential of AI to enhance digitized cultural heritage collections and in particular to augment the description and indexing of video recordings. The generative applications of this technology in large commercial models (e.g., Stable Diffusion for images, OpenAI’s Sora for video) and their attendant controversies may draw the most attention, but such a focus risks overlooking the demonstrated aptitude of such models’ underlying analytical substrates to facilitate tasks of more immediate benefit to libraries, archives and museums. Specifically, the ability to recognize and digitally encode human movements and actions, both in individual and interpersonal contexts, holds great potential for elevating the accessibility and understanding of digitized audiovisual materials. This presentation will detail efforts to apply new transformer AI-based video analysis models to archival recordings of theater performances, political speeches and other events, exemplifying the methods’ potential relevance to any video collection containing recordings of human behaviors and actions. Additionally, this exploration will detail how the conceptual foundations of such neural models, considered in tandem with the cultural and ethical implications of their use, may further enrich not only librarians’ and archivists’ understanding of these tools but also their own approaches to audiovisual materials. Our initial explorations have been conducted as part of the MIME (Machine Intelligence for Motion Exegesis) project, a collaboration between faculty of the Department of Theater and Performance Studies and developers at the Stanford Libraries’ Center for Interdisciplinary Digital Research. The project has built a modular web platform with an exploratory interface to visualize the number and quality of poses and actions detected via the models described below across numerous recordings from library holdings; we also use this data to populate pose and action search engines employing high-performance vector similarity indices. Cross-referencing the pose and action data outputs with detected shots, face and clothing features also has proven helpful in extracting reliable individual-level data from “in-the-wild” recordings. To generate the fundamental pose and action data points from video recordings, the project has adopted a set of cutting-edge models and tools developed by video AI researchers at the University of California, Berkeley. We propose to discuss the following capabilities and potential insights these tools offer: Modeling of human motion tracks in 3D space and time: this component employs spatial-temporal transformer models that enact a deliberate decision to focus on the trajectories of individual human actors that have been “lifted” from 2D footage into 3D, rather than using the background scene through which multiple actors move as the primary frame of reference [1].
This emphasis on the actors in the scene, though fraught with concerns about surveillance and subject consent, provides unprecedented fidelity and accuracy in its computational representation of human actions, an affordance upon which the other tools can build [2]; Automatic fine-tuning of the video encoding models: the transformer models that enable the human detection and pose tracking features mentioned above, Video Masked Auto-Encoders (VideoMAE), have an ability to “self-train.” Simply grouping a set of videos as similar according to some high-level classification, e.g., “wedding ceremonies,” enables the customization of an underlying model that performs better at detecting people, motions and actions in other wedding videos, albeit at the expense of the model becoming less adept at scrutinizing other types of recordings [3]. This raises the tantalizing possibility of creating video understanding models that are fine-tuned for specific collections; Modeling of the physical environment as inhabited by detected human actors: the ability of the transformer models to estimate with sufficient precision the relative locations of the camera, the people, and other objects and buildings in a scene based on only two-dimensional video footage facilitates the virtual reconstruction of the physical environment in which the scene was recorded. The models in question, which build upon machine learning-based photogrammetry techniques for environment modeling such as Neural Radiance Fields, have not completely solved this challenging problem, but even the partial reconstruction of a setting that may no longer exist can be of great value to history and cultural studies [4]; Computational modeling of actions by individual or multiple interacting persons in a scene: a neural transformer model able to track individual persons also can use their estimated poses over time to infer specific gestures and actions [5]. The taxonomies developed to label such actions tend to be limited; the AVA (Atomic Visual Action) dataset describes the primary action classes in many theatrical scenes as simply “standing”, “talking”, or “looking” [6]. Yet the underlying numerical features generated by the model for each person in every frame may be indexed to enable high-fidelity searching across one or multiple videos for quite specific action descriptions, e.g., “scenes in which a person reads aloud from a book or other item they are holding.” Researchers may also find it enlightening to contemplate the underlying “slow-fast” constitution of the most successful class of such models, which achieves fidelity of recognition by considering both short (sub-second, high frame rate) and long (multi-second, low frame rate) time scales, encapsulating the conceptually and temporally “fuzzy” nature of gesture and action [7]. As mentioned above, the promise of the methods just described to illuminate existing collections of cultural heritage materials must be balanced against their potential for invasive and exploitative scrutiny of the people and expressions recorded. However, the growing ease of using such tools with discretion “in house,” as an alternative to exporting archival materials to third-party commercial interests, can alleviate such concerns somewhat, and points the way towards a careful and equitable application of these compelling technologies. Works Cited: [1] Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik. Tracking People by Predicting 3D Appearance, Location & Pose. CoRR abs/2112.04477 (2021). 
[2] Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, and Jitendra Malik. Humans in 4D: Reconstructing and tracking humans with transformers. In Proc. International Conference on Computer Vision, 2023. [3] Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training. https://arxiv.org/abs/2203.12602 (2022). [4] Junwei Lv, Jiayi Guo, Yueting Zhang, Xin Zhao, and Bin Lei. Neural Radiance Fields for High-Resolution Remote Sensing Novel View Synthesis. Remote Sensing 15(16), 2023. [5] Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph Feichtenhofer, Jitendra Malik. On the Benefits of 3D Pose and Tracking for Human Action Recognition. https://arxiv.org/abs/2304.01199 (2023). [6] Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. In Proc. Conference on Computer Vision and Pattern Recognition, 2018. [7] Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. Slowfast networks for video recognition. In Proc. International Conference on Computer Vision (ICCV), 2019.
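To illustrate the pose search engine idea in miniature: each detected pose can be normalised, flattened, and compared by cosine similarity. The NumPy sketch below is a toy stand-in (the keypoint layout and data are invented); the MIME platform itself uses the tracking models cited above and high-performance vector indices.

```python
import numpy as np

# Toy pose search: each pose is a (17, 2) array of 2D keypoints
# (COCO-style layout assumed). Poses are centred, scaled, flattened,
# and compared by cosine similarity.

def normalise_pose(kpts: np.ndarray) -> np.ndarray:
    """Centre on the keypoint centroid and scale to unit size."""
    centred = kpts - kpts.mean(axis=0)
    scale = np.linalg.norm(centred) or 1.0
    return (centred / scale).ravel()

def top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k stored poses most similar to the query pose."""
    q = normalise_pose(query)
    sims = index @ q  # rows of `index` are already normalised pose vectors
    return np.argsort(-sims)[:k]

# Random stand-in data; a real index would hold poses from video tracks.
rng = np.random.default_rng(0)
stored = np.stack([normalise_pose(rng.random((17, 2))) for _ in range(1000)])
print(top_k(rng.random((17, 2)), stored))
```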
Presentation: 15 mins, Peter Broadwell, Stanford University Libraries
Peter Leonard. Beyond ChatGPT: Transformers Models for Collections
Advances in generative AI have occurred so quickly that their ultimate impact on GLAM institutions remains hard to ascertain. Nevertheless, Transformers-based neural networks have already made an indelible impact on both popular and academic conversations. Cultural heritage work stands to benefit from the new possibilities engendered by the implications of the mathematical concept of “self-attention” inherent in these models. Transformers-based models have shown that many complicated problems in culture beyond digital text — a human voice, a handwritten word, even a scene from a motion picture — are now tractable to computation. Crucially, these developments are happening at the same moment that the “conversational interface” of ChatGPT has occasioned the most fundamental re-conceptualization of human-computer interaction since Douglas Engelbart’s mouse. The implications for memory institutions are thus dual: that more and more complex human culture will be subject to computation, and that chat-based interfaces may prove to be too popular to ignore as ways of interacting with this cultural material. The first of these may seem to be an unalloyed good. Transformers hold great potential for creating description or transcription for accessibility, helping libraries meet their legal and ethical requirements for online collections. They may also provide baseline description for backlogs of uncataloged digital images and media. One of the most recent developments is the availability of "multi-modal" large language models, which are capable of processing images and motion pictures. Importantly, all of this work can be done locally, with open-source networks running on hardware disconnected from the infrastructure (and business plans) of commercial companies. This talk will examine work with several of these non-commercial networks, drawn from experience on Stanford’s collections, including digitized photographic negatives, manuscript collections from the 19th century, oral histories of minoritized campus populations, and other archives. The second of these implications is more complicated. All of these networks, and their various representations of culture through mechanisms such as embeddings, are potential sites of archival engagement through chat. Although many are skeptical of the precision, determinism, and honesty of “instructable” large language models, GLAM institutions should nevertheless experiment and discover what conversational modalities can offer their patrons. Enabling researchers to query a collection or “chat with an archive” can provide a compelling and familiar entry point for massive digital collections, whether for novice users or scholars with knowledge of the data. This talk will show some preliminary work on “chatting with archives” in the domain of Silicon Valley history and 1970s performance art videos.
Presentation: 30 mins, Peter Leonard & Lindsay King, Stanford University Libraries
Rachel Senese Myers. Adopting Whisper: Creating a front end optimized for processing needs
Georgia State University Library has a rich collection of audio-visual material within its Special Collections and Archives department. Currently, patrons can browse over 3,000 audio-visual assets within our Digital Collections platform. Recent additions to that number have come from two projects that digitized over 1,000 items within the last two years. The Library also has a robust oral history program which generates roughly 100 interviews every year, with a current backlog of over 500 awaiting processing (many on hold because transcripts are needed). These audio-visual assets are sourced from the Library’s Special Collections and Archives department, whose collecting areas include music, popular culture, labor and unions, women’s history, gender and sexuality, Atlanta neighborhood and civic history, Atlanta photographers and photographs, rare books, and the University’s history. Many of the recordings document the racial, political and civil rights tensions within the labor community in the Southeast; the history and struggles of women in the South and new waves of feminism; the stories of LGBTQIA+ communities in the South and the social and political opposition they faced in Atlanta and the wider Southeastern United States; and the politics of government surrounding the civil rights movement, desegregation in the Southern United States, and Atlanta’s growth.
In order to facilitate better accessibility to our current and future audio-visual assets, the Library has started a three-phase project to identify, create and expand tools for fast AI-generated transcription while balancing responsible implementation of AI. Our end goal is to efficiently reprocess recordings currently online, facilitate faster processing of our backlog, and provide a tool to create transcriptions during accessioning, while balancing the sensitive nature of oral history. We cannot ethically upload AI-generated transcripts without having a way to conduct quality control and edit any errors that could result in the misinterpretation of someone’s words. The team, consisting of select members from the Digital Library Services department, identified Faster Whisper as the model of choice after consulting the results presented within the AI4LAM community, tests with sister institutions, and further research. We also identified the need for a front-end interface due to limitations within our institutional policies and infrastructure. We stood up the available Gradio front end for Faster Whisper for further in-house testing. The results showed that Faster Whisper met our expectations but the tested user interface did not. For the second phase, the team identified what elements would be necessary and desired for production use. Our goal is to identify pre-existing tools or create a front-end interface that will allow for editing transcripts during quality control and regeneration of transcripts with the provided edits, as well as to experiment with named entity recognition and text summarization for first-pass metadata creation. Other usability functions which were identified as lacking in the current front end will also be included. The user group for phase 2 is the Digital Projects Coordinator and the staff and students within the Digital Projects Unit. Phase 2 is currently being worked on and we anticipate testing to start in late summer 2024. Phase three will incorporate feedback from phase 2 testing and expand the user group to those within the Special Collections and Archives department (and other units within the Library if there is demand). Anticipated needs for expanding the framework include creating segregated processing spaces for projects or curatorial areas within Special Collections and Archives to address privacy concerns when processing restricted material. Other needs will be identified with a representative from Special Collections and Archives. This session will outline the evaluation, with needs and wants identified, the product as created for phase 2, and our outlined goals for updating and expanding the user base for the interface. The project team also plans to share our code publicly for others to implement and adapt as needed.
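The transcription step behind such a front end can be sketched with the faster-whisper library as below; the model size, device, and file names are illustrative choices rather than GSU's configuration, and the output is deliberately a draft for human quality control.

```python
from faster_whisper import WhisperModel

# Illustrative configuration; model size, device, and compute type
# are assumptions, not GSU's production settings.
model = WhisperModel("medium.en", device="cpu", compute_type="int8")

# Placeholder file name; segments stream lazily from the model.
segments, info = model.transcribe("oral_history_interview.wav")

# Write a timestamped draft for human review before anything is published.
with open("draft_transcript.txt", "w", encoding="utf-8") as f:
    for seg in segments:
        f.write(f"[{seg.start:08.2f} --> {seg.end:08.2f}] {seg.text.strip()}\n")
```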
Lightning talk: 5 mins, Rachel Senese Myers, Georgia State University
Rasa Bočytė. Against AI Narratives
It is too easy to forget the pervasive impact of the language, visuals and metaphors that we use to describe our relation to AI. Mainstream narratives are often oversimplifications of AI's capabilities and potential impact, depicting AI’s “transformative potential” and “inevitable adoption”. They use imagery that exclusively depicts white-collar jobs, anthropomorphised robots and context-deprived scenes reminiscent of science fiction films [1]. Not only does this distort the public's understanding of AI capabilities, which often results in bewilderment and fear; it also pushes AI developers and researchers towards imagining a specific future scenario that they should be building. In fact, big tech companies have harnessed such narratives to justify large-scale investments in AI capabilities over addressing much more pressing concerns related to social justice, (mis)representation and user agency [2]. By using these narratives and tropes, we risk subconsciously perpetuating and reinforcing such AI development in libraries, archives and museums (LAMs) rather than advocating for more nuanced and balanced approaches to discussing both the potential benefits and challenges of AI. LAMs can be places where the meaning of AI is defined, where reflection on its role and significance takes place and where public values are promoted [3]. This raises important questions for our sector: If we refuse to use mainstream narratives, how can we build alternative visions of our relation to AI? What kind of futures could the AI4LAM community conjure by using new metaphors and tropes? And what role could these narratives help to define for LAMs in AI discourse? This hands-on workshop invites the AI4LAM community to start collectively creating alternative AI narratives and actively shape the discourse about AI in LAMs. During the session, participants will work in groups on a narrative format chosen by the groups - for instance, a manifesto, map, poem, collage, etc. The session moderators will guide the groups during the creative process to address four key narrative elements: Perspective - participants will reflect on the relation between the LAM community and AI. Who is influencing whom in this relationship? Specifically, we will discuss how the perspective changes when thinking about “AI for culture” and “culture for AI” [4]; Actors - who should sit at the table when AI adoption is discussed? We will consider how the presence of different experiences and knowledge can shape engagement with AI in LAMs; Language - participants will critically reflect on the phrases, adjectives and verbs commonly used to describe AI and create their own vocabularies; Visuals - instead of falling into stereotypical tropes of futuristic robots, participants will be inspired by alternative visuals that aim to be much more inclusive and realistic. At the end of the session, we will collectively discuss the narratives that emerged and how they could impact work with AI in LAMs. The exercise will help participants develop a more critical eye towards AI narratives and equip them with concrete strategies to create alternative ones. The workshop serves as an invitation for the AI4LAM community to consider the importance of this topic in its future activities. If accepted, the authors would like to coordinate with the conference organisers on how the narratives produced during the workshop could be further discussed in the community.
For instance, during the conference we could do a lightning talk to present a snapshot of these narratives or discuss them during a future AI4LAM community call.
References:
[1] Vrabič Dežman, D. Promising the future, encoding the past: AI hype and public media imagery. AI Ethics (2024). https://doi.org/10.1007/s43681-024-00474-x.
[2] https://www.dair-institute.org/tescreal/
[3] Thiel S, Bernhardt JC, eds. AI in Museums: Reflections, Perspectives and Applications. Transcript Verlag; 2024.
[4] Netherlands AI Coalition. The Art of AI for All: The connective power of culture and media (2021). https://nlaic.com/wp-content/uploads/2022/02/The-Art-of-AI-for-All.pdf
Hands on (hack): 2 hours, Rasa Bočytė & Johan Oomen, Netherlands Institute for Sound & Vision
Workshop: Rasa Bočytė, Netherlands Institute for Sound & Vision & Basil Dewhurst, National Film and Sound Archive of Australia
Svetlana Koroteeva. Exploring Possibilities to Enhance Bibliographical Records for Music Collections Using Machine Learning
The rapid advancement of digital technology and machine learning provides significant opportunities to improve how libraries catalog music, making it easier to access for the public. This research uses machine learning to automatically classify music genres, simplifying the cataloging process. Background: Traditional cataloging in libraries involves a lot of manual work, where librarians manually categorize each piece of music by genre. This process is time-consuming and sometimes lacks consistency. Using machine learning can make this process quicker and reduce the workload for library staff. Objective: The main goal of this study is to develop a machine learning model that can accurately identify music genres from library records and audio features. This model uses Convolutional Neural Networks (CNNs) and Bidirectional Encoder Representations from Transformers (BERT) to analyze both text and audio data. Data Preparation: The study extracts features from textual bibliographical records using Natural Language Processing (NLP) techniques and from audio data using the Librosa library in Python. Including data from both sources provides a stronger foundation for genre classification. Technical Approach: A combined BERT+CNN model is used, incorporating NLP for text data and a CNN for audio analysis. This model is trained with a variety of music genres and tested to ensure it classifies genres accurately. This method allows for more detailed and accurate analysis of the data, leading to better classification results. Implications: Successfully developing and implementing this model could make managing music collections in libraries more efficient. It could also serve as an example for applying AI to other areas of library management. Additionally, the project considers the ethical use of AI, ensuring that the automated processes respect the cultural significance of music genres. Conclusion: This study contributes to discussions on how digital transformation can help libraries by showing how machine learning can improve the management of music collections. As libraries adopt more data-driven methods, such projects are crucial for helping them evolve into modern knowledge centers. By automating routine tasks, librarians can focus more on complex inquiries.
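As a sketch of the audio half of such a pipeline (the BERT text branch is omitted), the code below extracts a log-mel spectrogram with Librosa and feeds it to a small CNN; the architecture, genre count, and file names are illustrative assumptions, not the study's actual model.

```python
import librosa
import torch
import torch.nn as nn

# Illustrative audio branch: a log-mel spectrogram feeding a tiny CNN.
# Hyperparameters and layer sizes are assumptions for demonstration.

def log_mel(path: str, sr: int = 22050, n_mels: int = 64) -> torch.Tensor:
    """Load up to 30s of audio and return a (1, n_mels, time) log-mel tensor."""
    y, sr = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return torch.tensor(librosa.power_to_db(mel), dtype=torch.float32).unsqueeze(0)

class GenreCNN(nn.Module):
    def __init__(self, n_genres: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool away the time dimension
        )
        self.classifier = nn.Linear(32, n_genres)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        return self.classifier(self.features(x).flatten(1))

# Usage (placeholder file name):
# model = GenreCNN()
# logits = model(log_mel("recording.mp3").unsqueeze(0))  # add batch dimension
```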
Presentation: 15 mins, Svetlana Koroteeva, National Library New Zealand
Terhi Nurmikko-Fuller. Mapping Indigeneity in Institutional Repositories
Since the blockbuster release of ChatGPT, AI has gone mainstream, and industry partners are increasingly eager to feed their large language models with structured datasets capturing domain expertise and real-world data. In this paper we will argue that what is needed in this age of AI is not just more data, but smarter data. What we need is knowledge that has been represented in ways that capture various perspectives and nuances. In doing so we can start to look at ways to reduce historical and social bias, and to maximise the power and potential of bringing thoughtfully and CAREfully mapped data and increasingly powerful algorithms together. With aspirations to embed the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP) in its business processes, the Australian National University (ANU) has set itself the challenge of identifying all content relating to Indigenous research across its myriad (and of course completely disparate) repositories, collections, libraries, and other information custodians. This already formidable task is complicated by the different schemas used to represent heterogeneous collections. It is made even more difficult by the lack of any metadata-level record of Indigeneity. Prior attempts to map Indigenous research, or research that can be considered “Indigenous”, highlight the complexity of the situation. For example, at one extreme, some data custodians may hold the view that all research carried out in Australia would be considered to have connections, through Country, to Indigeneity; scientists studying frogs are on Indigenous land; engineers build drones that fly in Indigenous skies. At the other end of the spectrum are those who may struggle to see that connection, especially in fields that do not readily pertain to Indigenous knowledge, art, culture, or lived experience as a domain of study. In the absence of explicit recording of “Indigeneity” as a specific field in the metadata records, and within a void created by the lack of an existing, comprehensive, university-wide policy on recognising and representing research in terms of engagement with Indigenous data, researchers, and locations, we adopted a preliminary, pragmatic approach of identifying relevant content in three vast datasets: ANU ARIES, which contains all the metadata from all the publications written by all ANU academics; ANU Theses, which contains all the metadata from all the theses completed at the ANU; and ANU Collections. Here, we report on a Linked Data project of institutional proportions. It is a critical evaluation of the challenges and opportunities of converting the metadata from these three internally complex knowledge repositories into the machine-readable format of RDF, using repository-specific structures that combine top-level ontologies such as Schema.org with bespoke elements, representing the data in a way that adheres to international standards but is expanded to capture implicit connections and tacit knowledge. We already know of examples in industry where knowledge graphs (KGs) are used to inform large language models (LLMs): here is our opportunity to provide LLMs with not just large datasets, but with CAREfully curated KGs that provide not just data points, but contextualised information capturing knowledge about and from Indigenous contexts.
In the first instance, we have sought to explicitly represent publications, theses, and objects that have an explicit connection to Indigenous languages or Indigenous places, or that were created, analysed, or otherwise engaged with in projects where the supervisor has been an Indigenous researcher. Going forward, our aspiration is to expand the underlying KG to incorporate relationships that represent Indigenous ways of knowing. The existing data would be enriched with additional properties that show non-Western knowledge structures as different but equally valid ways of meaning-making and semantic capture of the world. In this way, we seek to take preliminary steps towards realising a project that can help tackle issues of intellectual colonialism, diversify the dominance of Western knowledge structures, and indeed help us towards a fantastic future. What we seek to provide are reproducible steps and methodological guidelines that can be utilised by other institutions to tentatively start to tackle the task of identifying Indigenous content in their heterogeneous collections.
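A minimal sketch of the kind of mapping described above, using rdflib to express one thesis record with Schema.org terms plus a bespoke property for the Indigenous-research connection; the namespace URIs, property name, and record values are illustrative placeholders, not ANU's actual schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Placeholder namespaces; the ANU vocabulary URI and its property are invented.
SDO = Namespace("https://schema.org/")
ANU = Namespace("https://example.anu.edu.au/vocab/")

g = Graph()
g.bind("schema", SDO)
g.bind("anu", ANU)

thesis = URIRef("https://example.anu.edu.au/theses/12345")
g.add((thesis, RDF.type, SDO.Thesis))                             # top-level ontology type
g.add((thesis, SDO.name, Literal("Placeholder thesis title")))
g.add((thesis, SDO.inLanguage, Literal("en")))
g.add((thesis, ANU.indigenousResearchConnection, Literal(True)))  # bespoke element

print(g.serialize(format="turtle"))
```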
Authors: Terhi Nurmikko-Fuller, Adegboyega Adeniran, Janet McDougall, Lawson Lewis, Nicholas Car, Sam Provost, Helen Wright.
Presentation: 15 mins, Terhi Nurmikko-Fuller, Australian National University