"Machine Learning & Artificial Intelligence" by mikemacmarketing is licensed under CC BY 2.0.

AI4LAM Metadata Working Group: AI Survey Results

5 August 2022 AI4LAM Metadata Working Group

AI Survey Results & Brief Analysis

In an effort to better understand the current state of machine learning (ML) in libraries, archives, and museums (LAM), the AI4LAM Metadata Working Group crafted a survey to gather data on how ML is being employed in these settings. The group sent the survey to several prominent listservs targeted at practitioners who either use ML or might be interested in its application. We received 54 responses. While this number gives useful insight into how ML is being employed, the sample is small, so we offer the results here as one glance at the issue, noting that more respondents might have shifted the picture. These results are not meant to be conclusive but rather an informal illumination of the current state of ML in LAM. The majority of respondents work in libraries, with archives a distant second. The survey was designed to gather responses both from people who are currently using ML and from those who are not but would like to in the future.

Survey Results

Individuals currently using Machine Learning (31% of respondents)

For what types of projects have you used or are you using ML?

  • Enhance or automate workflows (e.g. subject indexing): 9

  • Data analysis: 5

  • Metadata remediation: 1

  • Other: 1 (Aid classification)

What training opportunities have you pursued to learn about ML?

  • Web tutorials: 6

  • Programming language documentation: 5

  • Courses at my institution: 4

  • Conference workshops: 4

  • Other: 2 (AWS ML certification, collaboration w/experts)

What ML framework(s) do you use?

  • Other: 7 (spaCy, Gensim, NLTK, ThaiNLP, LaoNLP, Annif, the above through AWS SageMaker, fastai, LIBSVM)

  • Scikit-learn: 5

  • PyTorch: 4

  • TensorFlow: 3

  • Keras: 2

  • SparkML: 1

How have you gained support for ML projects at your institution?

  • It was already an institutional priority: 8

  • Created proof of concept and applied it to a workflow or problem: 3

  • Argued for the need and then operationalized it: 1

  • Other: 1

Have you worked with third party vendors/partners on ML projects?

  • Yes: 3 (Computational Linguistics Lab, Hugging Face, Academic Partners)

  • No: 8

Individuals who are not currently using Machine Learning (64% of respondents)

Are you interested in using machine learning tools for metadata and/or cataloging work?

  • Yes: 26

  • No: 8

What are the greatest barriers to using ML in your work?

  • Lack of technical expertise: 30

  • Not sure how to apply it to my work: 17

  • Too busy with other projects: 17

  • Lack of technical support: 11

  • Lack of institutional support: 11

  • Ethical concerns: 3

  • Other: 1 (Not sure what a ML tool is)

What sort of training opportunities would help you begin to use ML?

  • Web tutorials: 24

  • Programming language documentation: 16

  • Conference workshops: 16

  • Other: 1 (A basic definition and example would be nice.)

Brief Analysis

Two overall themes can be observed in the results. The first comes from those who are currently using ML in their work. Their responses show that a variety of tools are in use, with almost as many different ways of building skills, ranging from tutorials to documentation. Most notable is that for most of these respondents, ML was already an institutional priority, meaning that there is likely also some level of support for these activities.

For those who are not using ML, the main barrier to adopting it is a lack of technical expertise, indicating a greater need for resources to aid in the learning process. Web tutorials were the most frequently requested resource. We can also surmise that, were ML to be elevated to an institutional priority, additional resources would likely become available to aid in the skill-building activities needed for its use.

As we can see, about ⅔ of the respondents are not actively using ML in their work. While this survey is not exhaustive, it points to a greater need for resources that help practitioners not only use ML tools but also better understand how and where ML can be applied in various projects. A significant hindrance to its use may be an inability to see its potential applications. Along with tutorials, resources describing different projects that use ML may well be a useful aid.