Step 1: Gather and Evaluate Metadata

The process will vary depending on the scope of your project and your metadata sources. For example, you may wish to work with data from both your institutional repository and your digital collections; those items may be in different systems with different export formats. Most digital library repository software will allow you to export metadata in a structured format such as CSV or JSON. For WNAF, we received metadata from a variety of sources, including CSV and TSV files of the full metadata from a collection, plain text files exported from the controlled vocabulary manager within CONTENTdm, JSON-LD files, and Excel spreadsheets listing names along with other extraneous data.

Figure: Examples of metadata initially gathered for the WNAF project, in a variety of formats

Find an initial common file format for your data

  • For WNAF, because the submitted metadata came in a variety of formats, we needed a common file format to store our initial dataset. Our format of choice at the start of the project was TSV (tab-separated values). We chose TSV over CSV (comma-separated values) because commas appear frequently in name data; with tabs as the delimiter, a comma inside a name cannot be mistaken for a field separator.
  • If you are working with data from a single repository, this will be less of a consideration.
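
As an illustration only, the conversion to a common TSV file might be sketched in Python as follows. This is not the WNAF project's actual workflow: the function names (rows_from_text, to_tsv), the column names (name, dates), and the sample records are all made up for the example, and the JSON input is assumed to be a simple array of flat objects.

```python
import csv
import io
import json


def rows_from_text(text, fmt):
    """Parse text in one of the source formats into a list of dicts."""
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "json":
        # Assumption: the JSON file is an array of flat objects.
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def to_tsv(rows, fieldnames):
    """Serialize rows as TSV; fields missing from a row are left blank."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter="\t",
        extrasaction="ignore",  # drop columns we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: note the commas inside the name values.
csv_text = 'name,dates\n"Doe, Jane",1850-1920\n'
json_text = '[{"name": "Roe, Richard", "dates": "1801-1877"}]'

rows = rows_from_text(csv_text, "csv") + rows_from_text(json_text, "json")
tsv_text = to_tsv(rows, ["name", "dates"])
print(tsv_text)
```

Because the delimiter is a tab, names such as "Doe, Jane" are written without any extra quoting, which is the property that makes TSV convenient for name data.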

Initial Assessment

Take a broad view of your project and assess the metadata that you have to work with, and what types of elements you would like to have in your vocabulary file. If it is relevant to your project, consider also what your vocabulary would look like in a national context. Do you have a unique regional collection at your institution with personal names that might not be in the Library of Congress Name Authority File? Do you want to maintain a list of names and dates, or is there additional information that you might want to include in your local authority file? Decisions made as part of this assessment can affect the tools you choose to disseminate your authority file later on.

  • Is there additional information you would like to have present as you work with your vocabulary metadata?
  • Do you have any wishlist fields that you would like to keep reserved for future enhancements?
  • If you are developing a regional shared vocabulary, do you wish to track contributor information about where the metadata is coming from?
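
One way to act on these questions is to settle on a column layout before merging any data. The Python sketch below is a hypothetical example, not the layout WNAF used: the field names (alternate_names, external_uri, contributor, source_file), the tag_rows helper, and the sample record are assumptions chosen for illustration.

```python
# Hypothetical column layout for a working vocabulary file.
CORE_FIELDS = ["name", "dates"]
RESERVED_FIELDS = ["alternate_names", "external_uri"]  # wishlist, empty for now
PROVENANCE_FIELDS = ["contributor", "source_file"]     # where the row came from
FIELDNAMES = CORE_FIELDS + RESERVED_FIELDS + PROVENANCE_FIELDS


def tag_rows(rows, contributor, source_file):
    """Fit each row to the layout and record who contributed it."""
    tagged = []
    for row in rows:
        # Unknown or reserved fields start out as empty strings.
        record = {field: row.get(field, "") for field in FIELDNAMES}
        record["contributor"] = contributor
        record["source_file"] = source_file
        tagged.append(record)
    return tagged


# Made-up sample record and contributor.
records = tag_rows(
    [{"name": "Doe, Jane", "dates": "1850-1920"}],
    contributor="Example University",
    source_file="names.csv",
)
print(records[0])
```

Reserving empty columns up front keeps later enhancements from changing the file's shape, and the provenance columns make it possible to trace a name back to the institution that submitted it.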

Read More:

Case Study: Gathering and Assessing Metadata for WNAF