Case Study: Gathering and Assessing Metadata for WNAF

To work with the various data formats as one dataset, we converted the data several times to bring it into a standardized format. The format we chose was a tab-delimited text file (TSV) containing as much information related to the names as we could gather from the raw metadata files. The data we collected included the form of the name as used in the digital collection metadata, alternate forms (if available), the institution submitting the name, the collection containing the name (if known), the metadata field where the name was found (if known), and the type of name (if available; e.g., personal name, corporate body, family name).
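As a rough illustration of the standardized file described above, the following Python sketch writes name records to a tab-delimited file and reads them back. The column headers, the pipe separator for alternate forms, and the sample records are all assumptions made for this example; the headers actually used in the project file are not given here.

```python
import csv
import io

# Hypothetical column headers: the text lists the kinds of information
# collected, but not the headers actually used in the project file.
COLUMNS = [
    "name",             # form of the name as found in the collection metadata
    "alternate_forms",  # alternate forms, if available
    "institution",      # institution submitting the name
    "collection",       # collection containing the name, if known
    "source_field",     # metadata field where the name was found, if known
    "name_type",        # personal name, corporate body, family name
]


def write_names_tsv(records, out):
    """Write name records (dicts) as tab-delimited rows.

    Values that are unknown or missing are written as empty cells.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        delimiter="\t",
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Assumption: several alternate forms share one cell, separated by "|".
        row["alternate_forms"] = "|".join(row.get("alternate_forms") or [])
        writer.writerow(row)


def read_names_tsv(source):
    """Read the tab-delimited file back into a list of dicts."""
    return list(csv.DictReader(source, delimiter="\t"))


# Made-up sample records, for illustration only.
sample = [
    {
        "name": "Example, Jane",
        "alternate_forms": ["Example, J.", "Jane Example"],
        "institution": "Example University",
        "collection": "Example Photograph Collection",
        "source_field": "creator",
        "name_type": "personal",
    },
    {
        "name": "Example Mining Company",
        "institution": "Example Historical Society",
        "name_type": "corporate",
    },
]

buffer = io.StringIO()
write_names_tsv(sample, buffer)
tsv_text = buffer.getvalue()
rows = read_names_tsv(io.StringIO(tsv_text))
```

Because names in "Last, First" order contain commas, a tab delimiter avoids most of the quoting that a comma-separated file would need.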

Example: For WNAF, because the names we were gathering came from multiple institutions, we wanted to associate a variety of information with each entry in our name authority file. Early in the project we determined that we wanted the following data elements:

    • Name (authorized form)
    • Name Variants (variants, misspellings, etc.)
    • WNAF Local Identifier - this helped us track duplicate names and decide upon the authorized version after doing more research
    • Institution(s) - Institutions associated with the name
    • Geographical information - In our case, state name(s), although this is an area we anticipate representing with more granularity in the future
    • Digital Collection(s) - Collections associated with the name, with the intention of linking to individual items at a later date
    • Library of Congress Name Identifier - added later, after reconciliation
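
The data elements above could be modeled as a single record type. The sketch below is one possible representation, not the project's actual schema: the class name, field names, identifier format, and sample values are all hypothetical. It also shows how a shared local identifier might be used to fold duplicate names into one entry. Keeping the first name seen as the authorized form is only a placeholder, since in the project the authorized version was decided after further research.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class NameRecord:
    """One entry in the name authority file (hypothetical field names)."""
    local_id: str                 # WNAF local identifier
    authorized_name: str          # name (authorized form)
    variants: Set[str] = field(default_factory=set)
    institutions: Set[str] = field(default_factory=set)
    states: Set[str] = field(default_factory=set)       # geographical information
    collections: Set[str] = field(default_factory=set)  # digital collections
    lc_name_id: Optional[str] = None  # added later, after reconciliation


def merge_duplicates(records: Iterable[NameRecord]) -> Dict[str, NameRecord]:
    """Fold records that share a local identifier into one entry each.

    Assumption: the first name seen for an identifier stays the authorized
    form, and any other form is recorded as a variant.
    """
    merged: Dict[str, NameRecord] = {}
    for rec in records:
        kept = merged.get(rec.local_id)
        if kept is None:
            # Copy the sets so the input records are not modified.
            merged[rec.local_id] = NameRecord(
                rec.local_id,
                rec.authorized_name,
                set(rec.variants),
                set(rec.institutions),
                set(rec.states),
                set(rec.collections),
                rec.lc_name_id,
            )
            continue
        if rec.authorized_name != kept.authorized_name:
            kept.variants.add(rec.authorized_name)
        kept.variants |= rec.variants
        kept.variants.discard(kept.authorized_name)
        kept.institutions |= rec.institutions
        kept.states |= rec.states
        kept.collections |= rec.collections
        kept.lc_name_id = kept.lc_name_id or rec.lc_name_id
    return merged


# Made-up records, for illustration only.
records = [
    NameRecord("wnaf-0001", "Example, Jane", variants={"Example, J."},
               institutions={"Example University"}, states={"Utah"},
               collections={"Example Photograph Collection"}),
    NameRecord("wnaf-0001", "Jane Example",
               institutions={"Example Historical Society"}, states={"Nevada"}),
    NameRecord("wnaf-0002", "Example Mining Company",
               institutions={"Example University"}, states={"Utah"}),
]
merged = merge_duplicates(records)
```

Storing institutions, states, and collections as sets reflects the "(s)" in the element names: a single name can be associated with more than one of each.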