Baseline Metadata Wrangling

Metadata from the following institutions was sent for inclusion in the pilot implementation of the WNAF:

    • Brigham Young University
    • Oregon Digital (Univ. of Oregon and Oregon State Univ.)
    • University of Nevada, Reno
    • University of Utah
    • University of Denver
    • Utah Department of Heritage and Arts
    • Utah State Archives
    • Utah State University

The submitted data came from multiple systems, including full collection metadata and controlled vocabularies from CONTENTdm, JSON-LD from Hydra, EAD finding aids, and spreadsheets from Axiom. Metadata fields containing personal names and/or corporate bodies were extracted and compiled into one master spreadsheet. This master list started with over 500,000 names; an initial de-duplication pass that removed exact matches reduced it to 76,360 names.
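
The exact-match pass described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual workflow: the function name dedupe_exact and the sample strings are assumptions made for the example. It shows why variant spellings survive this first pass, since only byte-for-byte identical strings are collapsed.

```python
def dedupe_exact(names):
    """Drop exact duplicate strings, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Hypothetical sample input; the real master list held over 500,000 names.
raw_names = [
    "Savage, C. R.",
    "C.R. Savage",
    "Savage, C. R.",  # exact duplicate of the first entry, removed
    "Shipler Commercial Photographers",
]

unique_names = dedupe_exact(raw_names)
# "Savage, C. R." and "C.R. Savage" both remain: they differ as strings,
# so an exact-match pass treats them as separate names.
```

Variant forms such as those two spellings would need a later, fuzzier reconciliation step that is outside this sketch.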

Data gathered for each entry includes:

    • Name (including associated dates if available)
    • Institution
    • Collection
    • Metadata field(s)
    • Type of name (personal name or corporate body)
    • Cross references
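
As a rough illustration of how one row of the master spreadsheet might be modeled, the sketch below maps the fields listed above onto a Python dataclass. The class name NameEntry, the attribute names, the allowed name_type values, and the sample values are all assumptions made for this example; the source only names the fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameEntry:
    """One name as gathered from a contributing institution (hypothetical model)."""
    name: str                      # includes associated dates if available
    institution: str
    collection: str
    metadata_fields: List[str] = field(default_factory=list)
    name_type: str = "unknown"     # assumed values: "personal", "corporate", "unknown"
    cross_references: List[str] = field(default_factory=list)


# Sample values are invented for illustration only.
entry = NameEntry(
    name="Savage, C. R.",
    institution="Example University",
    collection="Example photograph collection",
    metadata_fields=["creator"],
    name_type="personal",
    cross_references=["C.R. Savage"],
)
```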

Below are some basic statistics about the data that has been compiled (current as of June 19, 2018):

    • Total names submitted:
      • Brigham Young University - 30,535
      • Oregon Digital (Univ. of Oregon and Oregon State Univ.) - 4,170
      • University of Nevada, Reno - 1,277
      • University of Utah - 7,533
      • University of Denver - 16,608
      • Utah Department of Heritage and Arts - 12,138
      • Utah State Archives - 3,657
      • Utah State University - 2,067
    • Type of name
      • 62,381 personal names
      • 10,706 corporate bodies
      • 3,273 unknown
    • Names used in more than one collection/field
      • 13 used in more than 20 collections/fields
      • 80 used in more than 10 collections/fields
      • 6,795 used in 2-5 collections/fields
      • 7,357 used in more than one collection/field
    • Names used in more than one institution
      • 1,360 in two institutions
      • 110 in three institutions
      • 11 in four institutions
      • 3 in five institutions
    • Names used in more than one state
      • 267 in two states
      • 4 in three states
    • Other statistics
      • 1,091 are single words (could be a personal name, corporate body, or family name)
      • 1,922 cross references (this number will go up as the list is further refined and de-duplicated)
      • Over 70 variations on C.R. Savage
      • 424 variations on Shipler Commercial Photographers
      • 500+ personal names are First Last (rather than Last, First)
    • Names in final dataset
      • 60,567 names imported into Omeka S
      • 418 names set aside for more research (e.g., Dr. Crouch, Mr. Hobart, Sister King)
    • NACO statistics
      • 7,343 names already in LCNAF
      • 72 names updated with death dates
      • 15 new LCNAF records created based on WNAF data
      • 498 names researched by a student and awaiting NACO record creation
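
Counts such as "names used in more than one institution" can be derived by grouping the compiled rows by name and counting the distinct institutions for each. The following is a minimal sketch under assumed inputs: the (name, institution) tuple layout, the function names, and the placeholder rows are hypothetical and do not reproduce the project's data or tooling.

```python
from collections import Counter, defaultdict


def institutions_per_name(rows):
    """Map each name to the set of distinct institutions that submitted it."""
    by_name = defaultdict(set)
    for name, institution in rows:
        by_name[name].add(institution)
    return by_name


def shared_name_histogram(by_name):
    """Count how many names appear in exactly 2, 3, ... institutions."""
    return Counter(
        len(institutions)
        for institutions in by_name.values()
        if len(institutions) > 1
    )


# Placeholder rows: (name, institution). A repeated pair is counted only once.
rows = [
    ("Name A", "Institution 1"),
    ("Name A", "Institution 2"),
    ("Name A", "Institution 1"),
    ("Name B", "Institution 1"),
    ("Name C", "Institution 1"),
    ("Name C", "Institution 2"),
    ("Name C", "Institution 3"),
]

histogram = shared_name_histogram(institutions_per_name(rows))
# histogram[2] is the number of names found in two institutions,
# histogram[3] the number found in three, and so on.
```

The same grouping would work for the per-state and per-collection counts by swapping the institution column for a state or collection column.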