Baseline Metadata Wrangling
Metadata from the following institutions was submitted for inclusion in the pilot implementation of the WNAF:
- Brigham Young University
- Oregon Digital (Univ. of Oregon and Oregon State Univ.)
- University of Nevada, Reno
- University of Utah
- University of Denver
- Utah Department of Heritage and Arts
- Utah State Archives
- Utah State University
The submitted data came from multiple systems and formats: full collection metadata from CONTENTdm, controlled vocabularies from CONTENTdm, JSON-LD from Hydra, EAD finding aids, and spreadsheets from Axiom. Metadata fields containing personal names or corporate bodies were extracted and compiled into a single master spreadsheet. The master list began with more than 500,000 names; removing exact duplicates reduced it to 76,360 names.
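The exact-match de-duplication described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dedupe_exact` and the sample strings are hypothetical, and the project's actual tooling is not documented here. Because only byte-for-byte identical strings are collapsed, variant forms of the same name survive this pass.

```python
def dedupe_exact(names):
    """Drop exact duplicate strings, keeping the first-seen order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


# Hypothetical sample rows; not taken from the WNAF data.
sample = [
    "Example, Jane, 1850-1920",
    "Example, Jane, 1850-1920",   # exact duplicate, removed
    "Jane Example",               # variant form, kept
    "Sample Photo Company",
]

unique_names = dedupe_exact(sample)
print(len(sample), "->", len(unique_names))
```

Variant forms such as a direct-order and an inverted version of the same name would need a later, fuzzier pass or manual review.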
Data gathered for each entry includes:
- Name (including associated dates if available)
- Institution
- Collection
- Metadata field(s)
- Type of name (personal name or corporate body)
- Cross references
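One way to picture a row of the master spreadsheet is as a small record type. The sketch below is an assumption for illustration only: the class name `NameEntry`, its attribute names, and the sample values are hypothetical and do not reflect the project's actual column headings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameEntry:
    """One row of the master list (hypothetical field names)."""
    name: str                       # includes associated dates if available
    institution: str
    collection: str
    metadata_fields: List[str] = field(default_factory=list)
    name_type: str = "unknown"      # "personal", "corporate", or "unknown"
    cross_references: List[str] = field(default_factory=list)


# Invented example values, not drawn from the submitted data.
entry = NameEntry(
    name="Example, Jane, 1850-1920",
    institution="University of Utah",
    collection="Example Photograph Collection",
    metadata_fields=["creator"],
    name_type="personal",
    cross_references=["Jane Example"],
)
print(entry.name, "|", entry.name_type)
```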
Below are some basic statistics about the compiled data (current as of June 19, 2018):
- Total names submitted:
  - Brigham Young University: 30,535
  - Oregon Digital (Univ. of Oregon and Oregon State Univ.): 4,170
  - University of Nevada, Reno: 1,277
  - University of Utah: 7,533
  - University of Denver: 16,608
  - Utah Department of Heritage and Arts: 12,138
  - Utah State Archives: 3,657
  - Utah State University: 2,067
- Type of name
  - 62,381 personal names
  - 10,706 corporate bodies
  - 3,273 unknown
- Names used in more than one collection/field
  - 13 used in more than 20 collections/fields
  - 80 used in more than 10 collections/fields
  - 6,795 used in 2-5 collections/fields
  - 7,357 used in more than one collection/field
- Names used in more than one institution
  - 1,360 in two institutions
  - 110 in three institutions
  - 11 in four institutions
  - 3 in five institutions
- Names used in more than one state
  - 267 in two states
  - 4 in three states
- Other statistics
  - 1,091 are single words (could be a personal name, a corporate body, or a family name)
  - 1,922 cross references (this number will rise as the list is further refined and de-duplicated)
  - Over 70 variations on C.R. Savage
  - 424 variations on Shipler Commercial Photographers
  - 500+ personal names are in the form First Last (rather than Last, First)
- Names in final dataset
  - 60,567 names imported into Omeka S
  - 418 names set aside for more research (e.g. Dr. Crouch, Mr. Hobart, Sister King)
- NACO statistics
  - 7,343 names already in LCNAF
  - 72 names updated with death dates
  - 15 new LCNAF records created based on WNAF data
  - 498 names researched by a student and awaiting NACO record creation
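Counts like "names used in more than one institution" and the rough name-form figures above could be produced along the following lines. This is a minimal sketch, not the method actually used for the report: the helper names `institution_spread` and `name_form`, the comma-based heuristic, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def institution_spread(rows):
    """Map 'number of distinct institutions' to 'how many names have that spread'."""
    holders = defaultdict(set)
    for name, institution in rows:
        holders[name].add(institution)
    return Counter(len(institutions) for institutions in holders.values())


def name_form(name):
    """Very rough guess at the form of a name string (heuristic only)."""
    if "," in name:
        return "inverted"        # e.g. "Last, First"
    if len(name.split()) == 1:
        return "single word"     # personal, corporate, or family name
    return "direct order"        # e.g. "First Last", or a corporate body


# Hypothetical (name, institution) pairs; not taken from the WNAF data.
rows = [
    ("Example, Jane", "University of Utah"),
    ("Example, Jane", "Utah State University"),
    ("Example, Jane", "University of Utah"),      # repeat within one institution
    ("Sample Photo Company", "University of Denver"),
    ("Placeholder", "University of Nevada, Reno"),
]

spread = institution_spread(rows)
forms = Counter(name_form(name) for name in {name for name, _ in rows})
print(dict(spread))
print(dict(forms))
```

A comma test cannot tell a direct-order personal name from a corporate body, so a heuristic like this only flags candidates for human review.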