Step 3: Reconcile, Review, and Clean Your Metadata

Reconciling Your Metadata

Depending on how many names your project includes and how much quality control you have already done, you may want to reconcile your metadata against a national authority file such as the Library of Congress Name Authority File (LCNAF) to find matches between your local names and LCNAF. Reconciliation is also a useful way to identify regional or local names that you may wish to add to a national authority file.

OpenRefine is a popular option for this type of vocabulary reconciliation.

Case Study: WNAF Student Workflow

Don't underestimate the amount of manual review needed for this process. While you might get some good matches for your names, you will also get plenty of false matches.

Metadata issues that can lead to bad matches include:

    • Names without dates
    • Last name only, no other information
    • Generic names (John Smith)
    • Same names from different institutions: these may look the same but represent different people
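
To focus manual review, it can help to flag risky names before reconciling. The Python sketch below is only an illustration of the first three issues listed above: the function name `review_flags`, the `GENERIC_NAMES` list, and the "four-digit year" test for dates are all assumptions, not part of the WNAF workflow, and should be tuned to your own data. It does not address the fourth issue, which requires knowing which institution each name came from.

```python
import re

# Assumed, illustrative list of generic names; extend it for your own data.
GENERIC_NAMES = {"john smith", "smith, john"}

# Assumption: a name "has dates" if it contains any four-digit year.
DATE_PATTERN = re.compile(r"\b\d{4}\b")


def review_flags(name):
    """Return a list of reasons this name may reconcile poorly."""
    flags = []
    cleaned = name.strip()
    if not DATE_PATTERN.search(cleaned):
        flags.append("no dates")
    # A single word (ignoring commas) suggests a last name only.
    if len(cleaned.replace(",", " ").split()) == 1:
        flags.append("last name only")
    if cleaned.lower() in GENERIC_NAMES:
        flags.append("generic name")
    return flags


for example in ["Smith", "John Smith", "Smith, John, 1850-1920"]:
    print(example, "->", review_flags(example))
```

Names that come back with one or more flags are good candidates for closer checking when you review the reconciliation results.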

Cleaning Your Metadata

Deduplication

The WNAF vocabulary still contains many duplicates, but far fewer than were present in our original set of metadata. One simple way to look for duplicates in an alphabetically sorted spreadsheet in Google Sheets or Excel is to use a formula like:

=if(or(B2=B1,B2=B3),"DUPLICATE?","")

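Entered in row 2 and filled down the column, this formula compares each value in column B with the cells directly above and below it, so any identical values in adjacent cells are flagged for review.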
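
If your names live outside a spreadsheet, the same adjacent-cell check can be sketched in Python. This is a minimal equivalent of the formula, assuming the names are already sorted; the function name `flag_adjacent_duplicates` is hypothetical. Note that Python's comparison is case-sensitive, which may differ from how your spreadsheet compares text.

```python
def flag_adjacent_duplicates(names):
    """Mirror the spreadsheet formula over a sorted list of names."""
    flags = []
    for i, name in enumerate(names):
        # Compare with the previous and next entries, when they exist.
        same_as_prev = i > 0 and name == names[i - 1]
        same_as_next = i < len(names) - 1 and name == names[i + 1]
        flags.append("DUPLICATE?" if same_as_prev or same_as_next else "")
    return flags


sorted_names = sorted(["Baker, Bo", "Adams, Ann", "Clark, Cy", "Baker, Bo"])
for name, flag in zip(sorted_names, flag_adjacent_duplicates(sorted_names)):
    print(name, flag)
```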

You can also use OpenRefine's clustering capabilities to detect duplicates in your names.
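
To give a sense of how this kind of clustering finds near-duplicates, here is a simplified Python sketch in the spirit of key-collision ("fingerprint") clustering. It is not OpenRefine's own code, and the function names `fingerprint` and `cluster_names` are hypothetical. Each name is reduced to a normalized key (accents stripped, lowercased, punctuation removed, words sorted), and names that share a key are grouped together.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a normalized key so that variants collide."""
    # Decompose accented characters, then drop the non-ASCII marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    lowered = ascii_only.strip().lower()
    # Remove ASCII punctuation such as commas and periods.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique words so that word order does not matter.
    return " ".join(sorted(set(no_punct.split())))


def cluster_names(names):
    """Group names that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["Smith, John", "John Smith", "smith john.", "Doe, Jane"]
for cluster in cluster_names(names):
    print(cluster)
```

As with reconciliation, the clusters are only candidates: names that share a key can still belong to different people, so each group needs manual review before merging.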