Case Study: WNAF Student Workflow

For the reconciliation step, we combined student work with librarian review to check the names in WNAF for possible matches in the Library of Congress Name Authorities. We wanted to be sure to capture which names potentially already had an authorized form.

Student workers were given access to the WNAF spreadsheet data, with the dataset split into ranges of alphabetized last names. The students were given the following instructions.

    • Open OpenRefine
    • Click on Create Project
    • Browse to find the spreadsheet file you just downloaded
    • Import the data and click on Create Project


    • Note: In the Excel file, rename the column that contains the names to “Name”.
    • If you only want to import part of the Excel file, do not import the file itself. Instead, click on Create Project and then click Clipboard. Copy the desired data from the spreadsheet and paste it into the Clipboard box in OpenRefine. Then click Create Project.
    • If you are uploading a file, use the .xls file extension.
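The two notes above (renaming the names column and importing only part of the data) can also be done ahead of time with a short script. The following is a minimal sketch, not part of the original workflow: it assumes the spreadsheet has been saved as CSV, that names are in "Last, First" order, and that the column name "Personal Name" and the helper functions `prepare_rows` and `to_tsv` are hypothetical. It produces tab-separated text that could be pasted into the Clipboard import.

```python
import csv
import io


def prepare_rows(csv_text, name_column, first_letters):
    """Rename `name_column` to "Name" and keep rows whose name starts with one of `first_letters`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = {letter.upper() for letter in first_letters}
    rows = []
    for row in reader:
        # Rename the names column but keep the original column order.
        renamed = {("Name" if key == name_column else key): value for key, value in row.items()}
        if renamed["Name"].strip()[:1].upper() in wanted:
            rows.append(renamed)
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data; "Personal Name" is a hypothetical column header.
sample = 'ID,Personal Name\n1,"Adams, Jane"\n2,"Baker, Tom"\n3,"Young, Ann"\n'
subset = prepare_rows(sample, "Personal Name", "AB")
print(to_tsv(subset))
```

Keeping the ID column alongside the names at this stage makes the comparison with the source spreadsheet easier after export.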


For personal names, go to https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation/blob/master/PersonalNamesReconcile.txt

Or, for corporate names, go to

https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation/blob/master/CorporateNamesReconcile.txt

    • Copy the code
    • Go back to Open Refine
    • Click on the Undo/Redo tab and click Apply.
    • Paste in the code and click on Perform Operations. The process may take a while to run, depending on the number of names to reconcile.
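A copy-and-paste from a web page can easily pick up stray or truncated text, so it may help to check the copied code before applying it. The sketch below assumes the copied text is an OpenRefine operation history, that is, a JSON array of objects that each carry an "op" field. The function name `summarize_operations` is hypothetical, and the example input is illustrative only; it is not the content of the linked GitHub files.

```python
import json


def summarize_operations(pasted_text):
    """Parse pasted operation-history JSON and return the "op" name of each step.

    Raises ValueError if the text is not a JSON array of objects with an "op" key.
    """
    try:
        operations = json.loads(pasted_text)
    except json.JSONDecodeError as error:
        raise ValueError(f"Not valid JSON: {error}") from error
    if not isinstance(operations, list):
        raise ValueError("Expected a JSON array of operations")
    names = []
    for index, operation in enumerate(operations):
        if not isinstance(operation, dict) or "op" not in operation:
            raise ValueError(f"Entry {index} has no 'op' field")
        names.append(operation["op"])
    return names


# Illustrative input only; the column names here are made up.
example = """
[
  {
    "op": "core/column-rename",
    "oldColumnName": "Personal Name",
    "newColumnName": "Name",
    "description": "Rename column Personal Name to Name"
  }
]
"""
print(summarize_operations(example))
```

If the check raises an error, the most likely cause is an incomplete copy, and copying the code again is usually enough.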

Review the names with matches.

Star the names that look like good matches and flag the ones that look like errors.

The next step is to add new columns based on the good matches (stars) and the bad matches (flags).

Go to the All column and choose Facet, then Facet by star.

Click on true in the facet panel on the left.

Go to the LC Record Link column and choose Edit column, then Add column based on this column:

[Screenshot: Screen Shot 2017-04-05 at 3.02.20 PM.png]
[Screenshot: Screen Shot 2017-04-05 at 3.03.41 PM.png]

Name the new column Good Match.

Repeat this process for the bad matches (flags).

Go to the All column, choose Facet, then Facet by flag, and click on true:

[Screenshot: Screen Shot 2017-04-05 at 3.10.59 PM.png]

Choose Add column based on this column again and name the new column Bad Match:

[Screenshot: Screen Shot 2017-04-05 at 3.14.18 PM.png]

Exit out of faceting:

[Screenshot: Screen Shot 2017-04-05 at 3.20.08 PM.png]

The spreadsheet with the new columns added:

[Screenshot: Screen Shot 2017-04-05 at 3.21.03 PM.png]
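The effect of the star and flag steps can be summarized in a few lines of code. This is a minimal sketch of the logic only, not how OpenRefine implements it: each row is modeled as a dictionary, the "starred" and "flagged" keys and the function name `add_match_columns` are hypothetical, and the link values are placeholders. Because a new column is filled only for the rows selected by the active facet, unstarred and unflagged rows end up with blank cells.

```python
def add_match_columns(rows):
    """Return copies of the rows with "Good Match" and "Bad Match" columns added."""
    result = []
    for row in rows:
        link = row.get("LC Record Link", "")
        updated = dict(row)
        # Starred rows copy the link into Good Match; all other rows stay blank.
        updated["Good Match"] = link if row.get("starred") else ""
        # Flagged rows copy the link into Bad Match; all other rows stay blank.
        updated["Bad Match"] = link if row.get("flagged") else ""
        result.append(updated)
    return result


# Made-up rows with placeholder link values.
rows = [
    {"Name": "Adams, Jane", "LC Record Link": "link-1", "starred": True, "flagged": False},
    {"Name": "Baker, Tom", "LC Record Link": "link-2", "starred": False, "flagged": True},
    {"Name": "Young, Ann", "LC Record Link": "", "starred": False, "flagged": False},
]
for row in add_match_columns(rows):
    print(row["Name"], "|", row["Good Match"], "|", row["Bad Match"])
```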

Export the project and save it as an Excel file:

[Screenshot: Screen Shot 2017-04-05 at 3.22.00 PM.png]

    • Compare this new spreadsheet with the original spreadsheet.
    • Some of the names may have disappeared or been reordered as a result of accidental duplication during the reconciliation process.
    • Go back to the source spreadsheet and make sure the names in both spreadsheets have the same ID values.
    • Using the source spreadsheet as the reference, paste in the ID values and line them up with their corresponding names.
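The comparison in the last four steps can be sketched in code as well. The example below is illustrative and rests on assumptions not stated in the workflow: it takes the source spreadsheet as a list of (ID, name) pairs with unique names, takes the exported spreadsheet as a list of names, and matches names on exact text after trimming whitespace. The function name `compare_with_source` is hypothetical.

```python
from collections import Counter


def compare_with_source(source_rows, exported_names):
    """Compare exported names against the source spreadsheet.

    Returns (aligned, missing, duplicated):
      aligned    - (ID, name) pairs in source order for names present in the export
      missing    - source names that are absent from the export
      duplicated - names that appear more than once in the export
    """
    exported_counts = Counter(name.strip() for name in exported_names)
    aligned, missing = [], []
    for row_id, name in source_rows:
        if exported_counts[name.strip()] > 0:
            aligned.append((row_id, name))
        else:
            missing.append(name)
    duplicated = sorted(name for name, count in exported_counts.items() if count > 1)
    return aligned, missing, duplicated


# Made-up data: one name was dropped and one was duplicated and reordered.
source = [(1, "Adams, Jane"), (2, "Baker, Tom"), (3, "Young, Ann")]
exported = ["Baker, Tom", "Adams, Jane", "Baker, Tom"]
aligned, missing, duplicated = compare_with_source(source, exported)
print("aligned:", aligned)
print("missing:", missing)
print("duplicated:", duplicated)
```

Treating the source spreadsheet as the reference, as the steps above do, means the ID values always come from the source and are never regenerated from the export.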