Case Study: WNAF Student Workflow
For the reconciliation step, we relied on a combination of student work and librarian review to check WNAF names for possible matches in the Library of Congress Name Authorities. We wanted to capture which names potentially already had an authorized form.
Student workers were given access to spreadsheet data for WNAF, with the dataset split up by ranges of alphabetized last names.
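As a rough illustration of how a name list might be divided into alphabetical batches, here is a minimal Python sketch. The letter ranges, the function names, and the sample names are all assumptions made for the example; the source does not say which ranges were actually used or how the split was carried out.

```python
from collections import defaultdict

# Hypothetical letter ranges; the ranges actually given to students are not stated.
RANGES = [("A", "F"), ("G", "L"), ("M", "R"), ("S", "Z")]


def range_label(name):
    """Return the label of the range that a 'Last, First' name falls in."""
    initial = name.strip()[:1].upper()
    for start, end in RANGES:
        if start <= initial <= end:
            return f"{start}-{end}"
    # Empty names, or names that do not start with an unaccented A-Z letter.
    return "other"


def split_by_range(names):
    """Group names into batches keyed by range label, alphabetized within each batch."""
    batches = defaultdict(list)
    for name in sorted(names, key=str.casefold):
        batches[range_label(name)].append(name)
    return dict(batches)


if __name__ == "__main__":
    sample = ["Young, Zoe", "Adams, Ann", "Moore, Mia", "evans, Eve"]  # made-up names
    for label, batch in split_by_range(sample).items():
        print(label, batch)
```

Each batch could then be saved as its own spreadsheet and handed to one student.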
- Open OpenRefine
- Click on Create Project
- Browse to find the spreadsheet file you just downloaded
- Import the data and click on Create Project
- Note: In the Excel file, rename the column that contains the names to “Name” before importing.
- If you only want to import part of the Excel file, do not import the file. Instead, click on Create Project and then click Clipboard. Copy the desired data from the spreadsheet and paste it into the Clipboard in OpenRefine. Then click Create Project.
- If you are uploading a file, use the .xls file extension.
- In another browser window, go to https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation
Or
- Copy the code
- Go back to OpenRefine
- Click on the Undo/Redo tab and click Apply.
- Paste in the code and click on Perform Operations. The process may take a while to run, depending on the number of names to reconcile.
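To give a sense of what a label lookup against the Library of Congress might involve, the sketch below builds a lookup URL for a single name. This is not the code from the repository above. It assumes the label lookup path on id.loc.gov (`/authorities/names/label/`), and the function name is hypothetical. The sketch only constructs the URL and makes no network request.

```python
from urllib.parse import quote

# Assumed base path for label lookups in the LC name authorities on id.loc.gov.
LABEL_BASE = "https://id.loc.gov/authorities/names/label/"


def label_lookup_url(name):
    """Build a lookup URL for one name string (hypothetical helper)."""
    # Collapse runs of whitespace so stray spaces do not change the label.
    cleaned = " ".join(name.split())
    # Percent-encode everything that is not URL-safe, including commas and spaces.
    return LABEL_BASE + quote(cleaned, safe="")


if __name__ == "__main__":
    print(label_lookup_url("Twain,  Mark"))
```

A name with no matching authorized form would be expected to return no record, which is why the results still need the manual review described in this workflow.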
Review the names with matches
Star the names that look like good matches and flag the matches that look like errors.
The next step is to add new columns based on good matches (stars) and bad matches (flags).
Go to the All column and choose Facet by star.
Click on true in the panel on the left.
Go to the LC Record Link column and choose Edit column, then Add column based on this column.
Label the column as Good Match.
Repeat this process for the bad matches (flags).
Go to the All column, choose Facet by flag, and click on true.
Choose Add column based on this column and name it Bad Match.
Exit out of faceting.
The spreadsheet now shows the new columns.
Export the data and save it as an Excel file.
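The star and flag steps above amount to copying the LC Record Link into one of two new columns depending on the reviewer's decision. The following Python sketch shows that logic on plain dictionaries. The `starred` and `flagged` keys are assumptions standing in for the star and flag state, and the function name is hypothetical; this is an illustration of the logic, not part of the actual process.

```python
def add_match_columns(rows):
    """Return copies of rows with 'Good Match' and 'Bad Match' columns added.

    Each row is a dict. 'starred' and 'flagged' are assumed boolean keys that
    stand in for the star and flag a reviewer set on the row.
    """
    result = []
    for row in rows:
        link = row.get("LC Record Link", "")
        new_row = dict(row)  # leave the input row unchanged
        new_row["Good Match"] = link if row.get("starred") else ""
        new_row["Bad Match"] = link if row.get("flagged") else ""
        result.append(new_row)
    return result


if __name__ == "__main__":
    reviewed = [  # made-up rows with placeholder links
        {"Name": "Adams, Ann", "LC Record Link": "link-1", "starred": True},
        {"Name": "Moore, Mia", "LC Record Link": "link-2", "flagged": True},
        {"Name": "Young, Zoe", "LC Record Link": ""},
    ]
    for row in add_match_columns(reviewed):
        print(row["Name"], "|", row["Good Match"], "|", row["Bad Match"])
```

Rows that were neither starred nor flagged end up with both new columns empty, which makes unreviewed names easy to spot.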
- Compare this new spreadsheet with the original spreadsheet.
- Some names may have disappeared or been reordered as a result of accidental duplication during the reconciliation process.
- Go back to the source spreadsheet and make sure the names in both spreadsheets have the same ID values.
- Use the source spreadsheet as the reference, and paste in and line up the ID values with their corresponding names.
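The comparison described above can also be sketched in code. The example below is a minimal illustration, assuming each name appears only once in the source spreadsheet and that rows are available as simple Python lists; the function name and the sample data are hypothetical.

```python
from collections import Counter


def compare_to_source(source_rows, exported_names):
    """Compare exported names against the source spreadsheet.

    source_rows: list of (id_value, name) pairs from the source spreadsheet.
    exported_names: list of names in the order they appear in the export.
    Returns (missing, duplicated, realigned).
    """
    # Assumes names are unique in the source; the first ID seen is kept.
    source_ids = {}
    for id_value, name in source_rows:
        source_ids.setdefault(name, id_value)

    counts = Counter(exported_names)
    # Names present in the source but absent from the export.
    missing = [name for _, name in source_rows if counts[name] == 0]
    # Names that appear more than once in the export.
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    # Exported names paired with the ID from the source (None if unknown).
    realigned = [(source_ids.get(name), name) for name in exported_names]
    return missing, duplicated, realigned


if __name__ == "__main__":
    source = [(1, "Adams, Ann"), (2, "Moore, Mia"), (3, "Young, Zoe")]  # made-up data
    exported = ["Young, Zoe", "Adams, Ann", "Adams, Ann"]
    missing, duplicated, realigned = compare_to_source(source, exported)
    print("Missing:", missing)
    print("Duplicated:", duplicated)
    print("Realigned:", realigned)
```

Treating the source spreadsheet as the reference means an exported name never receives an ID that the source did not assign to it.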