Portfolio

These are a few coding projects I've done to improve library records.

All results need to be checked for quality control :)

All my repositories are on GitHub.


Adding OCLC numbers to MARC records


Systems: Alma, OpenRefine, OCLC Connexion

APIs: WorldCat Search API, Alma Bib API

Code: JavaScript (Node.js), GREL

The Problem: We have records in our collection that are missing OCLC numbers. These are typically older records that were created before an OCLC record existed. By now, most of them do have an OCLC record available (and many have ISBNs). We want the OCLC number added to each record and the holdings turned on in WorldCat. To add the OCLC numbers, I need a tab-delimited file that pairs each OCLC number with the corresponding Alma MMS ID record identifier.

Since many records have ISBNs, we can use the ISBN to search for the OCLC number. The first challenge is that one ISBN can match several OCLC records, so I need to pick the highest quality one. I used the WorldCat Search API and the repository OpenRefine-oclc-api to get the best OCLC number for each ISBN.
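A minimal sketch of the last part of this step: writing the matched numbers out as the tab-delimited file described above. This is not the code from the repository. The function name, the column order (MMS ID first, then OCLC number), and the sample identifiers are all assumptions made for illustration.

```python
import csv
import io


def write_oclc_tsv(pairs, out):
    """Write (mms_id, oclc_number) pairs as tab-delimited rows.

    Pairs with no OCLC number (no match found for the ISBN) are skipped.
    Returns the number of rows written.
    The column order here is an assumption, not taken from the repository.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for mms_id, oclc_number in pairs:
        if not oclc_number:
            continue  # leave unmatched records for manual review
        writer.writerow([mms_id, oclc_number])
        written += 1
    return written


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    matches = [("990000000001", "11111111"), ("990000000002", None)]
    buffer = io.StringIO()
    count = write_oclc_tsv(matches, buffer)
    print(f"{count} row(s) written")
    print(buffer.getvalue(), end="")
```

Skipping the unmatched rows keeps the file safe to feed into the next step, and the skipped records can be checked by hand.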

At this point, we have the OCLC numbers, but they still need to be added to their Alma records. This is done with the repository alma-oclc-add.
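A rough sketch of what adding the number to a record involves, not the repository's actual code. OCLC numbers are conventionally stored in MARC field 035 subfield a with the prefix (OCoLC). The sketch assumes the record has already been fetched as MARCXML with an un-namespaced record element, and it leaves the API calls to fetch and update the record out entirely. The function name add_oclc_035 is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_oclc_035(record_xml, oclc_number):
    """Return record_xml with an 035 $a "(OCoLC)<number>" field added.

    Assumes an un-namespaced MARCXML <record> element (an assumption
    about the input, not a statement about any particular API).
    If the same 035 value is already present, the input is returned unchanged.
    """
    record = ET.fromstring(record_xml)
    value = f"(OCoLC){oclc_number}"

    # Do not add a duplicate 035 if the number is already on the record.
    for sub in record.findall("./datafield[@tag='035']/subfield[@code='a']"):
        if (sub.text or "").strip() == value:
            return record_xml

    # Appended at the end for simplicity; fields are not re-sorted by tag.
    field = ET.SubElement(
        record, "datafield", {"tag": "035", "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(field, "subfield", {"code": "a"})
    subfield.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<record><datafield tag="245" ind1="1" ind2="0">'
        '<subfield code="a">Sample title</subfield></datafield></record>'
    )
    print(add_oclc_035(sample, "11111111"))
```

The duplicate check means the function can be run again over the same file without stacking up repeated 035 fields.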


Once this is complete, I can batch search for the OCLC numbers in Connexion and turn on the holdings. 

Alma Normalization Rule Generator

command line questions

Systems: Alma

Code: JavaScript (Node.js)

This was based on the concept of a README generator I learned in a coding bootcamp.

At Northwestern, I add local subject authority headings from different vocabularies, such as Homosaurus. One way to add these headings in batch in Alma is with Normalization Rules, which are written in Drools.

The repository generate-normalization-rule is a command line script that asks a few questions and builds a normalization rule from the answers. This prevents the errors that can creep in when copying and pasting a new rule for every local subject heading.
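The idea can be sketched as filling a fixed template with the answers, shown here in Python rather than the Node.js the repository uses. The rule text in RULE_TEMPLATE is illustrative only: it follows the general rule / when / then / end shape of Alma normalization rules, but it is not copied from the repository and should be checked against Alma's documentation before use. The names build_rule and RULE_TEMPLATE are hypothetical.

```python
from string import Template

# Illustrative template only; verify the rule syntax against Alma's
# normalization rule documentation before using it on real records.
RULE_TEMPLATE = Template(
    'rule "$name"\n'
    "when\n"
    "  (TRUE)\n"
    "then\n"
    '  addField "$tag.a.$heading"\n'
    "end\n"
)


def build_rule(name, tag, heading):
    """Fill the template, rejecting input that would break the rule text."""
    if not (len(tag) == 3 and tag.isdigit()):
        raise ValueError(f"MARC tag must be three digits, got {tag!r}")
    for label, text in (("name", name), ("heading", heading)):
        if '"' in text or "\n" in text:
            raise ValueError(f"{label} must not contain quotes or line breaks")
    return RULE_TEMPLATE.substitute(name=name, tag=tag, heading=heading)


if __name__ == "__main__":
    # The answers could just as well come from input() prompts.
    print(build_rule("Add local subject heading", "650", "Example heading"))
```

Because every rule comes from the same template, a typo can only enter through the answers, and the basic checks catch the ones most likely to produce a rule that fails to parse.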

MARC Language Guesser

code snippet from marc language guesser

Systems: Alma, MarcEdit

Code: Python

The problem: Many records at Northwestern are missing information about the language of the material.

There is a Python package, guess_language-spirit, that guesses the language of free text. Since we transcribe the title text, the 245 MARC tag will typically be in the language of the material cataloged. I used marc-language-guesser to guess the language of the material based on the title. The guessed language code was then written into the MARC 008 fixed field (positions 35-37). The updated 008 fields were merged back into the records using the MarcEdit merge tool, and the records were imported back into Alma using the Alma import.
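A minimal sketch of the two pieces involved, not the code in marc-language-guesser. The guesser is passed in as a function so the example runs without any extra packages; with guess_language-spirit installed, its guess_language function could be passed instead, on the assumption that it returns a two-letter code such as "fr" or the string "UNKNOWN". The mapping table is a small illustrative subset, and the function names are hypothetical.

```python
# Two-letter ISO 639-1 codes mapped to three-letter MARC language codes.
# A small illustrative subset; a real run would need a fuller table.
ISO_TO_MARC = {"en": "eng", "fr": "fre", "de": "ger", "es": "spa", "it": "ita"}


def marc_language_for_title(title, guesser):
    """Return the MARC language code for a title, or None if no usable guess.

    guesser is any function that takes text and returns a two-letter code.
    """
    guess = guesser(title)
    return ISO_TO_MARC.get(guess)  # None for "UNKNOWN" or unmapped codes


def set_008_language(field_008, marc_lang):
    """Return the 008 string with positions 35-37 replaced by marc_lang."""
    if len(field_008) != 40:
        raise ValueError("a bibliographic 008 field must be 40 characters")
    if len(marc_lang) != 3:
        raise ValueError("a MARC language code must be 3 characters")
    return field_008[:35] + marc_lang + field_008[38:]


if __name__ == "__main__":
    # Stand-in guesser for the demo; swap in a real one to guess from text.
    def always_french(text):
        return "fr"

    code = marc_language_for_title("Les misérables", always_french)
    blank_008 = " " * 40  # placeholder 008, not a real record's value
    if code is not None:
        print(repr(set_008_language(blank_008, code)))
```

Returning None when there is no usable guess leaves the existing 008 untouched, which is the safer outcome for very short titles.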

Since some titles are very short, the accuracy of the guesses needs to be checked.