Sample Use Cases

Metadata Migration Use Cases

University of Utah Migration Summary

The University of Utah completed migrations of two large CONTENTdm repositories, Utah Digital Newspapers (https://newspapers.lib.utah.edu/search) and our Digital Collections (https://collections.lib.utah.edu/), into Solphal, a homegrown system built on open source components (Solr, Phalcon, nginx). We migrated the more predictable and standardized newspapers repository first, in part as a trial run for migrating our far less standardized Digital Collections. We assessed the CONTENTdm templates, and fields had to be merged, renamed, and in some cases deleted prior to migration. In addition, the XML exported from CONTENTdm had to be transformed by script into valid XML before it could be ingested into Solr. Normalizing our fields was an essential part of the migration; now that we are on a more flexible platform, we are beginning to assess and remediate metadata values to improve metadata quality and add more faceting options.

Emory University Migration Summary

Emory’s Metadata Working Group is in its second year of planning for a multiple-system migration and consolidation to a centralized Hydra/Fedora 4 platform. Legacy repository systems include homegrown Fedora 3 applications and third-party/hosted systems (Extensis Portfolio Server, Symplectic Elements, Dataverse), with varied use of XML (MODS, MARCXML, DC, custom schemas) and relational database schemas. A major aspect of our migration is the transition to RDF, so we would share our approaches to that transition and talk about the community efforts we’ve joined. Additionally, we are implementing a new Core Metadata standard, normalizing repository metadata in segments (not just Descriptive), and considering the data modeling ramifications in parallel. Our rough outline is: 1) scope of effort, 2) tactics and artifacts for analysis, 3) normalization strategy and challenges, 4) our approach to moving to RDF, and 5) takeaways/lessons learned.

University of Maryland Migration Summary

We recently finished migrating accessions and finding aids from a Microsoft Access database, appropriately named The Beast, into ArchivesSpace. Long ago, the intention was that storing the data in Access would make it more structured and easier to migrate. But we found that structure does not matter when, over 20 years, no one approached data entry consistently. All of the data required extensive cleanup: the accessions through OpenRefine (and manual review of control files by curators). The finding aids required much more intensive work to turn the (technically valid) EAD XML generated from our database into EAD XML that conformed to the import rules and data structures required and/or supported by ArchivesSpace. This work required Python scripts, some more Python scripts, some OpenRefine, then some more Python, then XSLT, then Schematron.

We are also very close to beginning the migration of our digital collections metadata out of Fedora 2 (where we use a homegrown metadata schema that loosely resembles MODS) into Fedora 4 and RDF. We are still finalizing our metadata model/application profile and working out how we will handle local authorities, as well as how we will map objects to URIs, with LCSH subject headings giving me the biggest headaches.

Indiana University Migration Summary

IU is still in the planning stages, but migration from Fedora 3 to Fedora 4 is expected within the next year. As the metadata person, I need a plan to handle our descriptive metadata in Fedora (mostly MODS, with some custom schemas) and possibly a way to transform METS into manageable structural information in RDF, perhaps using PCDM. We are already mapping descriptive metadata to share with DPLA, and that work seems like a good opportunity to reconcile values to Linked Data where we can. We’ve also been involved in the work of the Hydra community’s MODS to RDF Subgroup, which aims to offer a recommended path to simple RDF for use in Fedora. So the migration path might end up being a MODS-to-RDF mapping for collections that use MODS, incorporating Linked Data wherever a collection has been reconciled as part of the DPLA mapping. That’s what I hope to have figured out by DLF time.

Northwestern University Migration Summary

NUL is planning two (!) migrations soon. The first is upgrading our Avalon instance from Avalon 4 to Avalon 6.x, which also requires migrating our metadata from Fedora 3 to Fedora 4. We hope to complete this migration by our Fall quarter.

Our second migration will still be in the planning stages by DLF, but we are developing a brand-new Hyrax/Fedora repository for all of our digital objects. Our other current repository is for images and uses VRA Core 4 and Fedora 3. Part of this migration means moving from VRA Core to RDF. We are also starting to work with DPLA on submitting our metadata, little of which is Linked Data-compliant. Neither of our repositories has anything other than an identifier in its DC datastreams, so we hope to fill those out for DPLA and other uses as well.

Duke University Migration Summary

DUL has been migrating its 20+ years of digital collections since 2015, and we are about halfway through (this estimate may be optimistic, since we did the ‘easy’ stuff first). The old platform is a custom Django/Python-based application, and all the metadata (loosely Dublin Core, with lots of ‘customization’) is encoded in METS. The new platform is a Fedora/Hydra/Blacklight stack, and the metadata is stored as N-Triples. We’re (mostly) storing ‘real’ predicates but aren’t storing URIs as values/objects yet; for now I try to use linked data-ready vocabularies/thesauri so that we can transform the values once we have that capability. The migration process has involved a lot of analysis and data wrangling using OpenRefine, regular expressions, and Ruby scripting. It’s slow going, but the upside is that we’ve used the process as an opportunity to develop our first metadata application profile (still Dublin Core, still with some customizations, but much more constrained and carefully considered), in the hope that future migrations will be a tad easier!