DRMC

CONTENTdm Migration Resources

A place to share tools and techniques for migrating CONTENTdm collections to the DRC.  

Migration Scripts and Tools


John Millard's example PHP ingest script and sample data

Ingest Script
Original Tab Delimited export from CDM Sample Collection
Modified version of the above with a row of DRC field names added above the CDM field names (omit the dc. prefix, and use "skip" for fields that should be skipped)
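The two-header-row layout described above can be read with a few lines of code. This is a minimal Python sketch, not John's PHP script: it assumes the first row holds the DRC field names (without the dc. prefix, or "skip"), the second row holds the original CDM field names, and every later row is one record. The function name and sample values are made up for illustration.

```python
import csv
import io


def read_mapped_export(text):
    """Turn a two-header-row tab-delimited export into DC-style records."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # drop blank lines
    drc_names, data = rows[0], rows[2:]    # rows[1] is the CDM header, unused here
    records = []
    for row in data:
        record = {}
        for drc_name, value in zip(drc_names, row):
            name = drc_name.strip()
            if name.lower() == "skip" or not value.strip():
                continue
            # The mapping row omits the "dc." prefix, so restore it here.
            record.setdefault("dc." + name, []).append(value.strip())
        records.append(record)
    return records


# Hypothetical sample data: mapping row, CDM header row, one record.
sample = (
    "title\tcreator\tskip\n"
    "Title\tAuthor\tCONTENTdm number\n"
    "Letter to Mary\tBrowne, John\t17\n"
)
records = read_mapped_export(sample)
print(records)
```

Values are collected into lists because more than one CDM column may be mapped to the same DRC field.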


Using the CONTENTdm OAI Interface as a data source


This XSLT from Tom Habing at UIUC may be useful: http://dlf.grainger.uiuc.edu/dlfcollectionsregistry/oai/oai_dc2csv.xsl
From his message on OAI-general:
"Over time I've gotten several requests for comma-separated-value dumps from various of the OAI repositories that I manage. In case anyone else might have similar needs I'm making an XSLT available that will transform the oai_dc XML format into a CSV file where the columns represent DC fields:
The output details are parameterized in case you prefer tab-delimited instead comma-delimited, for example."

A quick script to extract the OAI DC metadata as XML (there are probably better ways): John's OAI Harvest Script
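For anyone who would rather not run the XSLT, the same idea can be sketched in Python. This is not John's harvest script or Tom's stylesheet, only an illustration: it builds a standard OAI-PMH ListRecords request URL and flattens an oai_dc response into rows, one column per DC field. The base URL, set name, and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, set_spec=None):
    """Build a ListRecords request for the oai_dc metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def records_to_rows(xml_text, fields=("title", "creator", "date")):
    """Flatten each OAI record into one row; repeated fields are joined with '; '."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("{%s}record" % OAI_NS):
        row = []
        for field in fields:
            elements = record.iter("{%s}%s" % (DC_NS, field))
            values = [el.text.strip() for el in elements if el.text]
            row.append("; ".join(values))
        rows.append(row)
    return rows


# Hypothetical endpoint and response, for illustration only.
url = list_records_url("http://example.org/oai", "letters")
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:letters/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Letter to Mary</dc:title>
          <dc:creator>Browne, John</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
rows = records_to_rows(sample_response)
print(url)
print(rows)
```

Large sets are usually returned in chunks with a resumptionToken element, so a real harvest would need to keep requesting until no token is returned; that loop is left out here.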

Some notes on CONTENTdm Compound Objects
Compound objects are represented as individual page (or other label) images bound together by a compound object description file (the .cpd file), a standard XML document that represents the structure of the compound document.  There is one .cpd file for each compound object, located in the /image subfolder of the CONTENTdm collection folder alongside the image files that it describes.  Because it is a standard XML file, it should be straightforward to apply an XSLT transform to the .cpd file to extract the page image filenames and output a formatted contents manifest.  With a little more work to retrieve the objects themselves, the entire bulk submission package can be constructed.  Note that this technique is hypothesized in Lynna's proof of concept.

See an example .cpd file from the Letters of John Browne Collection at Miami University
See an example XSLT stylesheet that successfully transforms a .cpd file into a DSpace contents file
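As an alternative to the XSLT route, the same extraction can be sketched in Python. This sketch assumes each page in the .cpd file carries its filename in a pagefile element and that the DSpace contents file is simply one filename per line; check both against the example files linked above. The function name and sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def cpd_to_contents(cpd_xml):
    """Return DSpace contents-file text: one page image filename per line."""
    root = ET.fromstring(cpd_xml)
    # iter() walks the tree in document order, so page order is preserved
    # even if pages are nested inside other elements.
    names = [
        el.text.strip()
        for el in root.iter("pagefile")  # assumed element name
        if el.text and el.text.strip()
    ]
    return "".join(name + "\n" for name in names)


# Hypothetical two-page compound object.
sample_cpd = """<cpd>
  <type>Document</type>
  <page>
    <pagetitle>Page 1</pagetitle>
    <pagefile>1.jp2</pagefile>
    <pageptr>0</pageptr>
  </page>
  <page>
    <pagetitle>Page 2</pagetitle>
    <pagefile>2.jp2</pagefile>
    <pageptr>1</pageptr>
  </page>
</cpd>"""
contents = cpd_to_contents(sample_cpd)
print(contents, end="")
```

The image files themselves still have to be copied from the /image subfolder into the item directory next to the generated contents file.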

In the tab-delimited export from CONTENTdm, there is a metadata line for each of the digital objects and for the .cpd file that binds them together.  If you created metadata only for the object as a whole, you can ignore the individual page records.  If you applied metadata to each page image, you will need to find a way to merge the data into an appropriate DSpace record, as DSpace appears to use a one-record-per-intellectual-object model.
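One possible way to do that merge is sketched below in Python. It is only an illustration of the idea, not a tested migration step: it assumes the object-level record and its page records have already been identified and parsed into dictionaries of value lists, and it copies selected page-level fields into the object record while skipping duplicates. Which fields are worth carrying over is a local decision; the field names and values here are hypothetical.

```python
def merge_page_metadata(object_record, page_records, fields=("description",)):
    """Fold selected page-level values into a copy of the object-level record."""
    merged = {key: list(values) for key, values in object_record.items()}
    for page in page_records:
        for field in fields:
            for value in page.get(field, []):
                # Keep the first occurrence of each value, in page order.
                if value not in merged.setdefault(field, []):
                    merged[field].append(value)
    return merged


# Hypothetical compound object with two pages.
letter = {"title": ["Letter to Mary"], "description": ["Two-page letter"]}
pages = [
    {"title": ["Page 1"], "description": ["Greeting and news from home"]},
    {"title": ["Page 2"], "description": ["Two-page letter"]},
]
merged = merge_page_metadata(letter, pages)
print(merged)
```

Page titles are left out by default so that the DSpace record keeps a single title for the object as a whole.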