Improving Metadata Interoperability at Repository Metadata Model Level and Conclusion

As a next step, a prototype is planned to improve metadata interoperability at the repository level. The prototype will build an integrated search engine over the records pulled from the Harvard, MIT, and UIUC libraries, and it will be carried out in two ways. The first is to build Linked Open Data (LOD) by mapping the converted CT records to RDF with Python programs that are yet to be designed. The converted CT records are the CT records produced by the conversions from the Harvard (MARC), MIT (QDC), and UIUC (MARCXML) metadata records. LOD has not yet been widely implemented in American libraries. In the Harvard library case, the project ‘Linked open data, expanded authority search capabilities, and synonym expansion’ attempted to build linked open data in 2012, but it reported challenges in selecting an ontology and in developing ‘the pipeline to convert MARC records into RDF’ (Cheng, 2012). The Harvard case suggests that converting MARC records directly into RDF is very challenging and perhaps infeasible. For that reason, the UIUC library first converts MARCXML records into MODS using the MARC to MODS conversion provided by the Library of Congress (LC). That approach, however, still runs into problems when converting MODS to RDF using the LC guidelines. The Common Terminology can be a suitable solution: linked open data can be built from the CT records converted from MARC and connected to existing ontologies, because CT is easier to map into RDF. The second way is to build a CT union catalog from the converted CT records of the three universities. This approach is modeled on the existing WorldCat union catalog. Both tasks require the CT concepts to be expressed in SKOS with URIs.
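
The mapping from a converted CT record to RDF could look roughly like the sketch below. It is only an illustration under stated assumptions, not the planned Python programs: the namespace `http://example.org/ct/`, the record URI, and the term names `title` and `creator` are hypothetical placeholders, and the output is plain N-Triples written with the standard library only.

```python
from typing import Dict, List

# Hypothetical namespace for CT terms; the real CT URIs are not given here.
CT_NS = "http://example.org/ct/"


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def ct_record_to_ntriples(record_uri: str, record: Dict[str, List[str]]) -> List[str]:
    """Emit one triple per (term, value) pair of a converted CT record."""
    triples = []
    for term in sorted(record):  # sorted for a stable, repeatable output
        for value in record[term]:
            literal = escape_literal(value)
            triples.append(f'<{record_uri}> <{CT_NS}{term}> "{literal}" .')
    return triples


# Hypothetical converted CT record (term name -> list of values).
example = {"title": ["Metadata basics"], "creator": ["Doe, Jane"]}
lines = ct_record_to_ntriples("http://example.org/record/mit-1", example)
print("\n".join(lines))
```

Because each CT term becomes one predicate URI, the same function would apply to records converted from any of the three sources; connecting those predicates to an existing ontology would be a separate step.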

With both the LOD and the CT union catalog in place, an integrated search engine will be developed for the Harvard, MIT, and UIUC libraries. In cooperation with the prototype, Harvard provided 1.5 million online-accessible records, including records that Google cannot access. MIT provided all DSpace metadata records from more than 700 communities. UIUC will provide about a million records in 7 files. The prototype is important because it will suggest a dependable way to improve interoperability among the three universities’ libraries, which use different standards. It will help engage the many users at the three universities who seek plentiful resources online but still thirst for them. It will demonstrate a solution for building interoperability globally with CT among libraries and organizations that need interoperability at the schema, schema language, record, and repository levels. The prototype can also be expanded into an integrated search engine for Well-Designed Digital Libraries all over the world, which will make up the International Open Public Digital Library.
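
To make the idea of a CT union catalog concrete, the following minimal sketch merges converted CT records from several libraries and runs a simple title search over the result. Everything specific in it is an assumption made for illustration: the term names `title` and `creator`, the matching rule (case-insensitive, whitespace-normalized title plus creator), and the sample records. Real union catalogs such as WorldCat use far more elaborate matching.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, List[str]]  # CT term name -> list of values
Key = Tuple[str, str]          # (normalized title, normalized creator)


def normalized(record: Record, term: str) -> str:
    """First value of a term, lowercased with whitespace collapsed."""
    values = record.get(term) or [""]
    return " ".join(values[0].lower().split())


def build_union_catalog(sources: Dict[str, Iterable[Record]]) -> Dict[Key, dict]:
    """Merge records that share a match key and track which libraries hold them."""
    catalog: Dict[Key, dict] = {}
    for library, records in sources.items():
        for record in records:
            key = (normalized(record, "title"), normalized(record, "creator"))
            entry = catalog.setdefault(key, {"record": record, "holdings": []})
            if library not in entry["holdings"]:
                entry["holdings"].append(library)
    return catalog


def search_titles(catalog: Dict[Key, dict], word: str) -> List[dict]:
    """Return catalog entries whose normalized title contains the word."""
    needle = word.lower()
    return [entry for (title, _), entry in catalog.items() if needle in title.split()]


# Hypothetical converted CT records from the three libraries.
sources = {
    "Harvard": [{"title": ["Cataloging  Basics"], "creator": ["Doe, Jane"]}],
    "MIT": [{"title": ["Digital Repositories"], "creator": ["Roe, Richard"]}],
    "UIUC": [{"title": ["cataloging basics"], "creator": ["doe, jane"]}],
}
catalog = build_union_catalog(sources)
hits = search_titles(catalog, "cataloging")
print(len(catalog), hits[0]["holdings"])
```

Since all three sources are already expressed in CT terms, a single matching rule can be applied to every record regardless of whether it originated as MARC, QDC, or MARCXML.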

Conclusion and Future Work

To achieve metadata interoperability at multiple levels, the Common Terminology is proposed. Because existing metadata standards differ greatly in generality and specificity, it is difficult to achieve interoperability among them without loss of information. The Common Terminology (CT) is a bridge terminology among different standards: it allows communities to keep using their own standards while providing uniformity for searching. As a case study, CT 1.1 was developed to achieve interoperability among four significantly different standards (MARC, MODS, DC, and QDC) and the metadata of three universities (Harvard-MARC, MIT-QDC, and UIUC-MARCXML). CT 1.1 improves interoperability at multiple levels by using the existing techniques for each metadata level, such as crosswalks and conversions. At the schema level, the terms that maximize lexical and semantic interoperability are chosen as the Common Terms (properties) and qualifiers (subproperties). At the schema definition language level, CT is represented in ct.xsd (XML Schema), ct.rdf (RDF Schema), and ctskos.rdf (SKOS concepts). At the record level, the conversion from MIT (QDC) to CT was developed to test the performance of CT. In the mapping experiment with this conversion, CT shows a 99.99537% transfer rate, a 98.7% lexical match rate, and a 100% semantic match rate. As a result, CT keeps the loss of information very low (0.00463%) over all metadata statements in the input records. Its high lexical and semantic match rates significantly increase the accuracy of the mappings, and it significantly narrows the gap between different degrees of generality and specificity. To improve interoperability at the repository level, a prototype is planned that will build an integrated search engine over the records pulled from the Harvard, MIT, and UIUC libraries. It will be implemented using Linked Open Data and a CT union catalog, both built with CT.
The prototype for improving interoperability at the repository level with CT will help engage the many users at the three universities who seek plentiful resources online but still thirst for them. It can be expanded into an integrated search engine for Well-Designed Digital Libraries all over the world, which will make up the International Open Public Digital Library. Interoperability via a common terminology has not yet been realized in the library field in America. CT 1.1 demonstrates a solution to the barriers that hinder interoperability at multiple levels. Further, CT offers a solution for building interoperability globally among libraries and organizations that need interoperability at the schema, schema language, record, and repository levels.
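
The relation between the reported transfer rate and the reported loss of information can be checked with a short calculation. The sketch below assumes that the transfer rate is the share of input metadata statements carried into the CT output and that the loss is its complement; the function names and the counts in the second call are hypothetical and are not the experiment's data.

```python
from decimal import Decimal


def loss_rate(transfer_rate_percent: str) -> Decimal:
    """Loss of information, in percent, as the complement of the transfer rate."""
    return Decimal("100") - Decimal(transfer_rate_percent)


def transfer_rate(transferred: int, total: int) -> Decimal:
    """Percentage of input metadata statements that reach the output."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Decimal(transferred) * 100 / Decimal(total)


# Reported transfer rate from the mapping experiment.
reported_loss = loss_rate("99.99537")
print(reported_loss)  # 0.00463

# Hypothetical counts, only to show how a transfer rate would be derived.
print(transfer_rate(999, 1000))
```

Decimal arithmetic is used instead of floats so that the five-decimal percentages subtract exactly.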

Acknowledgments

This research project has been supported by the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign (UIUC). Thanks to Professors David Dubin and Linda Smith for their guidance on this project. We also appreciate the cooperation of Harvard, MIT, and UIUC in providing their metadata for the project.

References

Chan, L. M., & Zeng, M. L. (2006, June). Metadata Interoperability and Standardization – A Study of Methodology Part I: Achieving Interoperability at the Schema Level. D-Lib Magazine, 12(6). Retrieved from http://www.dlib.org/dlib/june06/chan/06chan.html

Cheng, S. (2012). Blink Project: Linked Open Data for Countway Library Final report for Phase 1. Retrieved from https://osc.hul.harvard.edu/sites/default/files/406_Blink%20Phase%201%20final%20report%20Dec.%202012.pdf

DCMI. (2009). Interoperability Levels for Dublin Core Metadata. Retrieved from Dublin Core Metadata Initiative: http://dublincore.org/documents/interoperability-levels/

DCMI. (2013). DCMI Abstract Model. Retrieved from Dublin Core Metadata Initiative: http://dublincore.org/documents/abstract-model/

DCMI. (n.d.). DCMI Metadata Terms. Retrieved from Dublin Core Metadata Initiative: http://dublincore.org/documents/dcmi-terms/#elements-subject

Haslhofer, B., & Klas, W. (2010). A Survey of Techniques for Achieving Metadata Interoperability. ACM Comput. Surv., 42(2). Retrieved from http://eprints.cs.univie.ac.at/79/1/haslhofer08_acmSur_final.pdf

ISO. (2011). Information technology — Metadata Registries Interoperability and Bindings - Part 1: Framework, common vocabulary, and common provisions for conformance.

Jin, B. S. (n.d.). Evaluating Existing Digital Libraries as a prototype with the Suggested Criteria: Content, Usability, and Performance Evaluation Criteria. Retrieved from http://courseweb.lis.illinois.edu/~sunjin/Papers/InternationalOpenPublicDigitalLibrary-EvaluatingDLs.pdf

Jin, B. S. (n.d.). International Open Public Digital Library (IOPDL): A Proposal for the Future. Retrieved from http://courseweb.lis.illinois.edu/~sunjin/Papers/InternationalOpenPublicDigitalLibrary-Proposal.pdf

Lancaster, F. W., & Smith, L. (1983). Compatibility Issues Affecting Information Systems and Services. General Information Programme and UNISIST.

LC. (2008). MARC to Dublin Core Crosswalk. Retrieved from loc.gov: http://www.loc.gov/marc/marc2dc.html

LC. (n.d.). Conversions. Retrieved from Metadata Object Description Schema (MODS): http://www.loc.gov/standards/mods/mods-conversions.html

LC. (n.d.). MARC Code List for Relators. Retrieved from http://www.loc.gov/marc/relators/

LC. (n.d.). Standards at the Library of Congress. Retrieved from The Library of Congress: http://www.loc.gov/standards/

MODS. (n.d.). MODS User Guidelines Version 3: Detailed Description of MODS Elements. Retrieved from Metadata Object Description Schema (MODS): http://www.loc.gov/standards/mods/v3/mods-userguide-elements.html

Nagamori, M., & Sugimoto, S. (2006). A Metadata Schema Registry as a Tool to Enhance Metadata Interoperability. TCDL Bulletin, 3(1). Retrieved from http://www.ieee-tcdl.org/Bulletin/v3n1/nagamori/nagamori.html

NISO. (2004). Understanding metadata. Retrieved from http://www.niso.org/standards/resources/UnderstandingMetadata.pdf

Smith-Yoshimura, K., Argus, C., Dickey, T. J., Naun, C. C., Ortiz, L. R., & Taylor, H. (2010). Implications of MARC Tag Usage on Library Metadata Practices. OCLC. Retrieved from http://www.oclc.org/content/dam/research/publications/library/2010/2010-06.pdf?urlm=162940

Svenonius, E. (1983). Compatibility of Retrieval Languages: Introduction to a Forum. Int. Classif., 10(1), 2-4.

W3C. (2009). SKOS Simple Knowledge Organization System Primer. Retrieved from w3.org: http://www.w3.org/TR/skos-primer/

W3C. (2009). SKOS Simple Knowledge Organization System Reference. Retrieved from w3.org: http://www.w3.org/TR/skos-reference/

W3C. (2014). RDF Schema 1.1. Retrieved from w3.org: http://www.w3.org/TR/rdf-schema/

W3C. (n.d.). XML Schema. Retrieved from w3.org: http://www.w3.org/XML/Schema

Zeng, M. L., & Chan, L. M. (2006, June). Metadata Interoperability and Standardization – A Study of Methodology Part II: Achieving Interoperability at the Record and Repository Levels. D-Lib Magazine, 12(6). Retrieved from http://www.dlib.org/dlib/june06/zeng/06zeng.html

Last Modified August 29, 2014