Research Proposal

Research Proposal: Improving Metadata Interoperability with the Common Terminology (CT)

This page outlines the research and the work plan for the project 'A Model and Roles of Common Terminology to Improve Metadata Interoperability.' More details are in the Research Proposal paper.

Research Plan

  1. Defining the Concept of CT and Designing CT - achieving and improving interoperability at the schema level with crosswalks
    • Defining the concept and roles of CT
    • Designing CT of MARC, MODS, DC, and QDC metadata schemas
      • Based on the crosswalks of the Library of Congress,
      • Based on actual Harvard (MARC), MIT (QDC), and UIUC (MARCXML) metadata records,
      • Based on the usage of MARC tags and (Q)DC element names.
    • Developing crosswalks of MARC to CT and DC & QDC to CT
  2. Representing CT as XML and RDF schemas, linking (sub)properties using SKOS concepts - achieving and improving interoperability at the schema definition language level
  3. Performance experiments as a case study for CT with actual metadata of Harvard (MARC), MIT (QDC), and UIUC (MARCXML) - achieving and improving interoperability at the record level with conversions (March 2014 ~ December 2014)
    • Mapping performance experiments with the developed Common Terminology (CT), based on the developed CT crosswalks and the databases of three university libraries:
      • Harvard (MARC) records to CT,
      • MIT (QDC) records to CT, and
      • UIUC (MARCXML) records to CT.
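
A crosswalk-based conversion to CT can be sketched in Python as a dictionary lookup. This is a minimal illustration only: the CT term names (`title`, `creator`, `publication`, `subject`), the choice of fields, and the sample records are assumptions made for the example, not the crosswalks developed in this research.

```python
# Hypothetical crosswalk fragments: source field -> CT term.
# The CT term names and the selected fields are illustrative assumptions.
MARC_TO_CT = {
    "245": "title",        # MARC 245: title statement
    "100": "creator",      # MARC 100: main entry, personal name
    "260": "publication",  # MARC 260: publication information
    "650": "subject",      # MARC 650: topical subject
}
QDC_TO_CT = {
    "dc.title": "title",
    "dc.contributor.author": "creator",
    "dc.publisher": "publication",
    "dc.subject": "subject",
}


def to_ct(record, crosswalk):
    """Convert one record (field -> list of values) to CT terms.

    Returns the converted record and the list of source fields
    that the crosswalk could not map.
    """
    converted = {}
    unmapped = []
    for field, values in record.items():
        term = crosswalk.get(field)
        if term is None:
            unmapped.append(field)
            continue
        converted.setdefault(term, []).extend(values)
    return converted, unmapped


# Made-up sample records, one per source schema.
marc_record = {"245": ["Metadata basics"], "100": ["Doe, Jane"], "999": ["local note"]}
qdc_record = {"dc.title": ["Metadata basics"], "dc.contributor.author": ["Doe, Jane"]}

marc_ct, marc_unmapped = to_ct(marc_record, MARC_TO_CT)
qdc_ct, qdc_unmapped = to_ct(qdc_record, QDC_TO_CT)
print(marc_ct, marc_unmapped)
print(qdc_ct, qdc_unmapped)
```

Keeping the unmapped fields alongside the converted record makes it possible to count what a crosswalk fails to carry over, which is the kind of figure the mapping experiments are meant to measure.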

The Expanded Research Plan

Building a prototype of Linked Open Data and a CT union catalog, providing a portal for Harvard, MIT, and UIUC - achieving and improving metadata interoperability at the repository level (January 2015 ~ December 2017)

    • Conceptualizing CT in SKOS with URIs;
    • Mapping CT to RDF with Python;
    • Building Linked Open Data (LOD) from Harvard, MIT, and UIUC metadata;
    • Building a CT union catalog from pulled metadata and conversion programs;
    • Building a portal, an integrated search engine, for the three universities with Linked Open Data (LOD) and the CT union catalog; and
    • Comparing the performance and effectiveness of searching by LOD and by the CT union catalog.
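
The first two steps, conceptualizing CT in SKOS with URIs and mapping CT to RDF with Python, might look like the following sketch. It uses only the standard library and writes N-Triples by hand. The namespace `http://example.org/ct/`, the term names, and the choice of `skos:closeMatch` to link to Dublin Core are assumptions for illustration; the SKOS and Dublin Core URIs themselves are the published ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CT = "http://example.org/ct/"  # hypothetical CT namespace


def uri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Build a language-tagged N-Triples literal, escaping \\ and \"."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def ct_concept(term, label, broader=None, close_matches=()):
    """Describe one CT term as a skos:Concept.

    A qualifier (sub-property) points to its Common Term via skos:broader.
    """
    subject = uri(CT + term)
    triples = [
        (subject, uri(RDF_TYPE), uri(SKOS + "Concept")),
        (subject, uri(SKOS + "prefLabel"), literal(label)),
    ]
    if broader is not None:
        triples.append((subject, uri(SKOS + "broader"), uri(CT + broader)))
    for target in close_matches:
        triples.append((subject, uri(SKOS + "closeMatch"), uri(target)))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative Common Term and qualifier; the names are assumptions.
triples = ct_concept(
    "title", "Title",
    close_matches=["http://purl.org/dc/elements/1.1/title"],
)
triples += ct_concept(
    "alternativeTitle", "Alternative title",
    broader="title",
    close_matches=["http://purl.org/dc/terms/alternative"],
)
ntriples = to_ntriples(triples)
print(ntriples)
```

An RDF library could replace the hand-written serialization; plain strings are used here only to keep the sketch free of dependencies.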

The Proposal Objectives:

  • To examine the current problems in sharing information, which are caused especially by the use of various metadata schemas;
  • To offer a solution, a Common Terminology (CT), to the interoperability problems caused by the use of various metadata schemas;
  • To design a CT and to demonstrate the performance of the developed Common Terminology through mapping experiments;
  • To develop meaningful, actionable guidance and implementation strategies for mappings through a Common Terminology, so that libraries, museums, organizations, and governments can commonly use them to improve metadata interoperability;
  • To achieve metadata interoperability at multiple metadata levels: schema, schema definition language, record, and repository.

Detailed Benefits:

  • Achieving and improving interoperability at the schema level through developing a Common Terminology
    • Developing the Common Terminology by selecting Common Terms and qualifiers, along with the CTScheme, of MARC, MODS, DC, and QDC according to the criteria below:
      • based on the crosswalks of the Library of Congress, selecting Common Terminology that achieves and maximizes lexical and semantic interoperability and minimizes the gap between different degrees of generality or specificity;
      • selecting frequently used tags or element names, those with over 50% usage in Harvard, WorldCat, and UIUC metadata records;
      • selecting tags or element names often used by all 5 search interfaces;
      • generalizing the selected Common Terminology by the QDC element usage of MIT records and by the LC crosswalks from/to MARC & MODS and DC & QDC;
      • generalizing the selected Common Terminology so that it has 12 Common Terms, fewer than the 15 element names of DC;
      • collecting the CTScheme, which is defined as a controlled set of values specific to the Common Terminology. It is a unique characteristic of CT, used as an authority that designates and limits the values that describe resources; and
      • besides the six criteria above, using common sense to decide on Common Terms (properties) and qualifiers (sub-properties).
    • Developing crosswalks for MARC (MODS) to CT and (Q)DC to CT;
    • Improving metadata interoperability, compatibility and convertibility in mappings.
  • Achieving and improving interoperability at the record level with conversions
    • Developing conversions: MARC (MODS) to CT, and (Q)DC to CT;
    • Demonstrating the performance of CT by measuring the increase in accuracy, in terms of lexical and semantic match rates in mappings, with the developed crosswalks: MARC (MODS) to CT and (Q)DC to CT;
    • Measuring the reduction in information loss rates in mappings, that is, the improvement in preservation rates.
  • Achieving and improving interoperability at the repository level with an integrated search engine
    • Building a CT union catalog and Linked Open Data (LOD): both are fundamental to the Well-Designed Digital Libraries that will make up an International Open Public Digital Library (IOPDL) in the future.
    • Building an integrated search engine on the CT union catalog and LOD to share information among three university libraries: Harvard, MIT, and UIUC. Users at the three universities will be able to access any item across the three databases. The integrated service is expected to improve work efficiency for all students, faculty, and staff of the three universities.
    • Improving the effectiveness of search engines through the developed CT
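
The usage criterion above (tags or element names with over 50% usage in the sampled records) can be checked with a short Python sketch. It assumes each record is a dictionary keyed by tag or element name and that a field counts once per record; both the record shape and the sample data are assumptions for illustration.

```python
from collections import Counter


def usage_rates(records):
    """Return the share of records in which each field appears.

    A field is counted once per record, however often it repeats.
    """
    counts = Counter()
    for record in records:
        counts.update(set(record))
    total = len(records)
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


def frequently_used(records, threshold=0.5):
    """Return the fields used in strictly more than `threshold` of the records."""
    rates = usage_rates(records)
    return sorted(field for field, rate in rates.items() if rate > threshold)


# Made-up MARC-like sample: four records keyed by tag.
sample = [
    {"245": ["A"], "100": ["X"], "650": ["s1"]},
    {"245": ["B"], "100": ["Y"]},
    {"245": ["C"], "100": ["Z"], "650": ["s2"], "500": ["note"]},
    {"245": ["D"]},
]
rates = usage_rates(sample)
selected = frequently_used(sample)
print(rates)
print(selected)
```

Because the comparison is strict, a field that appears in exactly half of the records is not selected, which matches the wording "over 50%".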

Objects of the research

The research is addressed to everyone who is involved or interested in metadata, mapping, and interoperability. In particular, it focuses on three university libraries and their metadata schemas:

  • Harvard Library, which opens all of its metadata so that anyone can research them, and which uses the MARC21 bibliographic format;
  • MIT DSpace Library, which uses the qualified Dublin Core format;
  • UIUC Library, which uses MARCXML (these are representative metadata schemas in current use).

The research project has the following outline:

  1. Review 20th-century research on convertibility and compatibility problems in sharing bibliographic description data, such as metadata, indexing languages, controlled vocabularies, common terms, or common languages.
  2. Review current research and methods for achieving interoperability, especially at the metadata model levels (e.g., metadata instance, metadata schema, schema definition language, record, and repository levels).
  3. Document the problems of existing methods for achieving metadata interoperability. Propose a Common Terminology (CT) and a process for creating CT at each metadata interoperability level. Discuss the role [...] & QDC. Create a Common Terminology that accounts for lexical, semantic, syntactic, and grammatical metadata interoperability for them.
  4. Design a practical crosswalk among the three chosen databases, refining the CT to improve information sharing among them. The three university libraries are Harvard Library, which uses the MARC21 bibliographic format; MIT DSpace Library, which uses the qualified Dublin Core format; and UIUC Library, which uses the MODS format.
  5. Implement the CT on the Semantic Web in RDF and XML, achieving interoperability at several levels (e.g., metadata language level, record level, and repository level).
  6. Demonstrate the performance of the CT through experiments with direct and indirect mappings.
    1. Direct mappings with the crosswalks for MARC, QDC, and MODS include the mappings between the Harvard library and the MIT library, between MIT and UIUC, and between Harvard and UIUC.
    2. Indirect mappings are also tested through the suggested Common Terminology (CT) with the three university databases. That is, the mappings are done between Harvard and CT, between MIT and CT, and between UIUC and CT.
    3. Both direct and indirect mappings are implemented in the Python programming language, based on dictionary or RDF graph formats.
  7. Compare and analyze the performance of direct and indirect mappings in reducing data loss rates and in increasing accuracy for preserving data.
  8. Consequently, the research paper reports that indirect mapping through CT shows better performance in achieving and improving interoperability. The research paper also presents meaningful, actionable guidance and implementation strategies (e.g., mappings that use the CT based on the RDF format) as a solution for achieving and improving metadata interoperability.
  9. Lastly, the scope of the research proposal covers three different well-known metadata schemas and the university libraries that use them. It focuses on suggesting a common terminology for the element names of the three chosen metadata schemas. The research will focus on providing guidance that helps the libraries share information and helps users search for materials through the libraries' search engines with the suggested Common Terminology (CT).
  10. Progress will not be expensive. Recent developments in technology should make it possible to implement the experiments at little cost.
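
The comparison of direct and indirect mappings in steps 6 and 7 could be set up along the following lines with dictionary-based crosswalks. The sketch assumes that the preservation rate is the share of source values that survive a conversion and that the loss rate is its complement; the proposal does not fix these definitions. The crosswalk tables and the record are made up, so the printed numbers only show how the measure works and say nothing about the actual experiments.

```python
def convert(record, crosswalk):
    """Apply a crosswalk (source field -> target field) to one record.

    Values of fields that the crosswalk does not cover are dropped.
    """
    converted = {}
    for field, values in record.items():
        target = crosswalk.get(field)
        if target is not None:
            converted.setdefault(target, []).extend(values)
    return converted


def count_values(record):
    """Count all values held by a record across its fields."""
    return sum(len(values) for values in record.values())


def preservation_rate(source, converted):
    """Share of source values still present after conversion (assumed metric)."""
    total = count_values(source)
    return count_values(converted) / total if total else 1.0


# Hypothetical crosswalk fragments, for illustration only.
MARC_TO_QDC = {"245": "dc.title", "100": "dc.contributor.author"}  # direct
MARC_TO_CT = {"245": "title", "100": "creator", "650": "subject"}  # indirect

# Made-up MARC-like record with four values in total.
record = {
    "245": ["Metadata basics"],
    "100": ["Doe, Jane"],
    "650": ["Metadata", "Interoperability"],
}

direct_rate = preservation_rate(record, convert(record, MARC_TO_QDC))
indirect_rate = preservation_rate(record, convert(record, MARC_TO_CT))
print("direct:", direct_rate, "loss:", 1 - direct_rate)
print("indirect:", indirect_rate, "loss:", 1 - indirect_rate)
```

Counting values measures only whether data is carried over. The lexical and semantic match rates named in the proposal would need a separate check of whether each value lands in an appropriate target field.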

Last Modified August 19, 2014