Common Terminology (CT) Project

Newly Developed Websites (from January 1, 2015):

  • The International Open Public Digital Library (IOPDL) project is found at http://www.iopdl.org
  • The Common Terminology (CT) project is found at http://www.ct.iopdl.org

Good News for the CT project

November – December 2014 The Common Terminology (CT) website was created.

October, 2014 The UIUC MARCXML to CT Mapping Experiment was reported; the report is available at http://www.ct.iopdl.org/1.1/ReportMARCXMLtoCTconversionexperiment.pdf.

October, 2014 The “A Model and Roles of a Common Terminology to Improve Metadata Interoperability” paper was presented as a Best Practice Demonstration at the 2014 International Conference on Dublin Core and Metadata Applications, held in Austin, Texas, USA.

September, 2014 UIUC MARCXML to CT Mapping Experiment, another empirical evaluation, was conducted with 400,000 University of Illinois MARCXML records.

August, 2014 The “A Model and Roles of a Common Terminology (CT) to Improve Metadata Interoperability” paper by Boaz Sunyoung Jin was published; it is available at https://www.ideals.illinois.edu/handle/2142/50100. It covers almost all of the work for the CT project except the UIUC MARCXML to CT Mapping Experiment.

August, 2014 400,000 UIUC MARCXML metadata records were provided by Ms. Norman and Professor Cole. They were used for the MARCXML to CT conversion mapping experiment.

July, 2014 The “A Model and Roles of a Common Terminology (CT) to Improve Metadata Interoperability” project was presented by Boaz Sunyoung Jin and approved by a three-member committee in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign:

    • Research Associate Professor David Dubin
    • Professor and Associate Dean for Academic Programs Linda C. Smith
    • Professor and Interim Dean Allen Renear, Chair

More news is available at http://ct.iopdl.org/news/.

Introduction

The International Open Public Digital Library (IOPDL) has been proposed since 2008 as a library for the future of the world (Jin, 2014). To establish it, we need to achieve interoperability among the well-designed digital libraries selected for inclusion (Jin, 2014).

ISO defines metadata interoperability as “interoperability concerning the creation, … use, transfer, and exchange of descriptive data” (ISO, 2011). Metadata is used to describe and discover resources, and to make resources available and shareable on the Internet (Nagamori & Sugimoto, 2006).

Nevertheless, metadata interoperability has essentially not been achieved, and its absence is a major barrier to sharing and exchanging information among digital libraries. One cause is that each community uses the metadata format that suits its own needs, so there is to date no standard way to handle all needs. Another is that metadata schemas differ in their degree of generality or specificity, which makes interoperability hard to achieve (Jin, 2014).

In response to this problem, a Common Terminology (CT) shared among several metadata schemas is suggested as a possible solution. The goal of the Common Terminology is to embrace the diversity of metadata formats that fulfill the needs of many communities. It is also to provide uniformity, so that interoperability can be achieved and improved while minimizing loss of information and preserving accurate information (Jin, 2014). The Common Terminology concept has been under research since 2011. The Common Terminology project began in earnest in May 2012, supervised by Professor Dubin and supported by Dean Smith and Dean Renear of the Graduate School of Library and Information Science (GSLIS) at the University of Illinois at Urbana-Champaign (UIUC).

Definition of CT

Common Terminology is defined as a set of Common Terms. In general, Common Terms can be common element names of different metadata schemas, or common terms of thesauri and controlled vocabularies for subject cataloging. However, the Common Terminology (CT) suggested in this project is not a subject vocabulary or a set of thesaurus terms. The CT is confined to common element names of widely used metadata schemas (e.g., MARC, MODS, DC & QDC). The definitions of terms used in the CT abstract model are:

  • A Common Terminology is a set of Common Terms (including qualifiers).
  • A Common Term is a property (element) or a class.
  • A property (sub-property) can be one kind of common element (field) or attribute (subfield) in two or more metadata schemas.
  • A class is “a group containing members that have attributes, [behaviors], relationships or semantics in common or a kind of category” (DCMI, 2013).
  • A CTScheme is a controlled set of values that are specific to Common Terminology, including authorities, Syntax Encoding Scheme of DCMI and Vocabulary Encoding Scheme. CTScheme includes:
    • CTTypeGenre,
    • CTFormat,
    • CTRelator,
    • CTLanguage,
    • CTDescription,
    • CTIdentifier, and
    • CTSubject.
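The definitions above can be sketched as a small data model. This is a minimal illustration, not the project's own implementation: the class names CommonTerm and CommonTerminology, the dotted "property.subProperty" naming, and the pairing of 'identifier' with CTIdentifier are assumptions made for the example. The sub-property names dateOther and identifierOther are taken from the CT version 1.1 notes on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The seven CTScheme names listed in the definition above.
CT_SCHEMES = (
    "CTTypeGenre", "CTFormat", "CTRelator", "CTLanguage",
    "CTDescription", "CTIdentifier", "CTSubject",
)


@dataclass
class CommonTerm:
    """A Common Term: a property (element) or a class (hypothetical model)."""
    name: str
    kind: str = "property"                      # "property" or "class"
    sub_properties: List[str] = field(default_factory=list)
    scheme: Optional[str] = None                # optional CTScheme for values


@dataclass
class CommonTerminology:
    """A Common Terminology: a set of Common Terms, keyed by name."""
    terms: Dict[str, CommonTerm] = field(default_factory=dict)

    def add(self, term: CommonTerm) -> None:
        if term.kind not in ("property", "class"):
            raise ValueError(f"unknown kind: {term.kind}")
        if term.scheme is not None and term.scheme not in CT_SCHEMES:
            raise ValueError(f"unknown CTScheme: {term.scheme}")
        self.terms[term.name] = term

    def qualified_names(self) -> List[str]:
        """List each property followed by its sub-properties (dotted form)."""
        names: List[str] = []
        for term in self.terms.values():
            names.append(term.name)
            names.extend(f"{term.name}.{sub}" for sub in term.sub_properties)
        return names


ct = CommonTerminology()
ct.add(CommonTerm("date", sub_properties=["dateOther"]))
# Assumption: 'identifier' values are controlled by the CTIdentifier scheme.
ct.add(CommonTerm("identifier", sub_properties=["identifierOther"],
                  scheme="CTIdentifier"))
print(ct.qualified_names())
```

Keeping the scheme as an optional attribute of a term reflects the definition that a CTScheme is a controlled set of values, separate from the terms themselves.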

The Abstract Model of CT

Figure 1: The CT Abstract Model (green boxes and blue arrows: the newly developed CT abstract model; yellow boxes and black arrows: the existing DCMI Abstract Model) (DCMI, 2013) (Jin, 2014).

Roles and Benefits of CT

In achieving its goal through the concept and abstract model, CT plays several significant roles.

  • First, the most important role is to embrace various metadata schemas and to provide uniformity. It allows communities to keep using the ways of describing and preserving metadata that best fit their needs.
  • Second, it achieves interoperability at multiple levels (schema, schema definition language, record, and repository levels), and thus gives a common, standard way to achieve interoperability at multiple metadata levels.
  • Third, it addresses the lack of standardization, so that many libraries and organizations can use it in common.
  • Fourth, another key role of CT is to provide uniformity for one or more integrated search engines built on Linked Open Data (LOD) and a CT union catalog. Through the LOD and the CT union catalog, the search engine can quickly and efficiently retrieve related items across well-designed digital libraries and the IOPDL database.
  • Fifth, the CT union catalog tells users where they can directly access retrieved results. It lets users search easily without having to be aware of changes or limitations. Users may notice improved search engine performance in response time, reliability, and relevance of query results.
  • Sixth, it maintains a balance between the different degrees of generality or specificity of existing metadata schemas. It minimizes loss of information when transferring or mapping data between them.
  • Finally, it provides a dependable way for communities to share their data or databases and to work together (Jin, 2014).

A Prototype for developing a Common Terminology of MARC, MODS, DC and QDC

To develop the Common Terminology defined above, a prototype is in progress. The prototype designs and develops a Common Terminology of MARC, MODS, DC, and QDC, which have very different degrees of specificity and generality. The CT project scheme is found in the Research Proposal, Work Plan, and Time Requirements web pages of the Common Terminology (CT) website.

Completed Tasks

  • Task 1. Proposing the Research Project and Background Research (May 2012 ~ Feb. 2013)
    • Reviewed 20th-century research on convertibility and compatibility problems in sharing bibliographic description data, such as metadata, indexing languages, controlled vocabularies, and common terms or common languages.
    • Reviewed current research and methods for achieving interoperability, especially at the metadata interoperability levels (e.g., metadata instance, metadata schema, metadata language, record level, and repository level).
    • Documented problems of existing methods for achieving metadata interoperability. Suggested using a Common Terminology (CT), designed to achieve interoperability at multiple metadata interoperability levels.
    • Documented the research proposal, work plan, and time requirements.
  • Task 2. Suggesting a Common Terminology of MARC, MODS, DC, and QDC, Designing Crosswalks for MARC to CT and (Q)DC to CT (March 2013 ~ August 2013)
  • Task 3. Refining the suggested CT, Applying it to three different metadata records of Harvard, MIT, and UIUC libraries (September 2013 ~ March 31, 2014)
    • Based on the MARC tag usage of Harvard, UIUC, and WorldCat, the MARC to CT crosswalk was created and developed for CT version 1.0.
    • An example of converting a Harvard MARC record into CT 1.0 is Harvard Example CT in XML form.
    • An example of converting a UIUC MARCXML record into CT 1.0 is UIUC Example CT in XML form.
    • Investigated the usage of QDC elements and terms in MIT QDC metadata records of the DSpace library.
    • Developed and refined the designed CT with actual Harvard, MIT, and UIUC metadata.
    • Developed the MARC to CT crosswalk for CT version 1.0.
    • Developed the DC & QDC to CT crosswalk for CT version 1.0.
    • Developed the CT Primer, refining the CT so that it can improve metadata interoperability.
    • Finally, CT version 1.0 was released. It will be modified and improved by reviewers through the remaining steps.
    • Defined CT version 1.0 in SKOS.
    • Implemented the CT version 1.0 schema in XML and RDF form to achieve schema language interoperability.
    • Published the CT on the Semantic Web.
      • The published CT version 1.0 schema in XML form is ct.xsd.
      • The published CT version 1.0 schema in RDF form is ct.rdf.
      • The refined and published CT documentation for version 1.0 is CT version 1.0.
    • The CT diagram for version 1.0 is CT 1.0 diagram:
      • However, Common Terminology 1.0 was revised to 1.1 to fix a semantic error in which the 'other' sub-properties of different properties resolved to the same resource in the RDFS graph. Thus, the 'other' sub-property was deleted from the 'format' and 'publisher' properties. In the 'date' property, 'other' was renamed 'dateOther'; in the 'description' property, 'descriptionOther'; and in the 'identifier' property, 'identifierOther'.
      • The CT published on the Semantic Web (June, 2014):
        • The published CT version 1.1 schema in XML is ct.xsd.
        • The published CT version 1.1 schema in RDF is ct.rdf.
        • The refined and published CT documentation for version 1.1 is CT version 1.1.
        • The published CT version 1.1 in SKOS is ctskos.rdf.
  • Task 4. Conducting the Mappings Experiments with the Developed CT as a Case Study (March 1, 2014~August, 2014)
    • Several experiments with Harvard, MIT, and UIUC metadata records are planned to demonstrate the performance of CT in mappings:
      • how much the CT minimizes loss of information, and
      • how much it increases accuracy and preservation rates measuring lexical and semantic match rates.
    • The paper “A Model and Roles of a Common Terminology to Improve Metadata Interoperability” discusses the following in detail:
      • which criteria are used to design CT of MARC, MODS, and DC & QDC schemas;
      • what is the frame of CT to minimize loss of information;
      • and as a result, how much CT improves metadata interoperability increasing accuracy and preservation rates.
    • The objective of Task 4 is to conduct mapping performance experiments with conversions for the developed CT, using MIT (QDC) metadata records, to improve metadata interoperability at the record level (March, 2014 ~ August, 2014):
      • Developing the MIT (QDC) to CT conversion in Python;
      • Converting 20,000 local MIT (QDC) metadata records into CT in XML (example: one of the CT records converted from MIT (QDC));
      • (*The original MIT (QDC) records are not shown here, out of respect for MIT's cooperation in developing the CT project);
      • Evaluating performance of CT measuring transfer rate, miss information rate, and lexical and semantic match rates; and
      • Developing meaningful, actionable guidance and implementation strategies of mappings with the Common Terminology in order to improve metadata interoperability.
  • Task 5. Evaluate their performance and develop the final paper (March, 2014 ~ August, 2014)
      • The objective of Task 5 is to evaluate the CT performance results, describe the analyses, and develop the paper. The paper covers the roles, usefulness, effectiveness, and importance of the Common Terminology (CT) in achieving interoperability among different metadata schemas and records. It also covers what we learned from the project and our recommendations for future work to improve interoperability, especially metadata interoperability, in sharing information.
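The evaluation measures named in Task 4 (transfer rate, miss information rate, and lexical match rate) can be sketched as follows. This page does not give their formulas, so the definitions below are assumptions made for illustration: transfer rate is the share of source values that reach the CT record, miss information rate is the share that do not, and lexical match rate is the share of transferred values whose CT term name equals the source element name. The crosswalk entries and the sample record are also hypothetical and are not the project's DC & QDC to CT crosswalk.

```python
# Hypothetical (Q)DC element -> CT term crosswalk, for illustration only.
CROSSWALK = {
    "title": "title",
    "creator": "creator",
    "description.abstract": "description",
}

# A made-up source record as (element, value) pairs.
SAMPLE_RECORD = [
    ("title", "An example title"),
    ("creator", "Example, Author"),
    ("description.abstract", "A short abstract."),
    ("audience", "Researchers"),          # no crosswalk entry: value is lost
]


def convert(record, crosswalk):
    """Split a record into converted (element, term, value) and missed pairs."""
    converted, missed = [], []
    for element, value in record:
        term = crosswalk.get(element)
        if term is None:
            missed.append((element, value))
        else:
            converted.append((element, term, value))
    return converted, missed


def rates(converted, missed):
    """Compute the assumed transfer, miss, and lexical match rates."""
    total = len(converted) + len(missed)
    if total == 0:
        return {"transfer": 0.0, "miss": 0.0, "lexical_match": 0.0}
    lexical = sum(
        1 for element, term, _ in converted if element.lower() == term.lower()
    )
    return {
        "transfer": len(converted) / total,
        "miss": len(missed) / total,
        "lexical_match": lexical / len(converted) if converted else 0.0,
    }


converted, missed = convert(SAMPLE_RECORD, CROSSWALK)
result = rates(converted, missed)
print(result)
```

A semantic match rate would need a judgment about whether the meaning of each value is preserved, which cannot be computed from names alone, so it is left out of this sketch.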

Ongoing Tasks: Additional Work Important to CT Performance, and Near-Future Work (September, 2014 ~ December, 2014)

  • Conducting further mapping experiments with the developed CT
    • Developing a conversion of Harvard (MARC) and UIUC (MARCXML) metadata records to CT;
    • Generalizing the conversion into a MARC to CT conversion in Python;
    • Converting the local metadata records of Harvard (12 million MARC records) and UIUC (10 million MARCXML records) into CT;
    • Evaluating performance of CT measuring transfer rate, miss information rate, and lexical and semantic match rates; and
    • Developing meaningful, actionable guidance and implementation strategies of mappings with the Common Terminology in order to improve metadata interoperability.
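A MARCXML to CT conversion of the kind described above might look like the following minimal sketch, which uses only the Python standard library. The MARCXML namespace and the MARC fields (245 $a title, 100 $a main entry personal name, 260 $b publisher, 020 $a ISBN) are standard MARC 21; everything on the CT side is an assumption. In particular, the crosswalk entries, the ctRecord wrapper element, and the un-namespaced CT element names are hypothetical and do not reproduce the project's MARC to CT crosswalk or ct.xsd.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical crosswalk: (MARC tag, subfield code) -> CT term name.
MARC_TO_CT = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("020", "a"): "identifier",
}


def marcxml_to_ct(marcxml_text):
    """Convert one MARCXML <record> into a hypothetical CT record element."""
    record = ET.fromstring(marcxml_text)      # assumes the root is <record>
    ct_record = ET.Element("ctRecord")        # hypothetical wrapper element
    for datafield in record.findall("marc:datafield", MARC_NS):
        tag = datafield.get("tag")
        for subfield in datafield.findall("marc:subfield", MARC_NS):
            term = MARC_TO_CT.get((tag, subfield.get("code")))
            if term is not None and subfield.text:
                ET.SubElement(ct_record, term).text = subfield.text.strip()
    return ct_record


# A made-up MARCXML record; 260 $c has no crosswalk entry and is skipped.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press</subfield>
    <subfield code="c">2014</subfield>
  </datafield>
</record>"""

ct = marcxml_to_ct(SAMPLE)
print(ET.tostring(ct, encoding="unicode"))
```

Driving the conversion from a lookup table keeps the crosswalk separate from the traversal code, so the same function could be reused as the crosswalk is refined. Subfields without a crosswalk entry are dropped silently here; a real experiment would record them in order to measure the miss information rate.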

Next Steps for the Suggested Expanded Project, if grants and funds are available

  • Task 6. Mapping metadata of three universities to Linked Open Data (September, 2015 ~ August, 2016)
    • Objective is
      • Prerequisite: Converting local metadata into the CT. Europeana uses a common dataset to achieve data interoperability among participating providers, so that they can map a reasonably useful set of metadata; it was a Dublin Core application profile with a subset of DC elements (Isaac, Clayphan, & Haslhofer, 2012). However, we have the Common Terminology, which shows better mapping performance for the commonly used MARC, MODS, and DC & QDC schemas while minimizing loss of information. Thus, we can use the CT converted from the original QDC (MIT), MARC (Harvard), and MARCXML (UIUC) metadata by the Python conversion programs that will have been developed in Task 4.
      • The suggested general steps are in Task 6 work step.
  • Task 7. Building an integrated search engine by generating a union catalog with the CT (September, 2016 ~ August, 2017)
      • The objective is to generate a union catalog from the CT records converted from Harvard (MARC), MIT (QDC), and UIUC (MARCXML), and to build an integrated search engine that retrieves related items through the generated union catalog. Lastly, it is to compare the performance of the search engine using the LOD with the search engine using the generated union catalog in achieving interoperability at the repository level.
      • The suggested general steps are in Task 7 work step.
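The union catalog idea in Task 7 can be sketched as a merge of converted CT records from several sources into one index that remembers where each item is held. This is only an illustration under stated assumptions: the record layout (a dict with "identifier" and "title" keys), the use of the identifier as the merge key, and the substring title search are all hypothetical, and the sample records are made up.

```python
def build_union_catalog(records_by_source):
    """Merge CT records from several sources, keyed by identifier.

    Each catalog entry keeps one title and the list of sources holding it.
    """
    catalog = {}
    for source, records in records_by_source.items():
        for rec in records:
            entry = catalog.setdefault(
                rec["identifier"], {"title": rec["title"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return catalog


def search(catalog, word):
    """Return the entries whose title contains the word (case-insensitive)."""
    word = word.lower()
    return {
        key: entry
        for key, entry in catalog.items()
        if word in entry["title"].lower()
    }


# Made-up CT records, already converted from each library's own format.
records_by_source = {
    "Harvard": [{"identifier": "id-1", "title": "Metadata Basics"}],
    "MIT": [{"identifier": "id-2", "title": "Digital Libraries"}],
    "UIUC": [{"identifier": "id-1", "title": "Metadata Basics"}],
}

catalog = build_union_catalog(records_by_source)
hits = search(catalog, "metadata")
print(hits)
```

Because every source has already been converted to the same terminology, the merge needs no per-format logic, which is the uniformity the union catalog is meant to exploit. Real records would need a more careful matching rule than an exact identifier match.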

References

DCMI. (2013). DCMI Abstract Model. Retrieved from Dublin Core Metadata Initiative: http://dublincore.org/documents/abstract-model/

ISO. (2011). Information technology - Metadata Registries Interoperability and Bindings - Part 1: Framework, common vocabulary, and common provisions for conformance. ISO/IEC FDIS 20944-1.

Jin, B. S. (2014). International Open Public Digital Library (IOPDL): A Proposal for the Future. Illinois Digital Environment for Access to Learning and Scholarship (IDEALS). Retrieved from http://hdl.handle.net/2142/50101

Jin, B. S. (2014). A Model and Roles of a Common Terminology to Improve Metadata Interoperability. Illinois Digital Environment for Access to Learning and Scholarship (IDEALS). Retrieved from http://hdl.handle.net/2142/50100

Nagamori, M., & Sugimoto, S. (2006). A Metadata Schema Registry as a Tool to Enhance Metadata Interoperability. TCDL Bulletin, 3 (1). Retrieved from http://www.ieee-tcdl.org/Bulletin/v3n1/nagamori/nagamori.html

OCLC. (2010). Implications of MARC Tag Usage on Library Metadata Practices. Retrieved from http://www.oclc.org/content/dam/research/publications/library/2010/2010-06.pdf?urlm=162940

Last Modified: October 7, 2014