The Project

2018-2019 MoMA Archives Linked Open Data (LOD) Fellowship Report by Sarah Ann Adams

PROJECT DESIGN

PROJECT PHASES

Phase 1: Constructing a Wikidata SPARQL Query for Art Exhibition Properties

Project Phase 2: Analyzing Relevant Wikidata Properties Related to Art Exhibitions

Project Phase 3: Researching Linked Art and CIDOC-CRM

Project Phase 4: Understanding MoMA's Needs

Project Phase 5: Creation of a Preliminary Data Model

PROJECT DESIGN

Technologies Used: Day-to-day work of the fellowship (gathering and/or transforming data to be included in the multi-institutional exhibition history index) entailed using Python scripts, APIs (application programming interfaces), Microsoft Access, and both Microsoft Excel and Google Sheets. For the project detailed in this final report, however, aside from using the Wikidata Query Service to gather information about which Wikidata properties have been used to describe art exhibitions, the technologies used were more administrative in nature: Google Docs, Google Sheets, and the Lucidchart diagramming service.

Methods or approaches adopted: The methodology of the project was to first understand the trends in how Wikidata has been used to describe art exhibitions. Many of the properties used to describe art exhibitions also had "equivalent" predicates from other ontologies listed on the respective Wikidata property page; these were recorded for potential use in the model to be created for MoMA. Wikidata was selected because there is a strong potential for the exhibition history index data to be stored in an instance of Wikibase. Using existing Wikidata properties to inform the model could therefore make that potential transfer of information to a Wikibase instance more seamless. In addition to Wikidata, CIDOC CRM was researched as the formal ontology that could inform the modeling of the art exhibition data, as it is the linked data model most heavily used in the cultural heritage domain. Linked Art, an extension of CIDOC CRM for use with art domain content, was also researched for its specific treatment of exhibitions. Although it belongs to a different domain (music), the Carnegie Hall Linked Open Data Model diagram was used as a visual guide for what to include in the diagram of the preliminary model created for the exhibition history index.

Audiences Served: As this is the first year of this linked data fellowship, the project has been primarily exploratory in nature, with the research conducted for the purpose of informing the creation of a linked data model. In its current iteration, the project and its documentation therefore serve the MoMA Archives department, namely Jonathan Lill, and act as a roadmap that lets future linked data fellows understand the work done before their arrival in the position. In addition to being targeted for internal use, this project and documentation can also peripherally serve scholars and the curious alike in future inquiries into the modeling of art exhibition linked data.

PROJECT PHASES

Phase 1: Constructing a Wikidata SPARQL Query for Art Exhibition Properties

As briefly described in the above section, although Wikidata is not a formal semantic ontology, it is a hub for structured data that is rising in popularity (Smith-Yoshimura, 2018). Because MoMA might eventually use Wikibase, the technology that supports Wikidata, as storage for the art exhibition linked data, it seemed fitting to survey the properties used in Wikidata to describe art exhibitions, as well as how often each property is used to describe an art exhibition. The SPARQL query asked for all of the properties (and their labels) for every instance of an art exhibition, as well as for every instance of a subclass of an art exhibition. Per the Wikidata generic tree for the Wikidata item "art exhibition", this included 15 subclasses. The SPARQL query itself can be found in Appendix A.
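The shape of such a survey query can be sketched as follows. This is not the query from Appendix A; it is a minimal illustration that assumes Q667276 is the Wikidata item for "art exhibition" (worth verifying before use) and relies on the prefixes that the Wikidata Query Service predefines. The function name build_property_survey_query is hypothetical. The Python helper only assembles the query text, which could then be pasted into the Wikidata Query Service.

```python
# Assumption: Q667276 is the Wikidata item for "art exhibition"; verify before use.
ART_EXHIBITION_QID = "Q667276"


def build_property_survey_query(class_qid: str = ART_EXHIBITION_QID) -> str:
    """Return SPARQL text counting how many exhibition items use each property."""
    return f"""
SELECT ?property ?propertyLabel (COUNT(DISTINCT ?exhibition) AS ?uses) WHERE {{
  # P31 = instance of; P279* follows zero or more subclass-of links,
  # so direct instances and instances of subclasses are both matched.
  ?exhibition wdt:P31/wdt:P279* wd:{class_qid} .
  ?exhibition ?directClaim ?value .
  # Map each direct-claim predicate back to its property entity.
  ?property wikibase:directClaim ?directClaim .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
GROUP BY ?property ?propertyLabel
ORDER BY DESC(?uses)
""".strip()


if __name__ == "__main__":
    print(build_property_survey_query())
```

Counting distinct exhibition items per property, rather than counting statements, keeps a property that is stated several times on one item from appearing more widely used than it is.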

Project Phase 2: Analyzing Relevant Wikidata Properties Related to Art Exhibitions

The Wikidata SPARQL query returned 166 properties. Wikidata can be edited by anyone, with or without domain expertise. This is both a great strength and a potential liability of working with Wikidata information, depending on your frame of reference. A cursory look at the data made it clear that not every Wikidata item identified as an instance of an exhibition was truly an instance of an exhibition. This was confirmed by the return of properties that are better suited to describing an institution (e.g., P112 founded by) or an exhibition catalog (e.g., P393 edition number) than an exhibition. Additionally, other returned properties did not pertain to information that could be expressed by the exhibition history index data. Property exclusion criteria were therefore established in order to focus on the properties that could be of the most use for eventual modeling. The "Art Exhibition Wikidata Properties" Google Sheet in Appendix B shows, on its second tab, all of the properties returned from the SPARQL query, including their reason for exclusion, if applicable. The first tab shows just the selected Wikidata properties and whether or not each has been included in the preliminary model.
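The two-tab arrangement described above amounts to recording an optional exclusion reason against each returned property and then filtering on it. A minimal sketch follows, assuming hypothetical names (PropertyResult, selected_properties) and only three illustrative rows. The two excluded rows paraphrase the examples given above; P580 (start time) is shown as retained purely for illustration, and the authoritative list is the sheet in Appendix B.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PropertyResult:
    pid: str
    label: str
    exclusion_reason: Optional[str] = None  # None means the property is kept


def selected_properties(results: List[PropertyResult]) -> List[PropertyResult]:
    """Return only the properties with no recorded exclusion reason."""
    return [r for r in results if r.exclusion_reason is None]


# Illustrative rows only; the real survey returned 166 properties.
results = [
    PropertyResult("P580", "start time"),
    PropertyResult("P112", "founded by",
                   "describes an institution, not an exhibition"),
    PropertyResult("P393", "edition number",
                   "describes an exhibition catalog, not an exhibition"),
]

if __name__ == "__main__":
    for prop in selected_properties(results):
        print(prop.pid, prop.label)
```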

Project Phase 3: Researching Linked Art and CIDOC-CRM

Linked Art - a "community working together to create a shared Model based on Linked Open Data to describe art" - uses CIDOC CRM as the core ontology upon which additional Linked Art classes and properties are built. CIDOC CRM has been characterized as a complex upper ontology, but Dominic Oldman, head of ResearchSpace at the British Museum, stated in his 2015 presentation The CIDOC CRM and Open Knowledge Representation CRM Labs that users of CIDOC CRM are not expected to use the whole ontology, nor to restrict themselves to the classes and properties explicitly outlined in the model; users can take what they need as the basis upon which to build more domain-specific knowledge bases (Oldman, 2015).

With this in mind, Linked Art pared down the core CIDOC CRM classes to better suit the needs of art description. Linked Art's CIDOC-CRM Class Analysis includes “classes to ignore”, “ineffective classes”, “unnecessary classes”, as well as “useful classes” and “useful additional classes”. As of April 2019, an analysis of relationships (predicates, in linked data terms) is yet to come. The CIDOC CRM does not intend to be a replacement ontology; rather it intends to be an “interlingua” ontology, one that is a connector between heterogeneous knowledge bases (Berger, 2018). With this understanding, Linked Art created additional terms and also mapped other ontologies’ terms into the CIDOC CRM.

The Linked Art data model has many components (object descriptions, provenance of objects, etc.), but for the purposes of this report, the focus was on the Exhibition component of the model. The “Exhibition Activity” and “Multiple Venues” sections have, since spring 2019, been joined by the section “Exhibition Concept”. This is of particular note for the LOD MoMA fellowship because conversations about the existing exhibition history index data surfaced the need for an exhibition concept as a higher-level class, separate from an exhibition instance, that could contextualize an exhibition instance or collocate multiple related exhibition instances (as with a traveling show with multiple locations, for example). While fully encompassing dates for related instances might be assigned to an exhibition concept for administrative purposes, it is the exhibition instance that has discrete start and end times, takes place in a physical space, and involves the actual work of artists.
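The distinction between concept and instance can be sketched as a small data structure. This is an illustration only, not the Linked Art model or the preliminary MoMA model: the class names ExhibitionInstance and ExhibitionConcept, the venue names, and the dates are all hypothetical. Each instance carries its own venue and discrete dates, while the concept collocates instances and derives its encompassing dates from them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ExhibitionInstance:
    """One showing: a physical venue with discrete start and end dates."""
    venue: str
    start: date
    end: date


@dataclass
class ExhibitionConcept:
    """Higher-level grouping that collocates related exhibition instances."""
    title: str
    instances: List[ExhibitionInstance] = field(default_factory=list)

    def encompassing_dates(self) -> Optional[Tuple[date, date]]:
        """Earliest start and latest end across all instances, if any exist."""
        if not self.instances:
            return None
        return (min(i.start for i in self.instances),
                max(i.end for i in self.instances))


# Hypothetical traveling show with two venues (dates are invented).
show = ExhibitionConcept(
    title="Example Traveling Show",
    instances=[
        ExhibitionInstance("Venue A", date(2000, 1, 10), date(2000, 3, 20)),
        ExhibitionInstance("Venue B", date(2000, 5, 1), date(2000, 7, 15)),
    ],
)

if __name__ == "__main__":
    print(show.encompassing_dates())
```

Deriving the concept-level dates from the instances, rather than storing them separately, is one possible design choice; it keeps the administrative date range consistent with the instances it summarizes.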

Project Phase 4: Understanding MoMA's Needs

In creating a new system or tool, it is necessary to consider potential users and their needs. This was especially important during this project because the needs of the users directly determined the level of granularity desired of the model and - by extension - also determined how closely the model would adhere to CIDOC CRM and/or Linked Art. If the eventual linked data set of multi-institutional art exhibition information was only to be used by information professionals and/or art professionals, perhaps it would be judicious to create the model in strict accordance with CIDOC CRM. But that is not the intended user base of this eventual linked data. While MoMA staff will be encouraged to utilize the resultant linked data set, the main user group is the general public. Despite CIDOC CRM's potential flexibility between multiple domains and ontologies, its logical constraints have the potential to create too much of a barrier between a general user and the data.

In Wikidata, for example, to learn the start date of the 1913 Armory Show, the user provides the ID for the exhibition (Q688909), the ID for the property “start time” (P580), and the variable ?startDate, requesting that the value of ?startDate be returned by the query. Running this query returns “17 February 1913” as the start date of the 1913 Armory Show exhibition. (Click HERE to try the query yourself!) If this same information were modeled using CIDOC-CRM, the exhibition would first be related to an instance of crm:E52_Time-Span. That time-span would then become the subject of a second triple whose object is the actual start date of the exhibition. Instead of considering just one triple when writing the query (as in the 1913 Armory Show example), the user would have to traverse two triples (exhibition instance to time-span, and then time-span to start date), which could be a significant barrier for new and experienced SPARQL users alike. This is just one example where the CIDOC CRM’s model is too cumbersome for the intended purposes of the exhibition history index data. It is for this reason that CIDOC CRM and Linked Art were used as guides for the model, but not as the only framework within which the model was created.
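The difference between the one-triple and two-triple lookups can be illustrated with a tiny in-memory set of triples. This is a sketch under stated assumptions, not either project's actual data: the ex: identifiers are hypothetical, and crm:P4_has_time-span with crm:P82a_begin_of_the_begin is one possible CIDOC CRM path from an activity to the start of its time-span. The Wikidata identifiers and the date come from the example above.

```python
# Wikidata-style modeling: the start date hangs directly off the exhibition.
WD_TRIPLES = [
    ("wd:Q688909", "wdt:P580", "1913-02-17"),
]

# CIDOC CRM-style modeling: an intermediate time-span node sits in between.
# The ex: identifiers are hypothetical placeholders.
CRM_TRIPLES = [
    ("ex:armory_show", "crm:P4_has_time-span", "ex:armory_show_timespan"),
    ("ex:armory_show_timespan", "rdf:type", "crm:E52_Time-Span"),
    ("ex:armory_show_timespan", "crm:P82a_begin_of_the_begin", "1913-02-17"),
]


def objects(triples, subject, predicate):
    """Return every object matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def start_date_one_hop(triples, exhibition):
    """Single triple pattern: exhibition -> start date."""
    values = objects(triples, exhibition, "wdt:P580")
    return values[0] if values else None


def start_date_two_hops(triples, exhibition):
    """Two triple patterns: exhibition -> time-span, then time-span -> start."""
    for timespan in objects(triples, exhibition, "crm:P4_has_time-span"):
        values = objects(triples, timespan, "crm:P82a_begin_of_the_begin")
        if values:
            return values[0]
    return None


if __name__ == "__main__":
    print(start_date_one_hop(WD_TRIPLES, "wd:Q688909"))
    print(start_date_two_hops(CRM_TRIPLES, "ex:armory_show"))
```

Both lookups reach the same date; the second simply requires the person writing the query to know that an intermediate time-span node exists and which properties lead into and out of it.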

Project Phase 5: Creation of a Preliminary Data Model

Given the above research on Wikidata art exhibition properties, CIDOC-CRM, and Linked Art, and with the Carnegie Hall Linked Open Data Model diagram as a visual guide, a preliminary model for the MoMA art exhibition data was created iteratively. The model responds both to discussions with Jonathan Lill and the expressive capabilities of the exhibition history index data, and to the constraints and capabilities of existing semantic ontologies. As of April 2019, the model includes properties (predicates) and classes that do not yet exist, namely those for the different types of relationships a person may have with either an exhibition concept or an exhibition. For the time being, unless further research uncovers applicable properties and classes, the placeholder namespace "MoMA" has been used to identify where the creation of such properties and classes might eventually be warranted. The preliminary model can be viewed here.
