OCHRE and the Semantic Web

By Sandra Schloen, February 2019

The Semantic Web is a set of services and standards, endorsed by the World Wide Web Consortium (W3C), that allows data to be published, shared, and accessed, by both computational processes and human users, within a common framework.

Many datasets of interest have been published on the Semantic Web as Linked (Open) Data and can be accessed by computational systems such as OCHRE. Such datasets will conform to RDF/XML standards, and will be described by a published schema or ontology. Access to data on the Semantic Web is through SPARQL (pronounced like "sparkle"), a special-purpose Query Language.

OCHRE's built-in mechanism for linking data across projects is the Thesaurus feature, and this naturally extends to integrating linked data from the Semantic Web. By using specially configured Derived Variables as described below, in conjunction with the Thesaurus, OCHRE lets projects link to content from the World Wide Web and integrate this content with core OCHRE data.

Configuring a Semantic Web domain

In this example, we walk through the steps to configure a Derived Variable called "Wikidata" to interact with the set of Wikipedia data published as linked open data and made accessible via SPARQL by Wikidata. We will then look up "Oak," or rather the scientific term for it, "Quercus," and link it to OCHRE's Taxonomy value of "Oak," nested within the biological taxon hierarchy.

To begin, we create an Alphanumeric Variable, and designate it with the Derivation type of Semantic Web.

Entity URI

One of the core principles of the Semantic Web is that any item of interest, or any web "resource," can be uniquely identified by an http-based URI. For each domain being configured for OCHRE, we must give OCHRE the prefix of the "Entity URI" which OCHRE can use to derive the unique identifier of each item being looked up and linked.

The "Quercus" entity in Wikidata's dataset, for example, is referenced by the following URI: https://www.wikidata.org/entity/Q12004
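As a rough illustration of how a prefix and an identifier combine into such a URI, the following Python sketch joins the two. The function name entity_uri and the constant WIKIDATA_ENTITY_PREFIX are hypothetical names chosen for this example; they are not part of OCHRE, and OCHRE's own logic may differ.

```python
# Hypothetical prefix, copied from the example URI above.
WIKIDATA_ENTITY_PREFIX = "https://www.wikidata.org/entity/"


def entity_uri(prefix: str, identifier: str) -> str:
    """Join an Entity URI prefix and an item identifier with exactly one slash."""
    return prefix.rstrip("/") + "/" + identifier.lstrip("/")


print(entity_uri(WIKIDATA_ENTITY_PREFIX, "Q12004"))
# prints https://www.wikidata.org/entity/Q12004
```

The prefix is configured once per domain; only the identifier varies from item to item.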

How would you know that "Quercus" was "Q12004" in Wikidata? You wouldn't!

And so we teach OCHRE how to look up content using SPARQL, first by giving OCHRE the SPARQL endpoint as documented by the domain of interest.

Next, call in a SPARQL expert to write the query syntax to fetch the data of interest.
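To make the role of the endpoint concrete, here is a minimal Python sketch of how a client might address a query to one. It assumes the endpoint accepts the query as a "query" parameter on a GET request, as the SPARQL protocol allows. The helper build_request_url and the sample query are illustrative assumptions, not OCHRE's implementation; the endpoint shown is Wikidata's public query service.

```python
from urllib.parse import urlencode

# Wikidata's public SPARQL endpoint; other domains document their own.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query only; a real template depends on the domain's vocabulary.
SAMPLE_QUERY = 'SELECT ?item WHERE { ?item rdfs:label "Quercus"@en . } LIMIT 5'


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, SAMPLE_QUERY)
print(url)
```

The sketch only builds the URL; sending it, and asking for a particular return format, is left to whatever HTTP client is in use.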

The SPARQL Template

Writing a SPARQL query template requires both expertise in the Query Language and research into the vocabulary (properties, entities, terminology) of the domain of interest. Many linked-data sites provide extensive documentation and a variety of examples. But it is up to someone who knows how to write SPARQL queries to prepare an appropriate query template that will be understood by OCHRE.

This is not the place to explain the syntax of SPARQL, except to say it is modeled on the principle of referencing a "triple" composed of a subject, a predicate, and an object. For example: find an item (subject) whose label (predicate) is "Quercus" (object).

There are several rules and options to be aware of when preparing a SPARQL template for OCHRE:

  • The template must contain the special code <Name> as the object of a predicate, or as the target of some appropriate string-matching request. When OCHRE sends the query to the SPARQL engine, it will substitute in the Name of the OCHRE item in place of the "<Name>" code. In this example, the WHERE clause will become:
                    WHERE { ?item ?label "Quercus"@en .
  • The query must be designed to return any of a ?subject, ?item, ?id, or ?identifier as a keyword. The return value of one of these "subject" items must contain the identifying code that uniquely identifies the entity being fetched. OCHRE will append the value of this identifier to the Entity URI to create the official, stable URI for the item being represented.
  • The query should return some descriptive content -- a definition, a preferred label, a term, a description, etc. -- based on the vocabulary of the domain to provide some explanatory content for the returned values. OCHRE will create a descriptive string of such content.
  • The query may contain a return value of ?image. Images are fetched in a variety of ways, based on the domain and the data set; the query syntax would need to support this. If one or more images are returned, OCHRE will retain them as part of the Thesaurus link. In this case, the Semantic Web icon is lit more brightly with a cyan edge to visually indicate the presence of an image. Images will be fetched on-the-fly when needed for a View.
  • The query may contain return values of ?lat and ?long (alternatively, ?latitude, ?lon, ?longitude). If an appropriate pair of coordinates is recognized by OCHRE, and if that item's metadata Coordinates are blank, OCHRE will assign the looked-up coordinates to the item and save these. Alternatively, the query may return ?coordinates, and OCHRE will parse the value into a latitude/longitude pair, expecting data in the format: Point(lon lat) (as given by Wikidata's P625 "coordinate location," for example).
  • The return format must be RDF/XML. OCHRE provides a field to enter an Output format instruction if such is needed to request RDF/XML as the SPARQL return format from the given domain. SPARQL engines often return RDF/XML by default, making this unnecessary in many cases.
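The ?coordinates rule above depends on the order of the two numbers: longitude comes first in the Point(lon lat) format. The Python sketch below shows one way such a value could be parsed into a latitude/longitude pair. The function name parse_point and the regular expression are assumptions made for illustration; OCHRE's actual parser is not documented here and may accept other forms.

```python
import re
from typing import Optional, Tuple

# A point literal with longitude first, e.g. "Point(31.25 30.05)".
# Only plain decimal numbers are handled in this sketch.
_POINT = re.compile(
    r"^\s*Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from 'Point(lon lat)', or None if it does not match."""
    match = _POINT.match(value)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon  # swap into the conventional latitude-first order


print(parse_point("Point(31.25 30.05)"))
# prints (30.05, 31.25)
```

Returning None for unrecognized input lets a caller leave the item's Coordinates untouched rather than store a bad value.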

Creating a Thesaurus link

Within OCHRE, all items from the Concepts, Locations & Objects, Periods, Persons & Organizations, and Taxonomy categories are enabled to integrate with Semantic Web data through their Thesaurus tabs.

Clicking on the Semantic Web icon from any Thesaurus links toolbar will pop up a window from which you can pick from among the available domains that have been configured for OCHRE. Note that you can link data either as a Close match (synonym) or as a Related item (after which you'll have the option to designate the relationship as being "broader" or "narrower").

For this example, we'll pop up the Semantic Web dialog from the Close match toolbar of the Quercus item in the OCHRE Taxonomy and choose the Wikidata domain from the pick-list. When we click the Lookup item button, OCHRE substitutes the Name Quercus for the <Name> code in the query template, and fires off the SPARQL query. It uses the returned values to derive a descriptive string, and creates a list of items returned by the query.

Note that the SPARQL query can be designed to be case insensitive using a regex option, but we have noticed that this can adversely affect the performance of the query. Rather, we have configured OCHRE, by default, to query for both the item name as given ("Quercus") and also the lower-cased equivalent, if different ("quercus").
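The substitution and the two-variant lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not OCHRE's code: the template text, the function names name_variants and fill_template, and the escaping of quotation marks and backslashes are all hypothetical choices made for this example.

```python
from typing import List

# Hypothetical template; a real one depends on the vocabulary of the domain.
TEMPLATE = 'SELECT ?item WHERE { ?item ?label "<Name>"@en . }'


def name_variants(name: str) -> List[str]:
    """Return the name as given, plus its lower-cased form when that differs."""
    lowered = name.lower()
    return [name] if lowered == name else [name, lowered]


def fill_template(template: str, name: str) -> str:
    """Replace the <Name> code, escaping characters that would break a quoted string."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return template.replace("<Name>", escaped)


queries = [fill_template(TEMPLATE, variant) for variant in name_variants("Quercus")]
for query in queries:
    print(query)
```

Two exact-match queries of this kind stand in for a single regex-based, case-insensitive query.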

Examine the results returned from the query and select the item(s) you wish using the checklist provided.

Use the Done button to accept the selected items as Thesaurus links.

Viewing data from Semantic Web links

When you View items that have linked data from the Semantic Web, OCHRE composes the Entity URIs for the linked items and displays them as hyperlinks. Clicking on any of these links takes you to the source web page.

In addition, if there were images captured when the lookup-and-link was performed, OCHRE fetches those images on the fly and displays them in the View.

Other Examples

This is the OCHRE Location item of Cairo, having coordinates and an image looked up from Wikidata, and also linked to the authoritative Getty Thesaurus of Geographic Names.

This is the OCHRE Concept of the Obol, linked to and illustrated by the related entity from Nomisma, a trusted source of numismatic data from the ancient world.