By Sandra Schloen, February 2019
The Semantic Web is a set of services and standards, endorsed by the World Wide Web Consortium (W3C), that allows data to be published, shared, and accessed, by both computational processes and human users, within a common framework.
Many datasets of interest have been published on the Semantic Web as Linked (Open) Data and can be accessed by computational systems such as OCHRE. Such datasets will conform to RDF/XML standards, and will be described by a published schema or ontology. Access to data on the Semantic Web is through SPARQL (pronounced like "sparkle"), a special-purpose Query Language.
OCHRE's built-in mechanism for linking data across projects is the Thesaurus feature, and this naturally extends to integrating linked data from the Semantic Web. By using specially configured Derived Variables as described below, in conjunction with the Thesaurus, OCHRE lets projects link to content from the World Wide Web and integrate this content with core OCHRE data.
In this example, we are going to walk through the steps to configure a Derived Variable called "Wikidata" to interact with the set of Wikipedia data published as linked open data and made accessible via SPARQL by Wikidata. We will then look up "Oak" (or rather the scientific term for it, "Quercus") and link it to OCHRE's Taxonomy value of "Oak," nested within the biological taxon hierarchy.
To begin, we create an Alphanumeric Variable, and designate it with the Derivation type of Semantic Web.
One of the core principles of the Semantic Web is that any item of interest, or any web "resource," can be uniquely identified by an http-based URI. For each domain being configured for OCHRE, we must give OCHRE the prefix of the "Entity URI" which OCHRE can use to derive the unique identifier of each item being looked up and linked.
The "Quercus" entity in Wikidata's dataset, for example, is referenced by the following URI: https://www.wikidata.org/entity/Q12004
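To make the role of the Entity URI prefix concrete, here is a minimal Python sketch of how a prefix and an item identifier could be joined into a full URI, and how the identifier could be recovered again. The names ENTITY_URI_PREFIX, entity_uri, and identifier_from_uri are illustrative assumptions, not part of OCHRE.

```python
# Illustrative only: these names are assumptions, not OCHRE's own API.
ENTITY_URI_PREFIX = "https://www.wikidata.org/entity/"  # taken from the example URI above


def entity_uri(prefix: str, identifier: str) -> str:
    """Join an Entity URI prefix and an item identifier into one URI."""
    return prefix.rstrip("/") + "/" + identifier.lstrip("/")


def identifier_from_uri(uri: str) -> str:
    """Return the last path segment of an entity URI (e.g. 'Q12004')."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    uri = entity_uri(ENTITY_URI_PREFIX, "Q12004")
    print(uri)
    print(identifier_from_uri(uri))
```

Storing only the prefix once per domain, and the short identifier per linked item, is one plausible reason for configuring the prefix separately.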
How would you know that "Quercus" was "Q12004" in Wikidata? You wouldn't!
And so we teach OCHRE how to look up content using SPARQL, first by giving OCHRE the SPARQL endpoint as documented by the domain of interest.
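As a rough illustration of what a SPARQL endpoint is used for, the Python sketch below prepares (but does not send) an HTTP GET request that carries a query in the standard "query" parameter and asks for JSON results. The endpoint shown is the one Wikidata documents for its query service at the time of writing; the function name build_request and the User-Agent string are assumptions made for this example, and nothing here describes how OCHRE itself issues the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Wikidata's documented query-service endpoint (check the domain's own docs).
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL-protocol GET request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {
        # Standard media type for SPARQL JSON results.
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; public endpoints often ask for one.
        "User-Agent": "example-sparql-client/0.1",
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    request = build_request(SPARQL_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(request.get_method(), request.full_url)
    # To actually run the query, pass the request to urllib.request.urlopen.
```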
Next, call in a SPARQL expert to write the query syntax to fetch the data of interest.
Writing a SPARQL query template requires both expertise in the query language and research into the vocabulary (properties, entities, terminology) of the domain of interest. Many linked-data sites provide extensive documentation and a variety of examples. But it is up to someone who knows how to write SPARQL queries to prepare an appropriate query template that will be understood by OCHRE.
This is not the place to explain the syntax of SPARQL, except to say that it is modeled on the principle of referencing a "triple" composed of a subject, a predicate, and an object. For example: find an item (subject) whose label (predicate) is "Quercus" (object).
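For readers who want to see the triple idea spelled out, the following Python sketch assembles a small SPARQL query whose single pattern reads "some item (subject) has the label (predicate) Quercus (object)". It assumes the standard rdfs:label property is the right predicate for the domain; the function name label_query is made up for this example, and a real OCHRE template may look quite different.

```python
# Standard RDF Schema namespace; rdfs:label is a widely used label property.
RDFS_PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"


def label_query(label: str, lang: str = "en", limit: int = 10) -> str:
    """Build a query with one triple pattern: ?item rdfs:label "label"@lang."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        RDFS_PREFIX,
        "SELECT ?item WHERE {",
        #  subject  predicate   object
        f'  ?item rdfs:label "{escaped}"@{lang} .',
        "}",
        f"LIMIT {limit}",
    ])


if __name__ == "__main__":
    print(label_query("Quercus"))
```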
There are several rules and options to be aware of when preparing a SPARQL template for OCHRE:
WHERE { ?item ?label "Quercus"@en .
Within OCHRE, all items from the Concepts, Locations & Objects, Periods, Persons & Organizations, and Taxonomy categories are enabled to integrate with Semantic Web data through their Thesaurus tabs.
Clicking on the Semantic Web icon from any Thesaurus links toolbar will pop up a window from which you can pick from among the available domains that have been configured for OCHRE. Note that you can link data either as a Close match (synonym) or as a Related item (after which you'll have the option to designate the relationship as being "broader" or "narrower").
For this example, we'll pop up the Semantic Web dialog from the Close match toolbar of the Quercus item in the OCHRE Taxonomy and choose the Wikidata domain from the pick-list. When we click the Lookup item button, OCHRE substitutes the Name Quercus for the <Name> code in the query template, and fires off the SPARQL query. It uses the returned values to derive a descriptive string, and creates a list of items returned by the query.
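The lookup step can be pictured with a short Python sketch: substitute the item's Name for the <Name> code in a template, then turn the JSON results into a list of descriptive strings. This is only an approximation under stated assumptions. The template, the function names, and the sample response are hand-written for illustration (the response is not real Wikidata output), while the JSON layout follows the standard SPARQL JSON results format.

```python
import json

# Hypothetical template; "<Name>" is the placeholder code described above.
QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
SELECT ?item ?description WHERE {
  ?item rdfs:label "<Name>"@en .
  OPTIONAL { ?item schema:description ?description .
             FILTER(LANG(?description) = "en") }
}
"""

# Hand-written sample in the SPARQL JSON results layout (not real output).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["item", "description"]},
    "results": {"bindings": [{
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q12004"},
        "description": {"type": "literal", "xml:lang": "en",
                        "value": "sample description"},
    }]},
})


def fill_template(template: str, name: str) -> str:
    """Replace the <Name> code with the (escaped) item name."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return template.replace("<Name>", escaped)


def candidates(response_text: str) -> list[tuple[str, str]]:
    """Return (entity URI, descriptive string) pairs for a pick-list."""
    rows = json.loads(response_text)["results"]["bindings"]
    found = []
    for row in rows:
        uri = row["item"]["value"]
        identifier = uri.rsplit("/", 1)[-1]
        description = row.get("description", {}).get("value")
        text = f"{identifier}: {description}" if description else identifier
        found.append((uri, text))
    return found


if __name__ == "__main__":
    print(fill_template(QUERY_TEMPLATE, "Quercus"))
    for uri, text in candidates(SAMPLE_RESPONSE):
        print(text, "->", uri)
```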
Note that the SPARQL query can be designed to be case-insensitive using a regex option, but we have noticed that this can adversely affect the performance of the query. Instead, we have configured OCHRE, by default, to query for both the item name as given ("Quercus") and also the lower-cased equivalent, if different ("quercus").
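The default behavior just described, trying the name as given plus its lower-cased form when that differs, can be sketched in a few lines of Python. The function name name_variants is an assumption for this example; how OCHRE combines the two lookups internally is not shown here.

```python
def name_variants(name: str) -> list[str]:
    """Return the name as given, plus its lower-cased form if different."""
    variants = [name]
    lowered = name.lower()
    if lowered != name:
        variants.append(lowered)
    return variants


if __name__ == "__main__":
    print(name_variants("Quercus"))  # ['Quercus', 'quercus']
    print(name_variants("quercus"))  # ['quercus']
```

Issuing two exact-match lookups keeps each query simple for the endpoint, whereas a regex filter generally has to be evaluated against every candidate label.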
Examine the results returned from the query and select the item(s) you wish using the checklist provided.
Use the Done button to accept the selected items as Thesaurus links.
When you View items that have linked data from the Semantic Web, OCHRE composes the Entity URIs for the linked items and displays them as hyperlinks. Clicking on any of these links takes you to the source web page.
In addition, if there were images captured when the lookup-and-link was performed, OCHRE fetches those images on the fly and displays them in the View.
This is the OCHRE Location item of Cairo, having coordinates and an image looked up from Wikidata, and also linked to the authoritative Getty Thesaurus of Geographic Names.
This is the OCHRE Concept of the Obol, linked to and illustrated by the related entity from Nomisma, a trusted source of numismatic data from the ancient world.