IMPLEMENTATION MODEL

The Implementation Model describes the workflows, infrastructure, code, positions, specific duties, and services that make it possible to create and provide access to collections as data. Addresses how all of the above will be sustained post funding.

WHAT'S ON THIS PAGE:

Implementation Model: Workflows

Implementation Model: Infrastructures

Implementation Model: Code

Implementation Model: Positions, Specific Duties, and Services

Implementation Model: Workflows

The Linking Lost Jazz Shrines workflows fall into five general categories: oral history transcript preparation, identification and/or creation of items in Wikibase, preparation of transcripts in Sélavy, Omeka S metadata updates, and the creation of linked data triples. These workflows should generally be followed in this order, with the caveat that once Wikibase URIs are identified and/or created, transcripts can be prepared in Sélavy at the same time that Omeka S item metadata is updated.

Note: All workflows are predicated upon the existence of an agreed-upon linked data model or "ontology". Information about the data model used for the Linking Lost Jazz Shrines Project can be found here.

Oral History Transcript Preparation

1. Conduct and Transcribe Oral Histories
2. Format Transcripts (Sections I.A–I.D of Transcript Processing Instructions)
3. Identify Relevant Entities (Section I.E of Transcript Processing Instructions)
4. Add TXT Version of Transcript to GitHub (Section 1.F of Transcript Processing Instructions)
  - GitHub folder: https://github.com/weeksvilleheritagecenter/linking-lost-jazz-shrines/tree/master/data/formatted_transcripts

Identification and/or Creation of Wikibase Items

Identify Existing Entity Items in Wikibase
- Entities can be manually searched in the Wikibase
- For bulk data work, all of the person, music group, and music venue items from the Wikibase can be downloaded for comparison to the transcript entities.
- Future potential enhancement: Write python script compare transcript entities to existing Wikibase entities
Create New Entity Items in Wikibase
- Items can be added individually or in batch to Wikibase, with the assistance of the Semantic Lab at Pratt
- New Wikibase items must include "instance of" (person, music group, or music venue) and "part of project" (Linking Lost Jazz Shrines) statements
- Optional statements include Wikidata QID and/or Library of Congress authority ID
- Future potential enhancement: Write python script to create new Wikibase items (current script in code section is for adding statements to already existing items)

Omeka S Updates

Add Missing Linking Lost Jazz Shrines Entities as Subjects to Omeka S Item (Omeka S Items Documentation)
Add Wikibase URIs for Linking Lost Jazz Shrines Entities in Subject List (Omeka S Items Documentation)
Add Library of Congress URIs for non-Linking Lost Jazz Shrines Subjects (Omeka S Items Documentation)
Add Wikibase or Library of Congress URIs to Other Applicable Metadata Fields (Omeka S Items Documentation)
Add JSON of Omeka S Item Metadata to GitHub
- Process: From Omeka S Item page, (1) right click and select "View page source"; (2) copy all content between "<script type="application/ld+json">" and "/script>"; (3) paste into "pretty print" formatter (optional); (4) download JSON file; and (5) upload to GitHub
- GitHub folder: https://github.com/weeksvilleheritagecenter/linking-lost-jazz-shrines/tree/master/data/omeka_s_metadata
- Future potential enhancement: Write python script to extract linked data metadata through web scraping
Add CSV of Omeka S Item's Subjects to GitHub
- Process: manually "copy/paste" from Omeka S site
- GitHub folder: https://github.com/weeksvilleheritagecenter/linking-lost-jazz-shrines/tree/master/data/omeka_s_subject_lists
- Future potential enhancement: Write python script to parse Omeka S JSON Metadata (from either web or from JSON document) and output a CSV subjects

Prepare Transcript in Sélavy

Upload Transcript (Section II.A of Transcript Processing Instructions)
Clean and Block Transcript (Sections II.B and II.C of Transcript Processing Instructions)
Sent Transcript Text Through NER Service (Sections II.D and II.E of Transcript Processing Instructions)
Create rdf:types (documentation pending completion)
Create Identities (documentation pending completion)
Using Omeka S Subject Lists as Reference, Assign URIs and rdf:types to Identities (Sections II.F.4 and II.F.5 of Transcript Processing Instructions)

Creation of Linked Data

Generate Linked Data Triples
- Temporary Preparatory Process: Google Spreadsheets (documentation pending completion)
- Intended Long-Term Process: Sélavy (Future Work)
Export Linked Data Triples from Sélavy (Future Work)
Load Linked Data Triples into Triple Store (Future Work)
Integrate Linked Data into Visualization (Future Work)

Implementation Model: Infrastructures

Linking Lost Jazz Shrines has used the below technological infrastructures for exploring, creating, and supporting project data:

Omeka S (https://omeka.org/s/)

The Omeka S platform is used by the Linking Lost Jazz Shrines project for access and discovery of the project data, especially for the Weeksville Heritage Center community, and for those who may not have familiarity querying a Wikibase instance or accessing data via GitHub. Whereas Omeka Classic only provided text strings field types for metadata values, Omeka S allows for the inclusion of Uniform Resource Identifiers (URIs), the building blocks of linked data. Support for the WHC's Omeka S instance will continue beyond the Linking Lost Jazz Shrines project, as WHC uses Omeka S for numerous other initiatives and projects.

For the Linking Lost Jazz Shrines Omeka S site, WHC plans to install two modules: First the "Sharing" module (https://omeka.org/s/modules/Sharing), to allow for the sharing of Omeka S content via social media and the embedding of Omeka S content in other sites, and secondly the "Mapping" module (https://omeka.org/s/modules/Sharing; https://omeka.org/s/docs/user-manual/modules/mapping/), to enable the incorporation of geospatial data into the Omeka S site.

Wikibase (https://wikiba.se/)

Wikibase is a knowledge base software for linked data, and is most commonly known as the technological infrastructure that supports Wikidata. The Linking Lost Jazz Shrines project data is held in the Semantic Lab at Pratt's instance of Wikibase (base.semlab.io), which holds and will continue to hold all of the data for the various Semantic Lab at Pratt projects.

Wikibase Resources:

Wikibase Basic Install Tutorial: a video tutorial and written documentation for installing your own instance of Wikibase
Wikibase for Research Infrastructure: discussion of the Semantic Lab at Pratt's approach to using the Wikibase platform for the research infrastructure, starting with the Linked Jazz project data

DADAlytics: Sélavy (http://159.89.242.202:3000/)

The Semantic Lab at Pratt is developing DADAlytics, a tool comprised of a Named Entity Recognition (NER) Service and Sélavy, an application to create linked data triples from a textual document. The intention of the Linking Lost Jazz Shrines project was to use Sélavy to create linked data triples, and export data from Sélavy to be loaded into the Semantic Lab's Wikibase.

The project's testing the alpha version of Sélavy revealed that, while the tool is effective in generating batch triples where a subject is related to all the instances of an rdf type within a specific block of text, the tool does not yet have the capacity to create triples where a subject is related to specific object instances, a core need of the Linking Lost Jazz Shrines project. To this end, Linking Lost Jazz Shrines has utilized Sélavy to format transcripts, identify entities, and assign associated Wikibase URIs to those entities, but the project has not yet been able to use Sélavy to generate linked data triples. The continued development of Sélavy by the Semantic Lab at Pratt will enable the eventual creation of linked data triples for the Linking Lost Jazz Shrines project.

Sélavy Resources:

DADAlytics: Sélavy How To: a video demonstration of the early alpha version of Sélavy tool
Linking Lost Jazz Shrines Transcript Processing Instructions: includes screenshots and descriptions of Sélavy usage, as it pertains to the Linking Lost Jazz Shrines project

GitHub (https://github.com/)

GitHub has been utilized to store Linking Lost Jazz Shrines project data. Data in the project repository will continue to evolve as the project continues beyond its initial granting period, inclusive of the eventual linked data triples that will be created with the Sélavy tool, and any code that may be written to automate future data creation. As of September 2020, the Linking Lost Jazz Shrines GitHub repository contains the following datasets for each of the ten pilot transcripts:

Formatted transcripts (txt)
Sélavy entities, classes, and URIs (csv)
Omeka S subjects (csv)
Full Omeka S metadata (json)

Google Suite

While not novel or unique technology, the Linking Lost Jazz Shrines project would be remiss not to mention the utilization of Google suite tools for documenting the project, organizing the data, and preparing for the creation of linked data triples. Google Sites was used to create this documentation site, Google Drawings were used to visualize and communicate the enriched Linked Jazz ontology, Google Docs was used to keep running notes of project and to create the Transcript Processing Instructions document, and—perhaps most importantly—Google Sheets was used to organize project data, track updates for the various transcript processes, and preliminarily create linked data triples in preparation for the completion of Sélavy.

Implementation Model: Code

The Linking Lost Jazz Shrines Project itself has not yet generated any code related to its implementation model. The project did, however, use the below sets of code, written by Semantic Lab at Pratt co-director Matt Miller. Once Sélavy is upgraded so that specific linked data triples can be created within the application, code for (1) exporting linked data triples from Sélavy and (2) loading those triples into Wikibase will be written and tested by the Semantic Lab at Pratt.

Export Entities from Sélavy

Description: given a Sélavy document ID, export entity labels, rdf:types, and URIs from Sélavy
- README: https://github.com/SemanticLab/wikibase-load-scripts/blob/master/selavy/README.md
- Code: https://github.com/SemanticLab/wikibase-load-scripts/blob/master/selavy/export_entities.py

Load Data into Wikibase

- - Description: given a csv of Wikibase items (QIDs), properties (PIDs), and values, load data into Wikibase
  - README: pending
  - Code: https://github.com/SemanticLab/wikibase-load-scripts/blob/master/load_data.py

Implementation Model: Positions, Specific Duties, and Services

Note: The below positions, with their duties and services, are written from the perspective of a project born out of a partnership between two organizations. If this implementation model is replicated by an organization that does *not* require a partner collaborator, it is possible that some of the positions, duties, and/or services may not be needed. Additionally, especially for smaller institutions, it is possible that one person may have to fulfill more than one position. For Linking Lost Jazz Shrines, the Data Creator duties and services were completed collaboratively by the Project Lead/Content Domain Specialist, and the Linked Data Technical Consultant.

Project Lead/Content Domain Specialist

In accordance to the goals of the organization that holds the collections to be made in to data, the Project Lead/Content Domain Specialist oversees adherence to the project goals and timeline, and serves as the administrative liaison between the two collaborating organizations. In collaboration with the Linked Data Technical Consultant, the Project Lead/Content Domain Specialist will help guide and inform the development of the project ontology from the perspective of being true to the character and content of the collection to be transformed into data. The Project Lead/Content Domain Specialist will propose any additional classes and/or predicates (properties)—in the to be vetted by the Linked Data Technical Consultant and Linked Data Disciplinary Scholar—that will aide in fully expressing the collection content in a linked data format.

Linked Data Disciplinary Scholar

In collaboration with the Linked Data Technical Consultant, the Linked Data Disciplinary Scholar will help guide and inform the development of the project ontology, adhering to Tim Berners-Lee's linked data rules and 5-star rating system, and adjusting the ontology to the technical constraints of the selected linked data triple store or knowledge base that might impact data behavior. If a pre-existing ontology is enhanced for the project—as opposed to creating an ontology "from scratch"—the Linked Data Disciplinary Scholar will ensure that proposed classes and properties are integrated in a way that preserves internal logical consistency of the ontology.

Developer

If the Linked Data Technical Consultant or Data Creators require assistance, the Developer will assist in batch loading data in to the selected triple store or knowledge base, or in any process that might require moving large amounts of data. If utilizing the same infrastructure as the Linking Lost Jazz Shrines project, the developer will install and manage a Wikibase instance, and will also install and manage an Omeka S instance, inclusive of installing any additional modules on each platform required for the project.

Linked Data Technical Consultant

If the project is a collaboration between two organizations, the Linked Data Technical Consultant is the liaison between the organization holding the collection that will created as linked data and the organization that may provide ontological and/or technical support for the creation of that linked data. The Linked Data Technical Consultant negotiates the project ontology with both the Project Lead/Content Domain Specialist and the Linked Data Disciplinary Scholar to ensure the ontology meets both the technical and content requirements of both parties. Through close communication with the Project Lead/Content Domain Specialist, the Linked Data Technical Consultant will establish workflows for processing collection documents, creating linked data, and storing that linked data.

Data Creators

Data Creators implement the workflows established by the Linked Data Technical Consultant to create linked data out of oral history transcripts. These duties include, but are not limited to, identifying pertinent entities in transcripts, identifying and/or creating items in Wikibase, updating subject lists in Omeka S with Wikibase and/or Library of Congress URIs, formatting and uploading transcripts in Sélavy, associating entities in Sélavy with their respective Wikibase URIs, and (ultimately) creating linked data triples within Sélavy. If needed, the Data Creators will collaborate with the Linked Data Technical Consultant and/or Developers to create Wikibase items en mass and load resultant linked data triples into the Wikibase. Tasks listed in the "Workflow" section of this page constitute additional duties for which Data Creators may be responsible.

Page updated

Report abuse

IMPLEMENTATION MODEL

WHAT'S ON THIS PAGE:

Implementation Model: Workflows

Oral History Transcript Preparation

Identification and/or Creation of Wikibase Items

Omeka S Updates

Prepare Transcript in Sélavy

Creation of Linked Data

Implementation Model: Infrastructures

Omeka S (https://omeka.org/s/)

Wikibase (https://wikiba.se/)

DADAlytics: Sélavy (http://159.89.242.202:3000/)

GitHub (https://github.com/)

Google Suite

Implementation Model: Code

Export Entities from Sélavy

Load Data into Wikibase

Implementation Model: Positions, Specific Duties, and Services

Project Lead/Content Domain Specialist

Linked Data Disciplinary Scholar

Developer

Linked Data Technical Consultant

Data Creators

Contact: semanticlab@pratt.edu • Site last updated: March 2021