Experience

Timeline 
1980s, markup ➜ 1990s, web ➜ 2000s, linking ➜ 2010s, linked data

Interests 
I have been working with web technologies for more than a couple of decades now. Early on I tracked the initial developments in XML and then in RDF, especially for information modelling with schemas and constraints. This led to developing standards for network identifiers for linking and persistence, and then to creating web frameworks for search and resource discovery. More recently, my focus has been on building knowledge graphs and publishing linked data.

Skills
 Databases   4store, GraphDB, Jena (TDB), MarkLogic, MySQL, Stardog
 Languages   JavaScript, Perl, Python, Ruby, (Unix) Shell, XSLT
             (some knowledge of) C, Java, Lisp, Pascal, PostScript, Prolog, Smalltalk
 Markup      CSS, HTML, JSON, (La)TeX, Markdown, RSS, SGML, XML
 Querying    SPARQL, SQL, XQuery
 Semantics   OWL, RDF, RDFS, SHACL, SKOS, SKOS-XL, SPIN
 Tools       GitHub, Jupyter, PoolParty, Protégé
 Validation  DTD, SHACL, XSD

 Learning    Elixir, Erlang, GraphQL


SciGraph data model


Ontologies
Over the last seven years I co-developed an OWL ontology for the knowledge graph of an STM (scientific, technical and medical) publisher, starting from an early RDFS schema. A key design decision was to build a consistent internal data model for what subsequently evolved into Springer Nature SciGraph, deferring transformations to other well-known models until data publishing time. I developed some initial transforms, notably to Schema.org as a metadata lingua franca and to BIBFRAME for library collections, but currently only the internal SciGraph model has been published.
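
To illustrate the idea of keeping one internal model and mapping to external vocabularies only at publishing time, here is a minimal Python sketch. The internal names (`sg:Article`, `sg:title`, `sg:publicationYear`, `sg:doi`) and the mapping tables are hypothetical stand-ins, not the actual SciGraph ontology; only the Schema.org terms (`ScholarlyArticle`, `name`, `datePublished`, `identifier`) are real.

```python
import json

# Hypothetical mapping from internal SciGraph-style terms to Schema.org terms.
# The internal names are illustrative only; they are NOT the real SciGraph ontology.
CLASS_MAP = {"sg:Article": "ScholarlyArticle"}
PROPERTY_MAP = {
    "sg:title": "name",
    "sg:publicationYear": "datePublished",
    "sg:doi": "identifier",
}


def to_schema_org(record: dict) -> dict:
    """Transform one internal record into a Schema.org JSON-LD object.

    The internal record is left untouched; the mapping is applied only when
    the data is published. Unmapped internal properties are dropped.
    """
    doc = {
        "@context": "https://schema.org",
        "@id": record["@id"],
        "@type": CLASS_MAP.get(record["@type"], "Thing"),
    }
    for prop, value in record.items():
        target = PROPERTY_MAP.get(prop)
        if target is not None:
            doc[target] = value
    return doc


if __name__ == "__main__":
    internal = {
        "@id": "https://example.org/articles/a1",  # hypothetical identifier
        "@type": "sg:Article",
        "sg:title": "An example article",
        "sg:publicationYear": "2018",
        "sg:doi": "10.1000/example",  # hypothetical DOI
        "sg:internalNote": "not for publication",
    }
    print(json.dumps(to_schema_org(internal), indent=2))
```

Keeping the mapping in data (two dictionaries) rather than in code means a further target could be added as another mapping table without touching the internal model.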

Taxonomies
I have worked with SKOS taxonomies, mainly from a data management point of view rather than as a subject matter expert. My aim was to produce and maintain robust concept schemes that could be published and reused. We used PoolParty to manage our taxonomies.
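
One way to keep SKOS schemes robust enough for publication is to check them against the integrity conditions in the SKOS Reference before each release. The Python sketch below checks two of them: S13 (`skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel` are pairwise disjoint) and S14 (at most one `skos:prefLabel` per language tag). Labels are modelled as plain `(concept, property, text, lang)` tuples for brevity, and the `ex:` concepts are hypothetical; this is an illustration, not the checks we actually ran.

```python
from collections import defaultdict

LABEL_PROPS = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}


def check_labels(labels):
    """Return a sorted list of problems found in (concept, property, text, lang) tuples.

    Checks two SKOS Reference integrity conditions:
      S13: prefLabel, altLabel and hiddenLabel are pairwise disjoint, so one
           concept must not carry the same literal under two of them;
      S14: a concept has at most one prefLabel per language tag.
    """
    problems = []
    pref_per_lang = defaultdict(set)      # (concept, lang) -> prefLabel texts
    props_per_literal = defaultdict(set)  # (concept, text, lang) -> label properties used

    for concept, prop, text, lang in labels:
        if prop not in LABEL_PROPS:
            continue
        props_per_literal[(concept, text, lang)].add(prop)
        if prop == "skos:prefLabel":
            pref_per_lang[(concept, lang)].add(text)

    for (concept, lang), texts in pref_per_lang.items():
        if len(texts) > 1:
            problems.append(f"S14: {concept} has {len(texts)} prefLabels for @{lang}")
    for (concept, text, lang), props in props_per_literal.items():
        if len(props) > 1:
            problems.append(f"S13: {concept} uses '{text}'@{lang} as {', '.join(sorted(props))}")
    return sorted(problems)


if __name__ == "__main__":
    labels = [
        ("ex:c1", "skos:prefLabel", "Physics", "en"),
        ("ex:c1", "skos:prefLabel", "Physical sciences", "en"),  # violates S14
        ("ex:c2", "skos:prefLabel", "Chemistry", "en"),
        ("ex:c2", "skos:altLabel", "Chemistry", "en"),           # violates S13
        ("ex:c2", "skos:prefLabel", "Chemie", "de"),             # fine: different language
    ]
    for problem in check_labels(labels):
        print(problem)
```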

Triplestores
For SciGraph we used Ontotext GraphDB, having started the project on Stardog. (Before that I had used Apache Jena TDB for internal work and, after trialling 3store, 4store and Virtuoso, a hosted 5store as a public SaaS solution.)

Rules
I have many years' experience of using SPARQL, both for ad hoc querying of RDF datasets and for stored queries, especially those embedded in SPIN rules and, more recently, SHACL rules. I have also worked with GraphDB's native rulesets to tailor the inference process to our needs.
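
GraphDB rulesets are written in GraphDB's own rule language, so the Python below is not a ruleset; it is a minimal sketch of the forward-chaining idea behind one. It applies two RDFS entailment rules, rdfs9 (type propagation along `rdfs:subClassOf`) and rdfs11 (`rdfs:subClassOf` transitivity), to a set of triples until no new triples appear. The `ex:` names are hypothetical.

```python
# Simplified illustration of forward chaining; this is NOT GraphDB rule syntax.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs9(triples):
    """rdfs9: (x subClassOf y) and (z type x) => (z type y)."""
    supers = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
    return {(s, TYPE, sup) for s, p, o in triples if p == TYPE for sup in supers.get(o, ())}


def rdfs11(triples):
    """rdfs11: (x subClassOf y) and (y subClassOf z) => (x subClassOf z)."""
    sub = [(s, o) for s, p, o in triples if p == SUBCLASS]
    return {(x, SUBCLASS, z) for x, y in sub for y2, z in sub if y == y2}


def materialise(triples, rules=(rdfs9, rdfs11)):
    """Apply the rules repeatedly until no new triples are inferred (a fixpoint)."""
    closure = set(triples)
    while True:
        inferred = set()
        for rule in rules:
            inferred |= rule(closure)
        new = inferred - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    data = {
        ("ex:Article", SUBCLASS, "ex:Publication"),
        ("ex:Publication", SUBCLASS, "ex:Work"),
        ("ex:a1", TYPE, "ex:Article"),
    }
    for triple in sorted(materialise(data) - data):
        print(triple)
```

Passing `rules=(rdfs9,)` still types `ex:a1` as both `ex:Publication` and `ex:Work` here, but no longer materialises the `ex:Article rdfs:subClassOf ex:Work` shortcut; choosing which rules run is the kind of trade-off a custom ruleset allows.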

Search
For SciGraph we used Elasticsearch for text retrieval and API lookups, and Kibana for prototyping analytics dashboards. In earlier projects we used MarkLogic to search an XML corpus and Lucene to index RDF data.
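
Indexing RDF for text search usually means flattening each resource's triples into one document. The Python sketch below groups triples by subject, collects literal values of selected predicates into named fields, and builds an Elasticsearch `multi_match` query body over those fields. The `FIELDS` mapping, the `Literal` wrapper and the sample data are assumptions for illustration; the code does not call Elasticsearch and is not the SciGraph indexing code.

```python
import json
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain RDF literal; any other object value is treated as an IRI."""
    value: str


# Hypothetical mapping from RDF predicates to search-index field names.
FIELDS = {
    "http://schema.org/name": "title",
    "http://schema.org/description": "abstract",
}


def to_search_docs(triples):
    """Group triples by subject into one flat document per resource.

    Only literal objects of mapped predicates are indexed; IRIs and
    unmapped predicates are skipped.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        field = FIELDS.get(p)
        if field and isinstance(o, Literal):
            docs[s][field].append(o.value)
    return {s: {"id": s, **fields} for s, fields in docs.items()}


def text_query(text):
    """Build an Elasticsearch query DSL body that searches all mapped fields."""
    return {"query": {"multi_match": {"query": text, "fields": sorted(FIELDS.values())}}}


if __name__ == "__main__":
    triples = [
        ("https://example.org/a1", "http://schema.org/name", Literal("Linked data at scale")),
        ("https://example.org/a1", "http://schema.org/description", Literal("A short abstract.")),
        ("https://example.org/a1", "http://schema.org/author", "https://example.org/p1"),
    ]
    for doc in to_search_docs(triples).values():
        print(json.dumps(doc))
    print(json.dumps(text_query("linked data")))
```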

ETL
For SciGraph we used an ETL pipeline managed by Apache Airflow. Master datasets were drawn from various sources and stored (and versioned) in AWS S3; ETL tasks then transformed them to RDF.
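
The sketch below shows the general shape of one such transform task in Python: it takes a CSV master dataset as text and emits N-Triples lines. The column names, base IRI and choice of Schema.org predicates are hypothetical, and the sample rows are made up; fetching from S3 and scheduling under Airflow are left out so the example stays self-contained.

```python
import csv
import io

# Hypothetical base IRI and column-to-predicate mapping; not the SciGraph identifiers.
BASE = "https://example.org/journals/"
PREDICATES = {
    "title": "http://schema.org/name",
    "issn": "http://schema.org/issn",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
JOURNAL = "http://schema.org/Periodical"


def escape(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text):
    """Transform CSV rows (with an 'id' column) into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{JOURNAL}> .")
        for column, predicate in PREDICATES.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than emit empty literals
                lines.append(f'{subject} <{predicate}> "{escape(value)}" .')
    return lines


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    master = 'id,title,issn\nj1,"Journal of ""Examples""",1234-5678\nj2,Another Journal,\n'
    print("\n".join(csv_to_ntriples(master)))
```

Writing N-Triples (one self-contained triple per line) keeps each task's output easy to concatenate, diff and bulk-load, which suits versioned storage between pipeline steps.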