Experience

Timeline

80's, markup ➥ 90's, web ➥ 00's, linking ➥ 10's, linked data


Interests

I have been working with web technologies for more than two decades. Early on I tracked the first developments in XML and then in RDF, especially for information modelling with schemas and constraints. This led on to developing standards for network identifiers for linking and persistence, and then to creating web frameworks for search and resource discovery. More recently, my focus has been on building knowledge graphs and publishing datasets as linked data.


Skills

Databases

  • MarkLogic, MySQL, Postgres

Graphstores

  • 4store, GraphDB, Jena (TDB), Stardog

Languages

  • Elixir, JavaScript, Perl, Python, Ruby, (Unix) Shell, XSLT
  • (some knowledge of) C, Java, Lisp, Pascal, PostScript, Prolog, Smalltalk

Learning

  • Erlang, Rust

Markup

  • CSS, HTML, JSON, (La)TeX, Markdown, RSS, SGML, XML

Querying

  • GraphQL, SPARQL, SQL, XQuery

Semantics

  • OWL, RDF, RDFS, SHACL, SKOS, SKOS-XL, SPIN

Tools

  • GitHub, Jupyter, PoolParty, Protégé

Validation

  • DTD, SHACL, XSD

Technologies

Ontologies

Over the last 7 years I co-developed an OWL ontology for an STM publisher knowledge graph, starting from an early RDFS schema. A key design decision was to build a consistent internal data model for what subsequently evolved into Springer Nature SciGraph, with transformations to well-known models deferred to data publishing time. I developed some initial transforms to such models, notably Schema.org as a metadata lingua franca and BIBFRAME for library collections, but currently only the internal SciGraph model has been published.

Taxonomies

I have worked with SKOS taxonomies, mainly from a data management point of view rather than as a subject matter expert. My interest was in producing and maintaining robust schemes that could be published and reused. We used PoolParty to manage our taxonomies.

Triplestores

For SciGraph we used Ontotext GraphDB, following an initial project start with Stardog. (Prior to that I had used Apache Jena TDB for internal work and a hosted 5store as a public SaaS solution after trialling 3store, 4store and Virtuoso.)

Rules

I have many years' experience of using SPARQL, both for ad hoc querying of RDF datasets and for stored queries, especially embedded within SPIN rules and, more recently, SHACL rules. I have also worked with GraphDB native rulesets to better manage the inference process for our needs.

Search

For SciGraph we used Elasticsearch for text retrieval and API lookups, and Kibana for analytics dashboard prototyping. In earlier projects we used MarkLogic for searching an XML corpus, and Lucene for indexing RDF data.

ETL

For SciGraph we used an ETL pipeline managed by Airflow. Master datasets came from a variety of sources and were stored (and versioned) in AWS S3, with subsequent transforms to RDF applied by ETL tasks.