openurl

Most of the following I learnt while writing textseeka, which I did myself, although I did have some help:


Apart from providing the "best copy" available for a given reference (e.g. a journal article, patent, book or book chapter), openurl-type resolving offers lots of other possibilities, including, as I recently discovered, serving up archival images.

If you're implementing an OpenURL resolver, then you probably want to consider:

  • access points:
    • applications (EndNote, Zotero, etc)
    • browsers (Firefox/Internet Explorer OpenSearch plugins for searching using a DOI/PMID/other identifier)
    • web forms
    • COinS in a web page (CiteULike, Wikipedia)
    • databases (Web of Knowledge, PubMed, etc)
  • request types:
    • openurl (including slight variants such as COinS requests, which use slightly different tags)
    • openref (see Rod Page's examples)
  • resources:
    • bibliographic items - just journal articles, book chapters, books, or patents and other formats as well?
    • pictures - JP2 (JPEG 2000) images, e.g. via djatoka
    • information such as the Biodiversity Library output from this request
  • output formats:
    • HTML (XML + XSL?)
    • AJAX: XML; JSON
    • RDF/XML
Web services are great:
  • get an OCLC developer key or whatever they call it, and make use of xISSN and xISBN services. If you're taking incoming requests from all sorts of vendors, then be prepared for ISSNs from former titles being used for the continuing title! The OCLC xISSN service provides continuing titles data - very handy for this problem.
  • CrossRef do a great job - some people don't like their preference not to use ISSNs, which I completely understand. Make sure to use the unixref option though - you get some nice metadata that way. If I were to start over, I'd use xpath with SimpleXML... much easier that way from what I've seen. CrossRef resolves in both directions: from metadata to a DOI, and from a DOI to metadata. I've learnt that caching this data is fine too - see Chuck Koscher's response re: caching CrossRef's unixref response: http://bit.ly/17MWcg
  • Publishers often provide a DOI resolver of their own. You can't resolve a DOI to metadata with it, but it provides a stable link to that publisher - useful if their openurl resolver doesn't work.
  • PubMed - Alf Eaton has a nice simple PHP script (I'd still use curl for this and other simple services though, as it makes proxy support easy). NLM do a great job with their web services - they've gone to version 2 since I started using their data.
  • Biblio.net - not using it yet, but worth investigating for book data, and possibly journal titles too.
  • OPS - Open Patent Services are great if you're dealing with patents... I haven't found a way to get the WIPO data, but Espace does a pretty good job.

ISSN data:

If you're doing ISSNs, you might need a look-up table in an SQL database (or an RDF store?). These sources of ISSN data might well be useful for such a thing:

Rules-based processing of openurl targets (normally journals).


My PHP implementation notes

Multiple keys - with PHP, I quickly learnt that you need to parse inline requests with your own string parser, as PHP5's parse_str will happily overwrite earlier name=value pairs when a matching name comes along. The multiple-keys point is especially important with the url-encoded info parts of an incoming request: you can have a PMID and a DOI in the same request, and give me the PubMed data over CrossRef any day.
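
As a minimal sketch of the multiple-keys point, here is the idea in Python rather than PHP (the logic ports directly). The function names parse_pairs and pick_identifier, and the sample query string with its made-up DOI and PMID, are illustrative assumptions and not code from textseeka.

```python
from urllib.parse import unquote_plus


def parse_pairs(query):
    """Split a query string into (name, value) pairs, keeping repeated names."""
    pairs = []
    for chunk in query.split("&"):
        if not chunk:
            continue
        name, _, value = chunk.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def pick_identifier(pairs):
    """Return ('pmid', value) if a PMID is present, else ('doi', value), else None."""
    found = {}
    for name, value in pairs:
        if name != "rft_id":
            continue
        for scheme in ("pmid", "doi"):
            prefix = "info:" + scheme + "/"
            if value.startswith(prefix):
                # Keep the first value seen for each scheme.
                found.setdefault(scheme, value[len(prefix):])
    for scheme in ("pmid", "doi"):  # PMID is preferred over DOI
        if scheme in found:
            return scheme, found[scheme]
    return None


# Hypothetical request carrying both a DOI and a PMID under the same key.
query = "rft_id=info%3Adoi%2F10.1000%2Fexample&rft_id=info%3Apmid%2F12345678"
pairs = parse_pairs(query)
print(pairs)
print(pick_identifier(pairs))
```

Because the pairs are kept in a list instead of a name-keyed array, the second rft_id does not overwrite the first, and the preference for PubMed is applied afterwards as a separate step.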

If you're taking XML requests (especially POSTed data), then I'd use xpath... I initially used XSL for XML data, and then SimpleXML, but I think SimpleXML + xpath would be smarter.
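
To illustrate why path expressions are attractive here, a rough sketch in Python using the standard library's ElementTree, which supports a limited subset of XPath. The XML below is a hypothetical, simplified request body invented for this example; a real XML request is namespaced and nested differently, so the paths would need adjusting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified POSTed request (not a real OpenURL XML schema).
posted = """<request>
  <referent>
    <identifier>info:pmid/12345678</identifier>
    <identifier>info:doi/10.1000/example</identifier>
    <metadata><issn>1234-5678</issn><spage>101</spage></metadata>
  </referent>
</request>"""

root = ET.fromstring(posted)

# One path expression per wanted field, instead of walking the tree by hand.
identifiers = [el.text for el in root.findall("./referent/identifier")]
issn = root.findtext("./referent/metadata/issn")
spage = root.findtext(".//spage")  # match at any depth

print(identifiers, issn, spage)
```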

Basically, the approach I used was to explode the inline request (the querystring) on the & and then each pair on the =, and store the values in an English-named array (some of those OpenURL names are weird). If you're using arrays, then you can't use the rft. prefix as you would with OpenURL 1.0 (use rft_ instead). I chose to rip the prefix off the incoming request, as this also makes requests more compatible with 0.1 requests - see info for SFX api here. Be careful of faulty keys too - spaces can be inserted in keys ("e page" is an old Springer example).
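
The approach just described might look roughly like this, sketched in Python rather than PHP. The ENGLISH_NAMES mapping, the function names and the sample request are assumptions made for illustration; only the "rft." prefix is stripped here, which is one reading of the prefix handling described above.

```python
from urllib.parse import unquote_plus

# Hypothetical mapping from OpenURL key names to more readable ones.
ENGLISH_NAMES = {
    "atitle": "article_title",
    "jtitle": "journal_title",
    "aulast": "author_last_name",
    "spage": "start_page",
    "epage": "end_page",
}


def normalise_key(raw_key):
    """Decode a key, drop stray whitespace and the 'rft.' prefix, then rename it."""
    key = "".join(unquote_plus(raw_key).split())  # "e page" -> "epage"
    if key.startswith("rft."):
        key = key[len("rft."):]
    return ENGLISH_NAMES.get(key, key)


def parse_request(query):
    """Split on '&', then on the first '=', and store values under readable names."""
    fields = {}
    for chunk in query.split("&"):
        if not chunk:
            continue
        raw_key, _, raw_value = chunk.partition("=")
        # Values are kept in lists so repeated keys are not overwritten.
        fields.setdefault(normalise_key(raw_key), []).append(unquote_plus(raw_value))
    return fields


# Mixes a 1.0-style key, a 0.1-style key and a faulty key containing a space.
sample = "rft.atitle=Some+title&spage=101&e%20page=110&rft.issn=1234-5678"
print(parse_request(sample))
```

Stripping the prefix means a 1.0-style rft.spage and a 0.1-style spage land under the same name, so the rest of the resolver only has to deal with one vocabulary.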

Some values end up generating more keys in the array. For example, the presence of an ISSN makes the request a journal article. It's best to use duck typing here, as most vendors can't generate proper OpenURL requests (since when were patents journal articles?).
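
A small sketch of that duck typing, in Python. Only the ISSN rule comes from the notes above; the ISBN rules, the field names and the function names are assumptions added to make the example complete.

```python
def infer_genre(fields):
    """Guess the genre from which fields are present, ignoring any declared genre."""
    if fields.get("issn"):
        return "article"  # an ISSN means a journal article
    if fields.get("isbn"):
        # Assumed rule: an ISBN plus a part title suggests a book chapter.
        return "bookitem" if fields.get("article_title") else "book"
    return "unknown"


def add_derived_keys(fields):
    """Return a copy of the request with extra keys derived from its values."""
    enriched = dict(fields)
    enriched["genre"] = infer_genre(fields)
    return enriched


# The vendor's declared genre is overridden by what the data looks like.
print(add_derived_keys({"issn": "1234-5678", "genre": "book"}))
```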