
SPARQL

SPARQL Query Basics

SPARQL queries can be performed on S3DB data. You may either:
  • Export your project as an RDF document:
    http://mys3dbdeployment/s3db/rdf.php?key=[yourkey]&project_id=[project_id]

You can then save this document and perform SPARQL queries against it using your favorite SPARQL engine. A few SPARQL engines that you may use are:
  • http://doapspace.org/sparql
  • http://www.sparql.org/sparql.html
  • http://demo.openlinksw.com/sparql/
Simply specify the address of your RDF document in the "FROM" clause.

OR
  • Use the sparql endpoint function:
    http://mys3dbdeployment/s3db/sparql.php?key=[yourkey]&query=?a :R[rule_id] ?b .

In the latter case, queries are first serialized into their S3QL equivalents and run in parallel; their results are then intersected using the PHP RDF API developed by Oldakowski et al. (2004).
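
As a rough illustration of the RDF export route, the sketch below builds the rdf.php address and places it in the FROM clause of a query meant for an external SPARQL engine. The host name, key, and project ID are placeholders, and the helper names rdf_export_url and query_from_export are made up for this example; the code only composes text and sends nothing.

```python
from urllib.parse import urlencode


def rdf_export_url(deployment, key, project_id):
    """Return the rdf.php address for one project (placeholder values expected)."""
    params = urlencode({"key": key, "project_id": project_id})
    return f"{deployment.rstrip('/')}/rdf.php?{params}"


def query_from_export(rdf_url, pattern):
    """Wrap a triple pattern in a SELECT that reads from the exported RDF document."""
    return f"SELECT * FROM <{rdf_url}> WHERE {{ {pattern} }}"


# Hypothetical deployment, key, and project ID, used only to show the shape.
url = rdf_export_url("http://mys3dbdeployment/s3db", "yourkey", 126)
print(query_from_export(url, "?s ?p ?o ."))
```

The resulting text can be pasted into the query box of any of the engines listed above, provided the engine is able to fetch the export address.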

Prefixes

By default, the S3DB SPARQL engine assumes the following prefixes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX s3db: <http://www.s3db.org/core.owl#>
PREFIX : [local s3db deployment URL]

If you want to add custom prefixes, add them to the beginning of the query. Custom prefixes replace the default prefixes, so if you wish to use the rdf or rdfs prefixes, they need to be declared again as well:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX custom: <http://mycustomprefix.com/>
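
For programmatic use, the prefix header can be assembled automatically so that rdf and rdfs are never forgotten. This is only a sketch: the function name with_prefixes and the term custom:Thing are invented for illustration and are not part of S3DB.

```python
# Prefixes that must be re-declared whenever custom prefixes are supplied.
DEFAULT_PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def with_prefixes(query_body, custom_prefixes):
    """Prepend PREFIX lines: rdf and rdfs first, then the custom ones."""
    merged = {**DEFAULT_PREFIXES, **custom_prefixes}
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in merged.items())
    return f"{header}\n{query_body}"


# custom:Thing is a hypothetical term used only to exercise the custom prefix.
query = with_prefixes(
    "SELECT ?x WHERE { ?x rdf:type custom:Thing . }",
    {"custom": "http://mycustomprefix.com/"},
)
print(query)
```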


Basic Select Query

A select query in SPARQL is very similar to a select query in SQL. The difference is that in SPARQL you can create named variables that you reuse throughout the query. So, for example, the SPARQL query:

?GenomicCharacterization rdf:type :C186 .


Creates a variable, ?GenomicCharacterization, and assigns to it everything of type ":C186". Since C186 is an S3DB Collection (see Core Model) and only S3DB Items can be of type S3DB Collection, this query effectively retrieves all Items of collection C186. 

You can also specify a "SELECT" clause in the query, in case you would like to control what gets returned:

SELECT ?GenomicCharacterization WHERE { ?GenomicCharacterization rdf:type :C186 . }

Go on, try it yourself!

http://ibl.mdanderson.org/TCGA/sparql.php?format=html&query=?GenomicCharacterization rdf:type :C12456 .

(Remember to URL encode each query for programmatic access)
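
A minimal sketch of that encoding step, using Python's standard library. The helper name sparql_request_url is hypothetical, and the code assumes the endpoint decodes standard percent-encoding; it builds the address without sending a request.

```python
from urllib.parse import quote


def sparql_request_url(endpoint, query, fmt="html"):
    """Return the endpoint address with the query percent-encoded."""
    # safe="" also encodes ":" and "?", which would otherwise be left as-is.
    return f"{endpoint}?format={fmt}&query={quote(query, safe='')}"


request_url = sparql_request_url(
    "http://ibl.mdanderson.org/TCGA/sparql.php",
    "?GenomicCharacterization rdf:type :C12456 .",
)
print(request_url)
```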


What queries can be asked?

In order to know what queries you can ask of the data, you need to know what data is there. Because S3DB separates the domain from the data, it is possible to write a SPARQL query asking just for those elements that represent sets of things. For example, how would you know what collections were available for query in the previous section?


Selecting Collections from a Project

To retrieve all Identifiers of the domain, you can perform the following query:

?D rdfs:subClassOf :P126 .

Both collections and rules are returned (identified by "C" or "R" in their IDs). To narrow the query so that only collections (and their labels) are returned, you can add these triples:

?Collection rdfs:subClassOf :P126 .
?Collection rdf:type s3db:s3dbCollection .
?Collection rdfs:label ?CollectionLabel .

Try it!

http://ibl.mdanderson.org/TCGA/sparql.php?format=html.pretty&query=?Collection rdfs:subClassOf :P126 . ?Collection rdf:type s3db:s3dbCollection . ?Collection rdfs:label ?CollectionLabel .

Collection                                CollectionLabel
http://ibl.mdanderson.org/TCGA/C12697     s3dbVerb
http://ibl.mdanderson.org/TCGA/C44545     Sample Collection Center
http://ibl.mdanderson.org/TCGA/C198       Data Type
http://ibl.mdanderson.org/TCGA/C186       GenomicCharacterization
http://ibl.mdanderson.org/TCGA/C127       Platform
http://ibl.mdanderson.org/TCGA/C192       Sample
http://ibl.mdanderson.org/TCGA/C131       Manufacturer
http://ibl.mdanderson.org/TCGA/C12536     Freeze
http://ibl.mdanderson.org/TCGA/C49942     Analyte
http://ibl.mdanderson.org/TCGA/C1275311   CancerGenomicCharacterizationAndSequencingCenters
http://ibl.mdanderson.org/TCGA/C75341     GenomicCharacterizationRevisions
http://ibl.mdanderson.org/TCGA/C194       Patient


Collections are described in RDF in the following terms:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix s3db: <http://www.s3db.org/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://ibl.mdanderson.org/s3dbdemo/> .

:C186 a s3db:Collection .
:C186 rdfs:subClassOf :P126 ;
rdfs:label "GenomicCharacterization" ;
rdfs:comment "Unit of a high-throughput biological experiment." ;
dcterms:creator :U146 ;
dcterms:created "2008-04-09 17:19:07.926382" .

Selecting Rules from a Project

Before selecting the Rules of a Project, it is useful to know what those rules are concerned with. A rule is an attribute of a collection. For example, if the collection you are exploring is "GenomicCharacterization", then there might be a rule that formalizes the relationship between a "GenomicCharacterization" and a "RawData" file.

To retrieve all rules in the project, you can perform the following query:

?Rule rdfs:subClassOf :P126 .
?Rule rdf:type s3db:s3dbRule .
 
This does not reveal much about the Rules themselves, so let's ask for all the Rules concerning a given Collection:
?Rule rdfs:subClassOf :P126 .
?Rule rdf:subject :C186 .
?Rule rdf:object ?Object .

Try it!


Rule                                    Object
http://ibl.mdanderson.org/TCGA/R191     RawData
http://ibl.mdanderson.org/TCGA/R188     Platform
http://ibl.mdanderson.org/TCGA/R188     http://ibl.mdanderson.org/TCGA/C127
http://ibl.mdanderson.org/TCGA/R53730   CancerGenomicCharacterizationAndSequencingCenters
http://ibl.mdanderson.org/TCGA/R53730   http://ibl.mdanderson.org/TCGA/C1275311
http://ibl.mdanderson.org/TCGA/R75362   ArchiveSerialIndex
http://ibl.mdanderson.org/TCGA/R3979    Sample
http://ibl.mdanderson.org/TCGA/R3979    http://ibl.mdanderson.org/TCGA/C192

Note that, if ?Object is a URI, then it corresponds to a Collection. You will only be able to query literals in the Rules of that Collection (see examples below).
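
When processing such results in a script, the URI-versus-literal distinction can be applied mechanically. The sketch below uses a few Rule/Object pairs taken from the result listing above (with the Rule IDs shortened) and assumes that any value starting with http:// or https:// is a Collection URI; the helper name is_collection_uri is made up for this example.

```python
def is_collection_uri(value):
    """Treat http(s) values as Collection URIs and anything else as a literal."""
    return value.startswith(("http://", "https://"))


# A subset of the Rule/Object pairs from the listing, Rule IDs shortened.
rows = [
    ("R191", "RawData"),
    ("R188", "Platform"),
    ("R188", "http://ibl.mdanderson.org/TCGA/C127"),
    ("R3979", "Sample"),
    ("R3979", "http://ibl.mdanderson.org/TCGA/C192"),
]

# Rules whose object points at another Collection.
links_to_collection = sorted({rule for rule, obj in rows if is_collection_uri(obj)})
# Rules that only ever show a literal object.
literal_only = sorted({rule for rule, _ in rows} - set(links_to_collection))

print(links_to_collection)
print(literal_only)
```

Rules in the first group lead to Items of another Collection, so literal values have to be reached through that Collection's own Rules.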

Select All Items with a given attribute

Once the attributes are known, queries on the data can be performed. For example, to request all the GenomicCharacterization items whose Platform was manufactured by "Affymetrix":

?AffyItem :R149 ?platform FILTER regex(?platform, "^Affymetrix") .
?GenomicChar :R188 ?AffyItem .

And now let's restrict those items to the ones obtained at the Broad Institute:

?AffyItem :R149 ?platform FILTER regex(?platform, "^Affymetrix") .
?GenomicChar :R188 ?AffyItem .
?GenomicChar :R53730 ?Institution .
?Institution :R1275327 ?BroadInst FILTER regex(?BroadInst, "broad", "i") .

Try it! 

Note: the "i" means case insensitive
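
To get a feel for what the two filters accept, their behaviour can be approximated with Python's re module. This is only an analogy: SPARQL's regex() follows the XPath regular expression syntax, which differs from Python's in some details, although simple patterns like these behave the same way. The function name and the sample strings are invented for illustration.

```python
import re


def sparql_like_regex(text, pattern, flags=""):
    """Approximate SPARQL regex(): match anywhere in text; "i" ignores case."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None


# "^" anchors the match at the start of the value.
starts_with_affy = sparql_like_regex("Affymetrix", "^Affymetrix")
# With "i", the lowercase pattern still matches the capitalised name.
broad_ignoring_case = sparql_like_regex("Broad Institute", "broad", "i")
# Without "i", the same pattern does not match.
broad_case_sensitive = sparql_like_regex("Broad Institute", "broad")

print(starts_with_affy, broad_ignoring_case, broad_case_sensitive)
```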

See also for an example of a more complex query involving PREFIX and SELECT clauses.

Lena Deus,
Nov 13, 2009, 7:21 AM