Provisionally collected in this Etherpad
Breakout Group: Research Objects of the Future
What is a 'research object':
What is a 'nanopublication':
This discussion focused on nanopublications as computable representations of scientific assertions with attached provenance information. We consider 'research objects' (following the definition in the MyExperiment link above) to be containers for several data artifacts, which may or may not have semantic structure (PDF files, Excel spreadsheets, text files, etc.), together with any nanopublications we add to that data as annotations. Annotating the non-semantic elements of a research object with nanopublications therefore gives us an incremental way to add our content to existing data.
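The container-plus-annotations idea above can be sketched as a small in-memory model. This is a minimal sketch, assuming plain Python dataclasses; the class names (`Artifact`, `Annotation`, `ResearchObject`), the `selector` field and the `file:` URIs are hypothetical and are not taken from the MyExperiment definition or from any existing research-object library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artifact:
    """A data artifact held by a research object (PDF, spreadsheet, text file, ...)."""
    uri: str
    media_type: str


@dataclass
class Annotation:
    """Links one nanopublication to an artifact, optionally to a region inside it."""
    nanopub_id: str
    target_uri: str
    selector: Optional[str] = None  # hypothetical pointer into the artifact, e.g. a row or page


@dataclass
class ResearchObject:
    """Container for data artifacts plus the nanopublications annotating them."""
    artifacts: List[Artifact] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def add_artifact(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)

    def annotate(self, target_uri: str, nanopub_id: str,
                 selector: Optional[str] = None) -> Annotation:
        # Annotations are stored next to the artifacts; the artifacts themselves
        # are never modified, so semantic content can be layered on incrementally.
        if not any(a.uri == target_uri for a in self.artifacts):
            raise KeyError(f"no artifact {target_uri!r} in this research object")
        annotation = Annotation(nanopub_id, target_uri, selector)
        self.annotations.append(annotation)
        return annotation

    def annotations_for(self, target_uri: str) -> List[Annotation]:
        return [a for a in self.annotations if a.target_uri == target_uri]


# Usage: attach a (hypothetical) nanopublication to one row of a CSV file.
ro = ResearchObject()
ro.add_artifact(Artifact("file:results.csv", "text/csv"))
ro.annotate("file:results.csv", "np:0001", selector="row 2")
```

Keeping annotations outside the artifacts mirrors the point above: a PDF or spreadsheet stays untouched while nanopublications accumulate around it.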
Preliminary requirement specifications
As the basic building block of our representation, we adopt a nanopublication as a 'molecule of scientific knowledge'. It should possess the following capabilities:
It must have provenance.
It must be possible to cite it / link to it.
- a possible mechanism for this could be assigning a unique id to every single independent assertion
- this citation provides a mechanism for provenance
- well-defined versioning is an aspect of this property
It can be used as an annotation of:
- text in a paper (HTML, XML, PDF, etc.),
- text in a wiki page or any other variant (knowlet, etc.),
- data elements in a file,
- tuples in a database,
- any other resource included in a research object.
It must be discoverable so that search engines can find it.
It must be persistent under a well-defined contract.
- thus it may not be necessary for nanopublications to live forever, but we do need to know how long they will live.
It must be possible to assign credit for separate individuals' contributions to the creation of the nanopublication.
There will be different types of nanopublication.
- (a first step of this work is to figure out what these different types are).
- Therefore, claims, assertions, descriptions, interpretations, etc. are all simply different types of nanopublication.
- based on degrees of specificity (either generalizing across domains or specializing within a domain).
- based on its purpose within the logical structure of the reasoning framework
A nanopublication may contain and be contained by other nanopublications.
- this may be simply managed by a 'part-of' relation
A nanopublication must be 'computable'.
- this may depend on its purpose
Nanopublications expressed in different frameworks must be interoperable.
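Several of the requirements above (provenance, a unique citable id per assertion, versioning, credit, a persistence contract, types, and a 'part-of' relation) can be gathered into one record. This is a minimal sketch, assuming a Python dataclass stands in for whatever serialization is eventually chosen; all field names are hypothetical, and the content-hash identifier is one illustrative way to give each independent assertion its own id, not a scheme the group agreed on.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Nanopublication:
    assertion: Tuple[Triple, ...]            # the claim, as subject-predicate-object triples
    provenance: Tuple[Tuple[str, str], ...]  # (key, value) pairs, e.g. ("source", "...")
    contributors: Tuple[str, ...]            # individuals credited for creating it
    np_type: str                             # e.g. "cardinal assertion", "citation"
    version: int = 1
    expires: Optional[str] = None            # persistence contract; None = no stated end date
    part_of: Tuple[str, ...] = ()            # ids of nanopublications that contain this one

    @property
    def id(self) -> str:
        # Hypothetical identifier scheme: a content hash of the assertion plus its
        # version, so each independent assertion and each revision is citable.
        payload = repr((sorted(self.assertion), self.version)).encode("utf-8")
        return "np:" + hashlib.sha256(payload).hexdigest()[:16]


def revise(np_: Nanopublication, assertion: Tuple[Triple, ...]) -> Nanopublication:
    """Return a new version; the old version stays citable under its own id."""
    return replace(np_, assertion=assertion, version=np_.version + 1)


def problems(np_: Nanopublication) -> List[str]:
    """Report violations of the hard requirements: content, provenance, credit."""
    issues = []
    if not np_.assertion:
        issues.append("empty assertion")
    if not np_.provenance:
        issues.append("missing provenance")
    if not np_.contributors:
        issues.append("no contributors to credit")
    return issues
```

Making the record immutable and deriving new versions with `revise` is one way to get "well-defined versioning": older versions are never overwritten, so citations to them keep resolving.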
At present, the initial technical framework for developing this approach includes:
- The KEfED modeling approach
- The annotation ontology
- CITO and other ontologies from David Shotton's group.
- Annotation methods
- The existing definitions of nanopublications defined by Barend Mons / Paul Groth.
- SWAN / SIOC's representation of argumentation elements and their extensions with OBI elements
Other possible elements of interest
- Utopia Documents
- Annotation framework provided as part of the AO.
- OBO-Foundry ontologies
- Chemistry XML
- NIF terminologies / Systems
- Bio2RDF open linked data
- Open Provenance Model recommendations
Tim Clark is the task coordinator for Scientific Discourse at the W3C Semantic Web for Health Care and Life Sciences Interest Group (
Barend Mons' work with the Concept Web Alliance is a large-scale effort involving many people. Can we work with them to move this forward as well?
Challenge Problems / Domains
Maryann Martone's work on SMA
Phil Bourne's Drugome work
Anita de Waard's life-science papers
It was also suggested that we should tackle some non-biological domains: possibly Physics or Computer Science.
General Methodological Approach
We already have a general framework for this work: existing scientific computational elements and their roles within a scientific argument. This gives us a preliminary set of nanopublication types and a way of scoping different participants' interests. We should use this framework to place people's interests. For example, Yolanda Gil is mainly interested in describing computational workflows, so she would be interested in nanopublications describing the computational methodology of a paper, whereas Barend Mons is more interested in high-level interpretations of papers.
- universals, broad-based statements that are widely accepted as 'true'
- cardinal assertion nanopublications (e.g., malaria is transmitted by mosquitoes).
- widely accepted statements that are qualified for specific contexts / models
- based on lots of supporting evidence
- aggregated interpretative assertion nanopublications (e.g. 'CA1 projects massively to the entorhinal cortex in rats')
- describe the high-level findings of a single study, the main punchlines of a paper.
- based on supporting evidence from a single study
- interpretive assertion nanopublications (e.g. 'a DNAse hypersensitive site was identified in the vicinity of exon 1')
'citences' (i.e. citation sentences, after Marti Hearst)
- Sentences in a document that cite a finding from another document (usually Introductions, Related Works or Discussion sections).
- citation nanopublications (e.g. 'Recently, CCr3 has been shown to be upregulated on neutrophils and monocytoid U937 cells by interferons in vitro and to be expressed by endothelial cells, epithelial cells and mast cells [11-16]').
- what is the underlying reasoning model of the work?
- theory nanopublication
- the scientific protocol used. What was done?
- simple statement of narrative based on material entities, material processing, assays, information entities, and data transformation. Could include experimental design elements.
- experimental design nanopublication (also equivalent to a KEfED model).
- The main data findings of a study
- must be linked to experimental design
- experimental observations nanopublication
- all the data from a given experiment
- could be equivalent to tuples from an experimental database or a LIMS system
- data-set nanopublication
conference article / poster
- a lightweight publication of preliminary findings
- use the same nanopublications as above but less concrete, lower values for reliability
- informal ongoing work within a given project
- use the same nanopublications as above but much less concrete and probably not published
- throwaway assertions, unqualified ideas
- nanopublications you might keep as part of a hypothetical set for brainstorming.
- hypotheses and planned knowledge that has not yet been found to be true.
- Organize efforts around W3C group
- Provide specific guidelines to collaborators to enable them to conform to recommendations
- Provide executive summary to Phil.
- Add guidelines to manifesto.
- Get buy-in from the community for this conceptual framework
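The nanopublication typology listed above can be encoded so that tools share one vocabulary. This is a minimal sketch, assuming Python enums; the type names come from the list, while the `Maturity` levels, `REQUIRED_LINKS` and the helper functions are hypothetical. It treats type and maturity as independent axes, since the notes say posters and ongoing work "use the same nanopublications as above but less concrete".

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Dict, FrozenSet, Iterable, List


class NanopubType(Enum):
    CARDINAL_ASSERTION = "cardinal assertion"
    AGGREGATED_INTERPRETATIVE_ASSERTION = "aggregated interpretative assertion"
    INTERPRETIVE_ASSERTION = "interpretive assertion"
    CITATION = "citation"
    THEORY = "theory"
    EXPERIMENTAL_DESIGN = "experimental design"
    EXPERIMENTAL_OBSERVATIONS = "experimental observations"
    DATA_SET = "data-set"


class Maturity(IntEnum):
    # Hypothetical ordering of the less formal rows, from least to most concrete.
    BRAINSTORM = 0    # throwaway assertions, hypotheses
    ONGOING = 1       # informal work within a project, probably unpublished
    PRELIMINARY = 2   # conference article / poster
    PUBLISHED = 3


# From the list: experimental observations must be linked to an experimental design.
REQUIRED_LINKS: Dict[NanopubType, FrozenSet[NanopubType]] = {
    NanopubType.EXPERIMENTAL_OBSERVATIONS: frozenset({NanopubType.EXPERIMENTAL_DESIGN}),
}


@dataclass(frozen=True)
class Classified:
    np_type: NanopubType
    maturity: Maturity


def missing_links(np_type: NanopubType, linked: Iterable[NanopubType]) -> FrozenSet[NanopubType]:
    """Types this nanopublication should link to but does not."""
    return REQUIRED_LINKS.get(np_type, frozenset()) - frozenset(linked)


def at_least(items: Iterable[Classified], minimum: Maturity) -> List[Classified]:
    """Keep only items at or above a maturity level, e.g. for a public index."""
    return [c for c in items if c.maturity >= minimum]
```

Keeping maturity separate from type means a brainstormed hypothesis and a published finding can share a type and differ only in how concrete they are.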
Breakout Group: Writing
The writing group decided that we should aim to develop a common tool (or set of tools), rather than work on several independent projects or just a set of standards, protocols, etc. The requirements for the tool we want to develop are:
- help with a major bottleneck in creating scholarly content
- be as simple as possible (low-hanging fruit, try to solve 80% of the use cases, go for the long tail in small labs, etc.)
- have a working prototype within 6 months
- be useful for all scholarly disciplines
Although we identified the process of manuscript submission to a journal as one major bottleneck, the group agreed that a tool for better creation, annotation, storage and reuse of research objects would be the most valuable contribution we could make. (Raw notes from Thursday show our original discussion items and interests.)
We discussed several tools that do something like this, including
- FASCINATOR (Sefton): http://ptsefton.com/2009/06/12/desktop-repositories-smashing-up-powerpoint.htm
- NEPOMUK: http://nepomuk.semanticdesktop.org
- ADMIRAL (Shotton): http://imageweb.zoo.ox.ac.uk/pub/2009/admiral/ADMIRAL_Project_Case_for_Support.pdf
More work is required to understand what solutions are available, including tools not written specifically for researchers that can be adapted for this purpose.
The group plans to start the discussion about the overall design, data formats and protocols on Friday. We discussed two requirements for the new tool: it should facilitate the writing process based on these research objects, and it should work with institutional and data repositories (e.g. http://datadryad.org/) and with journal submission systems.
The group agreed that the first group of users for the tool should be undergraduate and graduate students, and that members of the writing group would select a small group of their students as "beta testers" of the new system.
Please add your name to subprojects you want to be involved in.
- Survey of existing tools (is there already an 80% solution?)
Martin Fenner and Alex Garnett, see also here
- Packaging Data formats
to be formalized by David Shotton
- Automated tools: annotation, classification
Martin Haye and Karmel Allison
- Mentoring (collaring students, helping them up, etc.)
Phil Bourne, Jonathan Eisen, Pat Brown - see also here
- Usability (easy - use tools people already have?)
- Plugins - identify one or a few initial ones; framework?
- Clients that create research objects
Please add your name
- Container tools that reuse research objects (-> publication)
- Metadata/annotations creation UI
Jeff Beck - to bring metadata fields from another similar project
- Coordination with Research Objects of the Future Group
Please add your name
Reading/Writing System Principles:
- Let's not write any code that runs on the client (except in a browser).
- Develop the spec and a reference implementation at the same time.
- The earlier one starts expressing the data as triples, the easier it should be later to publish that package (or paper, or whatever we call it by then)
- Link to diagram: http://twitpic.com/3s3qqc and full-resolution: http://dl.dropbox.com/u/6923768/BTPDFdiagram.JPG
- Start work in subprojects
- Phone conference in the coming weeks to plan next steps, coordinated by Peter Murray-Rust
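The principle of expressing data as triples early can be illustrated by turning one row of a spreadsheet or database table into RDF triples serialized as N-Triples. This is a minimal sketch, assuming one subject per row and one predicate per column; the `http://example.org/` base and the `row/` and `col/` URI patterns are hypothetical placeholders, and a real package would map columns to terms from shared ontologies instead.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal (backslash first, then quotes and newlines)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_triples(row_id: str, row: Dict[str, str],
                   base: str = "http://example.org/") -> List[Triple]:
    """Turn one table row into triples about a row resource (hypothetical URI scheme)."""
    subject = f"<{base}row/{quote(row_id)}>"
    triples = []
    for column, value in row.items():
        if value == "":
            continue  # skip empty cells instead of asserting empty strings
        predicate = f"<{base}col/{quote(column)}>"  # percent-encode spaces etc. in column names
        triples.append((subject, predicate, literal(value)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples one per line, each terminated by ' .' as N-Triples requires."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(row_to_triples("1", {"gene": "CCR3", "cell type": "neutrophil", "note": ""})))
```

Because each row becomes self-describing statements as soon as it is recorded, later packaging for publication is mostly a matter of collecting triples rather than converting files.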