Notes from the breakout sessions

Provisionally collected in this Etherpad

Breakout Group: Research Objects of the Future 


What is a 'research object'?

What is a 'nanopublication'?

The discussion focused on nanopublications as a computable representation of scientific assertions with added provenance information. Attempting to follow the definition provided in the myExperiment link above, we consider 'research objects' to be holders of several data artifacts. These artifacts may or may not have semantic structure. They include PDF files, Excel spreadsheets and text files, as well as any nanopublications that we add to the data as annotations. Annotating the non-semantic elements of a research object with nanopublications gives us an incremental way to add new content to existing data.
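The container-plus-annotation idea above can be sketched as a small data model. This is a minimal Python sketch under our own assumptions, not an agreed design. The class names (`ResearchObject`, `Artifact`, `Annotation`, `Nanopublication`), the `selector` string and the `ex:` identifiers are all hypothetical. A real implementation would more likely be expressed in RDF using the Annotation Ontology than as Python objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A triple is (subject, predicate, object); plain strings stand in for IRIs.
Triple = tuple[str, str, str]


@dataclass
class Nanopublication:
    """Minimal model: an assertion plus provenance about that assertion."""
    assertion: list[Triple]
    provenance: list[Triple]


@dataclass
class Artifact:
    """Any resource held by a research object; it need not be semantic."""
    name: str        # e.g. "paper.pdf" or "counts.xlsx"
    media_type: str  # e.g. "application/pdf"


@dataclass
class Annotation:
    """Attaches a nanopublication to an artifact, optionally to a fragment of it."""
    target: str           # name of the annotated artifact
    selector: str | None  # hypothetical fragment locator, e.g. "page=3"
    body: Nanopublication


@dataclass
class ResearchObject:
    artifacts: dict[str, Artifact] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)

    def add_artifact(self, artifact: Artifact) -> None:
        self.artifacts[artifact.name] = artifact

    def annotate(self, target: str, nanopub: Nanopublication,
                 selector: str | None = None) -> None:
        # Annotations are stored alongside the artifact, never inside it,
        # so semantic content can be layered onto existing files incrementally.
        if target not in self.artifacts:
            raise KeyError(f"no artifact named {target!r}")
        self.annotations.append(Annotation(target, selector, nanopub))

    def annotations_for(self, target: str) -> list[Annotation]:
        return [a for a in self.annotations if a.target == target]


# Usage: annotate page 3 of an existing PDF with one (hypothetical) nanopublication.
ro = ResearchObject()
ro.add_artifact(Artifact("paper.pdf", "application/pdf"))
claim = Nanopublication(
    assertion=[("ex:CA1", "ex:projectsTo", "ex:EntorhinalCortex")],
    provenance=[("ex:claim1", "ex:extractedFrom", "ex:paper.pdf")],
)
ro.annotate("paper.pdf", claim, selector="page=3")
```

The design keeps annotations outside the artifacts. This means a PDF or spreadsheet is never rewritten when semantic content is added.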

Preliminary requirement specifications

As the basic building block of our representation, we adopt a nanopublication as a 'molecule of scientific knowledge'. It should possess the following capabilities:
  1. It must have provenance 
    • well-defined versioning is an aspect of this property
  2. It must be possible to cite it / link to it.
    • one possible mechanism is to assign a unique ID to every independent assertion
    • this citation provides a mechanism for provenance 
  3. It can be used as an annotation of:
    1. text in a paper (HTML, XML, PDF, etc.),
    2. text in a wiki page or any other variant (knowlet, etc), 
    3. data elements in a file, 
    4. tuples in a database 
    5. any other resource included in a research object.  
  4. It must be discoverable so that search engines can find it.
  5. It must be persistent under a well-defined contract
    • nanopublications need not live forever, but we do need to know how long they will live.
  6. It must be possible to assign credit for separate individuals' contributions to the creation of the nanopublication.
  7. There will be different types of nanopublication.
    • (a first step of this work is to figure out what these different types are).
    • claims, assertions, descriptions, interpretations, etc. are thus all simply different types of nanopublication.
    • types may differ in degree of specificity (either generalizing across domains or specializing within a domain).
    • types may differ in their purpose within the logical structure of the reasoning framework.
  8. A nanopublication may contain and be contained by other nanopublications 
    • this may depend on its purpose
    • this may be simply managed by a 'part-of' relation
  9. A nanopublication must be 'computable' 
  10. Nanopublications expressed in different frameworks must be interoperable.
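One way to check that the requirements above fit together is to write them down as a record type. The Python sketch below is hypothetical throughout. The field names are assumptions, and so is the `np:<id>/v<version>` citation form. The `NanopubType` values are placeholders taken from the examples in requirement 7, since the real set of types is still to be worked out. Requirements 3, 4 and 10 (annotation, discoverability and interoperability) are not modeled.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field, replace
from datetime import date
from enum import Enum

Triple = tuple[str, str, str]


class NanopubType(Enum):
    # Placeholder values from requirement 7; the real set of types is still open.
    CLAIM = "claim"
    ASSERTION = "assertion"
    DESCRIPTION = "description"
    INTERPRETATION = "interpretation"


@dataclass(frozen=True)
class Nanopublication:
    assertion: tuple[Triple, ...]    # req. 9: computable content as triples
    provenance: tuple[Triple, ...]   # req. 1: where the assertion came from
    kind: NanopubType                # req. 7: which type of nanopublication
    contributors: tuple[tuple[str, str], ...] = ()  # req. 6: (person, role) pairs
    retain_until: date | None = None  # req. 5: persistence contract; None = no end date
    parts: tuple[str, ...] = ()       # req. 8: cite ids of contained nanopublications
    base_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # req. 2: stable id
    version: int = 1                  # req. 1: well-defined versioning

    @property
    def cite_id(self) -> str:
        # Hypothetical citable identifier: stable base id plus version number.
        return f"np:{self.base_id}/v{self.version}"

    def revise(self, **changes) -> Nanopublication:
        # A revision keeps base_id and bumps version; the old version is untouched.
        return replace(self, version=self.version + 1, **changes)

    def is_live(self, on: date) -> bool:
        # True if the persistence contract still covers the given date.
        return self.retain_until is None or on <= self.retain_until


# Usage: create a nanopublication, then revise it to remove its end date.
np1 = Nanopublication(
    assertion=(("ex:CCR3", "ex:expressedBy", "ex:MastCell"),),
    provenance=(("ex:np1", "ex:citedFrom", "ex:someArticle"),),
    kind=NanopubType.ASSERTION,
    contributors=(("person:curator1", "curation"),),
    retain_until=date(2030, 1, 1),
)
np2 = np1.revise(retain_until=None)
```

Making the record immutable and revising it into a new version is one way to let old citations stay valid. Other versioning schemes would also satisfy requirement 1.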
Suggested Architecture

At present, the initial technical framework for developing this approach includes: 
  1. The existing definition of nanopublications by Barend Mons and Paul Groth.
  2. SWAN/SIOC's representation of argumentation elements and its extensions with OBI elements
  3. The KEfED modeling approach
  4. The Annotation Ontology (AO)
  5. CITO and other ontologies from David Shotton's group.
  6. Annotation methods 
    • Utopia Documents
    • Annotation framework provided as part of the AO.
  7. Other possible elements of interest
    • OBO-Foundry ontologies
    • Chemistry XML
    • NIF terminologies / Systems
    • ORCID
    • Bio2RDF open linked data
    • Open Provenance Model recommendations
    • ???
Community Strategy for Development

Tim Clark is the task coordinator for Scientific Discourse in the W3C Semantic Web for Health Care and Life Sciences Interest Group. He has generously invited us to use this framework to manage the effort going forward.

Barend Mons' work with the Concept Web Alliance is a large-scale effort involving many people. Can we work with them to move this forward as well?

Challenge Problems / Domains
  • Maryann Martone's work on SMA
  • Phil Bourne's Drugome work
  • Anita de Waard's life-science papers 
    It was also suggested that we should tackle some non-biological domains: possibly Physics or Computer Science.

General Methodological Approach

We already have a general framework for this work: existing scientific computational elements and their roles within a scientific argument. This gives us a preliminary set of types for nanopublications and a way of scoping different participants' interests. We can use this framework to place people. For example, Yolanda Gil is primarily interested in describing computational workflows, so she would be interested in nanopublications describing the computational methodology of a paper. Barend Mons, by contrast, is more interested in high-level interpretations of papers.
  1. 'textbook' 
    • universals, broad-based statements that are widely accepted as 'true'
    • cardinal assertion nanopublications (e.g., 'malaria is transmitted by mosquitoes').
  2. 'review paper'
    • widely accepted statements that are qualified for specific contexts / models
    • based on lots of supporting evidence
    • aggregated interpretive assertion nanopublications (e.g. 'CA1 projects massively to the entorhinal cortex in rats')
  3. 'abstract' 
    • describe the high-level findings of a single study, i.e. the main punchlines of a paper.
    • based on supporting evidence from a single study
    • interpretive assertion nanopublications (e.g. 'a DNase hypersensitive site was identified in the vicinity of exon 1')
  4. 'citences' (i.e. citation sentences, after Marti Hearst)
    • Sentences in a document that cite a finding from another document (usually Introductions, Related Works or Discussion sections).
    • citation nanopublications (e.g. 'Recently, CCR3 has been shown to be upregulated on neutrophils and monocytoid U937 cells by interferons in vitro and to be expressed by endothelial cells, epithelial cells and mast cells [11-16]').
  5. 'theory' 
    • what is the underlying reasoning model of the work?
    • theory nanopublication
  6. methods
    • the scientific protocol used. What was done?
    • simple statement of narrative based on material entities, material processing, assays, information entities, and data transformation. Could include experimental design elements.
    • experimental design nanopublication (also equivalent to a KEfED model).
  7. results
    • The main data findings of a study
    • must be linked to experimental design
    • experimental observations nanopublication
  8. 'data paper'
    • all the data from a given experiment
    • could be equivalent to tuples from an experimental database or a LIMS system
    • data-set nanopublication
  9. conference article / poster 
    • a lightweight publication of preliminary findings
    • use the same nanopublications as above but less concrete, lower values for reliability
  10. blog 
    • informal ongoing work within a given project
    • use the same nanopublications as above but much less concrete and probably not published
  11. tweet
    • throwaway assertions, unqualified ideas
    • nanopublications you might keep as part of a hypothetical set for brainstorming.
  12. proposal
    • hypotheses and planned knowledge that has not yet been found to be true.
Next steps 
  • Organize efforts around the W3C group
  • Provide specific guidelines to collaborators to enable them to conform to recommendations
  • Provide executive summary to Phil.
  • Add guidelines to the manifesto.
  • Get buy-in from the community for this conceptual framework
Breakout Group: Writing

The writing group decided that we should aim to develop a common tool (or set of tools), rather than work on several independent projects or produce only a set of standards, protocols, etc. The requirements for the tool we want to develop are:
  • help with a major bottleneck in creating scholarly content
  • be as simple as possible (low hanging fruit, try to solve 80% of the use cases, go for the long tail in small labs, etc.)
  • have a working prototype within 6 months
  • be useful for all scholarly disciplines 
Although we identified the process of manuscript submission to a journal as one major bottleneck, the group agreed that a tool for better creation, annotation, storage and reuse of research objects would be the most valuable contribution we could make. (Raw notes from Thursday show our original discussion items and interests.)

We discussed several tools that do something like this, including:
  • FASCINATOR (Sefton)
  • ADMIRAL (Shotton)
More work is required to understand what solutions are available, including tools not written specifically for researchers that can be adapted for this purpose.

The group plans to start discussing the overall design, data formats and protocols on Friday. We discussed two requirements for the new tool. First, it should facilitate the writing process based on these research objects. Second, it should work with institutional and data repositories and with journal submission systems.

The group agreed that the first group of users for the tool should be undergraduate and graduate students, and that members of the writing group would select a small group of their students as "beta testers" of the new system.

Please add your name to subprojects you want to be involved in.

  • Survey of existing tools (is there already an 80% solution?)
    Martin Fenner and Alex Garnett, see also here
  • Packaging Data formats
    to be formalized by David Shotton
  • Infrastructure 
    Martin Haye
  • Automated tools: annotation, classification
    Martin Haye and Karmel Allison
  • Mentoring (collaring students, helping them up, etc.)
    Phil Bourne, Jonathan Eisen, Pat Brown - see also here
  • Usability (easy - use tools people already have?)
    Daniel Mietchen
  • Plugins - identify one or a few initial ones; framework?
    Peter Murray-Rust
  • Clients that create research objects
    Please add your name
  • Container tools that reuse research objects (-> publication)
    Martin Fenner
  • Metadata/annotations creation UI
    Peter Sefton
    Jeff Beck - to bring metadata fields from another similar project
  • Coordination with Research Objects of the Future Group
    Please add your name

Reading/Writing system Principles:
  • Let's not write any code that runs on the client (except in a browser).
  • Develop the spec and a reference implementation at the same time.
  • The earlier one starts converting the data into triples, the easier it should be later to publish that package (or paper, or whatever we call it then).
  • Link to diagram: and full-resolution:
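Suppose the triples principle above refers to converting data into RDF triples. A minimal Python sketch of doing that early, straight from a spreadsheet exported as CSV, might look like the following. It writes N-Triples lines. Several parts are hypothetical: the `http://example.org/` namespace, the `row/` and `prop/` IRI patterns, and the column names. In practice, columns would be mapped to terms from shared ontologies rather than minted from headers.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for minted IRIs


def escape_literal(value: str) -> str:
    # N-Triples string escaping for backslashes, quotes and line breaks.
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Turn each CSV row into triples: row IRI, column IRI, cell literal."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}row/{quote(row[id_column])}>"
        for column, value in row.items():
            # Skip the identifier column itself and empty cells.
            if column == id_column or value == "":
                continue
            predicate = f"<{BASE}prop/{quote(column)}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Usage with a tiny hypothetical table; the empty cell produces no triple.
data = "sample,region,cell_count\nS1,CA1,120\nS2,CA3,\n"
for line in rows_to_ntriples(data, "sample"):
    print(line)
```

Because each cell becomes an independent triple, later annotation or packaging steps can refer to individual data points rather than to the whole file.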
Next steps
  • Start work in subprojects
  • Phone conference in the coming weeks to plan next steps, coordinated by Peter Murray-Rust