Related Efforts

Links to other initiatives, efforts, organizations and opportunities of relevance to Force11

Compiled by David Shotton, Anita de Waard and participants of the Beyond the PDF and Force11 workshops.

This listing is not comprehensive, and is at present heavily biased towards both European initiatives and the Life Sciences.
We therefore welcome additional useful information for this compilation, which should be submitted by adding a Comment at the foot of the page.
This compilation of related efforts is a living document that will expand over time.

Items are grouped alphabetically under headings.

Academic reward system

  • AltMetrics Manifesto.  Describes the role for new measures of scientific impact based on web activity.
  • Philip E. Bourne (2011). Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia. PLoS Comput Biol 7(1): e1002001. doi:10.1371/journal.pcbi.1002001

Activities, initiatives and workshops

Author Identification:


Authoring Tools:

Biological data repositories and databases  (VERY incomplete list)

Blogs with interesting things to say of relevance to Force11

Computational Linguistics/Text Mining Efforts

  • AcroMine (NaCTeM, University of Manchester).  Automatically determines the full forms of acronyms.
  • Argumentative Zoning, work by Simone Teufel and others:
  • Automatic recognition of sentence types in biomedical abstracts. Tsujii lab, University of Tokyo. Title, conclusion, method, objective, result. See MEDIE (advanced search) for a demo.
  • GENIA (Tsujii Lab, University of Tokyo) and GREC (NaCTeM, University of Manchester).  Textual corpora annotated with biomedical events, which permit systems to be trained to identify and structure relevant information in biomedical documents automatically.
  • Hypothesis identification at Xerox.  The Xerox Integrated Parser is used to find key rhetorical statements in biology research papers.
  • In-Context Summaries.  The work of Stephen Wan of CSIRO, Sydney, providing in-browser summaries of referenced papers, weighted by the textual context of the in-text reference.
  • Linking of biomedical Named Entities in documents to related database entries. Such links are provided in the BioLexicon. Examples of search engines providing such links are MEDIE and UKPMC.
  • Metaknowledge annotation of biomedical events. NaCTeM, University of Manchester.
    • Annotation of interpretative information for biomedical events along 5 different dimensions: Knowledge Type (fact, analysis, observation, etc.), Certainty Level, Polarity, Manner and Source.
  • OpenCalais.  A web service provided by Thomson Reuters that creates semantic markup of submitted text.  Good for terms relating to current events, commerce and politics.  Weak for scientific terms.  Check conditions of use – Thomson Reuters retains text submitted for its own purposes!
  • REFLECT.  European Molecular Biology Laboratory, Heidelberg.  A Web service that provides semantic markup for gene and protein names in submitted HTML documents, with links to relevant bioinformatics databases.
  • U-Compare.  An integrated text mining/natural language processing system based on the UIMA Framework, allowing documents to be processed by various text-mining tools.

Data citation


  • For some really useful articles on this issue from someone who does understand typography and design see Craig Mod's site, for example this one on Books:
  • Or this on how the reading experience should work:

Funding opportunities and funders that particularly support research in the Force11 area

Government reports, public consultations and speeches relating to scholarly communication and open data

Hypothesis/claim-based representation of the rhetorical structure of a scientific paper

These projects all start from the assumption that a scientific paper is, at heart, a persuasive text that makes a number of claims backed by research data and references. The paper comprises a set of hypotheses supported by evidence in the form of included data or references to other work.

  • aTags.  DERI, 2009- now.
    aTags ("associative tags") are snippets of HTML that capture the information that is most important to you in a machine-readable, interlinked format. aTags works with any Web text and can store and connect any textual element that is highlighted in a browser. 
  • Cohere.  KMI, 2007- now.
    The Cohere project, which builds on the earlier 'ClaiMaker' project, offers a web-based interface to create claims, hypotheses, or statements, and relate these to other claims using an open set of relationships. It is usable for science, but also for structuring online debates on other topics.
  • Hypotheses in Biology.   UvA, 2009
    A methodology and set of proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance.
  • HyBrow.  Stanford, 2008.
    A prototype bioinformatics tool for designing hypotheses and evaluating them for consistency with existing knowledge
  • HypER.  2009 – now.
    HypER is an ad-hoc group of researchers who all represent scientific communications as a set of hypotheses, with relations to evidence. It includes representatives of LiquidPub, Cohere, SWAN, SALT, SPAR, aTags and abcde work. The main focus of HypER has shifted to the W3C HCLS work on Scientific Discourse structures.
  • SALT. DERI, 2008.
    SALT is a LaTeX-based authoring tool that allows authors to identify Rhetorical Structure Theory (RST-) relations between sentences in their paper. It offers the author the opportunity to define main and secondary (satellite) sentences and create relations between them.
  • SWAN.  Alzheimer's Network, Harvard IIC, 2006 – now.
    The SWAN project adds a collection of hand-curated hypotheses to a research paper, which are then related through a set of discourse relationships. They can be browsed and relations between claims, as well as support networks for a specific claim, are made and visualised. 

Metadata standards and ontologies

Mapping initiatives between ontologies:

SWAN/SIOC/CiTO alignment. 2010, HCLS SiG of W3C.
Harmonization and alignment between three ontology systems of relevance to citations and rhetorical relationships between publications:
SWAN, used for the SWAN project, SIOC, used to describe social media, and the SPAR ontologies CiTO (Citation Typing Ontology) and FaBiO (FRBR-aligned Bibliographic Ontology).

Modular formats for science publishing

These propose greater granularity for the scientific paper, the 'smallest publishable unit' being smaller than the size of a full paper.

  • abcde format.  Utrecht University, 2007.
    The abcde format is a proposal for a simple, structured format for conference papers in computer science, based on LaTeX. Each paper consists of three sections (Background, Contribution and Discussion) and three added elements: A = Annotation, given as Dublin Core annotation; E = Entities, which are RDF-formatted entities of interest, including references; and Core Sentences (which do not contribute to the acronym), sentences marked up by the author as core elements that can be extracted to form a structured abstract.
  • 'Coarse-grained rhetorical structure'.  Work done in the HCLS SiG of the W3C since 2009. This group aims to define a 'rhetorical structure' for scientific papers, for use in authoring or mark-up tools. They are working towards a definition of such a format, have an intermediary proposal of their own, and are beginning to compile an overview of existing publishers' proposals.
  • LiquidPub. EU Project, U Trento and others (2008 – 2011). A 'liquid' format for science papers is proposed that consists of a set of research objects connected by links.
  • Modular Physics Paper.  University of Amsterdam (1999). A modular form for Physics papers: by investigating a collection of papers, a more fine-grained structure for science papers and an extensive taxonomy of relationships are proposed.
  • Nanopublications.  NBIC, the Netherlands Bioinformatics Centre.
    A 'nanopublication' is essentially a general scientific assertion, written using semantic-web standard formats with additional metadata concerning provenance.
    The Concept Web Alliance proposes to model scientific research as sets of triples (CWA Nanopublications, 2010).

    The definition of the format has been published (The Anatomy of a Nanopublication).

    See also The Value of Data, motivating the use of nanopublications, in Nature Genetics.
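
As a rough illustration of the nanopublication idea described above (an assertion expressed as triples, plus provenance about that assertion), the following Python sketch may help. It is not the published nanopublication format: the class names, the example.org namespace and the predicates are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Nanopublication:
    """Hypothetical container: an assertion plus provenance about it."""
    uri: str
    assertion: List[Triple]
    provenance: List[Triple] = field(default_factory=list)

    def lines(self) -> List[str]:
        """Render every triple as a simple '<s> <p> <o> .' line."""
        return [
            f"<{t.subject}> <{t.predicate}> <{t.obj}> ."
            for t in self.assertion + self.provenance
        ]


# Placeholder namespace; every identifier below is invented for illustration.
EX = "http://example.org/"

np1 = Nanopublication(
    uri=EX + "np1",
    assertion=[Triple(EX + "geneA", EX + "isAssociatedWith", EX + "diseaseB")],
    provenance=[Triple(EX + "np1", EX + "assertedBy", EX + "researcher42")],
)

for line in np1.lines():
    print(line)
```

Keeping the provenance triples separate from the assertion triples mirrors the distinction drawn above between the scientific statement itself and the metadata describing where it came from.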

Open Citations

  • The Open Citation Corpus. University of Oxford, 2010 onward.
    A public RDF triplestore of biomedical literature citations encoded as Open Linked Data, linked using CiTO, the Citation Typing Ontology.  Encodes references to some 3.4 million unique papers, representing >20% of all PubMed Central papers published between 1950 and 2010, including all the most highly cited papers in every biomedical field.  Citation data are freely available under a CC0 waiver in a variety of formats including RDF and BibJSON.   Hopefully soon to include data citations from the Dryad data repository.
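
To give a feel for what a typed citation looks like as a triple, here is a minimal Python sketch. It assumes the CiTO namespace http://purl.org/spar/cito/ and three of its citation properties (cites, supports, extends); the paper identifiers are placeholders, the function name is hypothetical, and this is not a description of how the Open Citation Corpus itself stores or exports its data.

```python
CITO = "http://purl.org/spar/cito/"  # CiTO namespace

# A small subset of CiTO citation types; the ontology defines many more.
KNOWN_TYPES = {"cites", "supports", "extends"}


def typed_citation(citing: str, relation: str, cited: str) -> str:
    """Return one '<citing> <cito:relation> <cited> .' line (hypothetical helper)."""
    if relation not in KNOWN_TYPES:
        raise ValueError(f"unknown citation type in this sketch: {relation}")
    return f"<{citing}> <{CITO}{relation}> <{cited}> ."


# Placeholder paper identifiers, invented for illustration only.
line = typed_citation(
    "http://example.org/paperA", "supports", "http://example.org/paperB"
)
print(line)
```

The point of the typed property is that the link records why one paper cites another, not merely that it does.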

Organizations and projects:

Other  (incomplete list):

Policy documents


A key part of science is knowing the provenance of a paper, experiment, data item, etc. Provenance includes attribution, sources, experimental workflow, citations and quotes, i.e. who, what, when, where, why.

Publications and reports relevant to scholarly digital publication and data

The Scholarly Electronic Publishing Bibliography presents over 3,800 articles, books, and a limited number of other textual sources that are useful in understanding scholarly electronic publishing efforts on the Internet. It covers digital copyright, digital libraries, digital preservation, digital rights management, digital repositories, economic issues, electronic books and texts, electronic serials, license agreements, metadata, publisher issues, open access, and other related topics.

  • Geoffrey Boulton, Michael Rawlins, Patrick Vallance, Mark Walport (2011). Science as a public enterprise: the case for open data.  The Lancet, Volume 377, Issue 9778, Pages 1633 - 1635, 14 May 2011. doi:10.1016/S0140-6736(11)60647-8
  • Liz Lyon (2010). Open science in the data decade - article in Issue 20 of the Central Government edition of Public Service Review. publications.html#central-government-2010-04
  • Liz Lyon (2007). Dealing with Data: Roles, Rights, Responsibilities and Relationships - Consultancy Report.
  • O'Donnell RP, Supp SR, Cobbold SM.  (2010).  Hindrance of conservation biology by delays in the submission of manuscripts.  Conserv. Biol. 24 (2): 615-620. Epub 2010 Jan 11.
  • Open Biology.   The Royal Society has just launched Open Biology, its first fully open access journal. Open Biology is a rapid, open-access, peer-reviewed online journal publishing high quality research in cell biology, developmental and structural biology, molecular biology, biochemistry, neuroscience, immunology, microbiology and genetics.  The Editor-in-Chief, Professor David Glover (FRS) from the University of Cambridge, aims to provide a journal with a fair and speedy review system, run by active, practicing scientists with high expertise in this area, allowing good papers to be published quickly. 
  • Sommers J (2010).  The delay in sharing research data is costing lives. Nature Medicine 16 (7): 744.

Publishers active in data publication

o   Semantic tagging of and enhancements to published texts:

o   Data publishing:

o   Semantic tagging of and semantic enhancements to systematics papers: ZooKeys working examples. ZooKeys 50: 1–16, doi:10.3897/zookeys.50.538.

o   Streamlining taxonomic publication: a working example with Scratchpads and ZooKeys. ZooKeys 50: 17–28, doi:10.3897/zookeys.50.539.

o   Taxonomy shifts up a gear: New publishing tools to accelerate biodiversity research. ZooKeys 50: i–iv, doi:10.3897/zookeys.50.543.

Semantic publishing initiatives and other enriched forms of publication

  • Adventures in Semantic Publishing.  Oxford University, 2009.
    A paper reporting a manually marked-up version of an epidemiological research paper in PLoS Neglected Tropical Diseases, with data enhancements, better browsing, reference linking and citation typing.
  • Article of the Future.  Cell, 2009 onwards.
    Tabbed and hyperlinked presentation of the article; Graphical Abstract and Highlights on the landing page
  • Open Access journals published by Pensoft Journals come with semantic enhancements. Example: PhytoKeys.
  • Project Prospect. Royal Society of Chemistry, 2009 onwards.
    RSC editors annotate compounds, concepts and data within the articles and link these to additional electronic resources such as biological databases.
  • Semantic Biochemical Journal. 2010 onwards.
    Using Utopia, an innovative PDF reader, this allows enrichment of the PDF with interactive figures and active data.

Structured Digital Abstracts - modeling science (especially biology) as triples

Representing scientific information as sets of triples. There is a special interest in this representation within biology and life sciences. Some initiatives include:

  • FEBS Letters SDA, 2008 – now.
    The journal FEBS Letters adds curator-created triples to describe protein-protein interaction to every appropriate paper.
  • The Structured Digital Abstract, Seringhaus/Gerstein, 2008.
    This paper basically proposes to include a 'structured XML-readable summary of pertinent facts'.

Structured experimental methods and workflows

  • crowdLabs: a platform for sharing and executing computational tasks.
  • Investigation/Study/Assay (ISA).  European Bioinformatics Institute and University of Oxford, 2009 – present.  The ISA infrastructure is a general-purpose format and freely available desktop software suite targeted to curators and experimentalists that assists in management of experimental metadata, engages with minimum information checklists, ontologies and formats, particularly relating to genomics data for submission to international public repositories (e.g. ENA for genomics, PRIDE for proteomics and ArrayExpress for transcriptomics).
  • Knowledge Engineering from Experimental Design (KEfED).
    A structured way of constructing 'observational assertions' based on statistical relationships from experiments. The model is general-purpose and forms a basis for reasoning over experimental data. 
  • My Experiment.
    A platform to create and exchange experimental workflow components.
  • VisTrails: an open-source data analysis and visualization tool that supports the creation of documents whose results have deep captions that point to their provenance, and thus can be reproduced and verified. Provenance-rich results derived by VisTrails can be included in  LaTeX, Wiki, Microsoft Word and PowerPoint documents.

Web services