Share & Publish
In the context of RDM, data sharing refers to the practice of publicly sharing data from completed (parts of) research, i.e. outside your project or research team. It is different from exchanging data with collaborators while your research is active. Ghent University
John MacInnes Sharing data
2 June 2014
Why share & publish data?
Making your finalised data (or snapshots of your data) available to others has a number of benefits, including:
Increasing the transparency of your research, which supports validation and replication. Read more: data validation
Accelerating scientific discovery by enabling new (types of) research. Read more: data discovery
Enhancing the visibility and impact of your research. Read more: cite data & citation analysis
Creating new opportunities for collaboration. Read more: research collaboration
More and more publishers, funders and institutions require research data, especially data resulting from publicly funded research, to be shared where possible. Read more: funders & publishers
Reducing redundant research
Ensuring data are accessible, understandable and reusable
Source: Ghent University, University of Pittsburgh, University of Oxford & UK Data Service
Plan to share & publish data
Plan ahead to create high-quality, shareable research data
In research projects, early planning is essential to ensure that activities are considered in detail and well organised, so that the work is completed efficiently and successfully.
The same applies to planning how research data will be managed over the length of a research project and beyond. In this digital age, most research projects are data-centric, so research needs to be planned around the data. A data management plan is therefore the ideal planning tool for researchers. Source: UK Data Service
Degrees of data sharing
Questions to consider:
How will you share the data?
How will potential users find out about your data?
With whom will you share the data, and under what conditions?
Will you share data via a repository, handle requests directly or use another mechanism?
When will you make the data available?
Will you pursue getting a persistent identifier for your data?
Are any restrictions on data sharing required?
What action will you take to overcome or minimise restrictions?
For how long do you need exclusive use of the data and why?
Will a data sharing agreement (or equivalent) be required?
Who will be responsible for data management?
Sharing research data is not an all-or-nothing choice, but a spectrum. It ranges from making data fully open on one end, to keeping them fully closed on the other, with various possible forms of restricted/controlled access in-between. Ghent University
Open Research data
Data that can be 'freely used, modified and shared by anyone for any purpose' (opendefinition.org). Ghent University
Closed data
Data that are temporarily under embargo, or that cannot be shared at all. Ghent University
Restricted data
Data that are not shared in a fully open way, but made available under more restricted access and use conditions. This means that there are limits on who can access and use the data, how, and/or for what purpose.
Data repositories can offer the possibility to deposit your data under restricted/controlled access. Ghent University
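The spectrum from open to closed data can be sketched as a small access check. This is only an illustration, not how any particular repository works: the `AccessLevel` names, the `can_access` function and its `embargo_until` and `request_approved` parameters are all hypothetical.

```python
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    """Hypothetical labels for the three degrees of sharing described above."""
    OPEN = "open"
    RESTRICTED = "restricted"
    CLOSED = "closed"


def can_access(
    level: AccessLevel,
    today: date,
    embargo_until: Optional[date] = None,
    request_approved: bool = False,
) -> bool:
    """Return True if a user may access the data on the given day."""
    # An embargo temporarily closes the data, whatever the eventual access level.
    if embargo_until is not None and today < embargo_until:
        return False
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.RESTRICTED:
        # Restricted data are released only to users whose request was approved.
        return request_approved
    return False


print(can_access(AccessLevel.OPEN, date(2024, 1, 1)))
print(can_access(AccessLevel.OPEN, date(2024, 1, 1), embargo_until=date(2025, 1, 1)))
print(can_access(AccessLevel.RESTRICTED, date(2024, 1, 1), request_approved=True))
```

In practice the conditions for restricted access are set out in the repository's access procedure or user agreement rather than in a single flag.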
Restrictions on data sharing
Research data cannot always be shared (immediately) in a fully open way. Sometimes they can only be made available under more restricted conditions and/or after an embargo period, or – in some circumstances – not even at all.
Possible reasons for restricting the sharing of data are:
Personal data
Confidential data
Sensitive data
Data protected by copyright and/or database right of which you are not the owner.
Data with commercial/economic potential
'As open as possible, as closed as necessary'
Which level of sharing you should choose largely depends on what is appropriate given the nature of your data, and on how well you planned for data sharing (e.g. so that you have the right permissions/consent in place, when applicable). Ghent University
Ways of sharing data
Consideration needs to be given to the following questions:
Does the publisher require authors to archive the data supporting their results in an appropriate public archive?
Will the data be archived in the institutional repository and/or another repository?
Is the publication archive efficient and cost-effective in its use of public funds?
Does the archive support all data formats?
How will the data be managed?
Will the data be open or closed to the public?
What data citation method will be used?
In principle, there are various ways of sharing data beyond your project or research team, each with their pros and cons:
Email research data upon request to peers and colleagues
Make research data available via a personal or project blog/website
Add research data as supplementary materials to peer-reviewed journal articles - traditional or Open Access publishing
Publish in a data journal
Share data via a discipline-specific data repository/archive
Share data via an institutional research data repository
The latter option, i.e. sharing data via a data repository, is preferred, as it offers many benefits for researchers, the scientific community and society at large. It is the best option for ensuring that data are accessible in a sustainable manner.
Keeping data findable, understandable and effectively reusable requires some preparation and effort on your part (i.e. keeping files organised, providing documentation and metadata, and having the access rights and reuse permissions in place).
FAIR data principles
What is FAIR and why is it important?
The FAIR Data Principles were developed to guide researchers in the process of making data findable (data can be discovered by others), accessible (data can be made available to others), interoperable (data can be integrated with other data) and reusable (data can be reused by others).
The goal of applying the FAIR Data Principles is to enable and enhance the reuse of data (and other digital objects), by both humans and machines.
Funders, publishers and policy makers also encourage the generation of FAIR data.
FAIR should not be confused with 'fair dealing', which is a copyright concept. Under fair dealing, data can be copied for non-commercial teaching or research purposes, private study, criticism or review without infringing copyright, provided that the owner of the work is sufficiently acknowledged. UK Data Service
Findable
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
Accessible
Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A1.1 The protocol is open, free, and universally implementable
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
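As an illustration of A1, HTTP is one standardised, open protocol through which metadata can be retrieved by identifier. The sketch below only builds the request and does not send it. It assumes the DOI resolver's content negotiation feature and the `application/vnd.citationstyles.csl+json` media type; whether a given DOI returns metadata in that format depends on its registration agency. The function name `build_metadata_request` is hypothetical.

```python
from urllib.request import Request


def build_metadata_request(doi: str) -> Request:
    """Build (but do not send) an HTTP GET request for a DOI's metadata."""
    return Request(
        f"https://doi.org/{doi}",
        # Ask the resolver for machine-readable metadata instead of a landing page.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


request = build_metadata_request("10.5061/dryad.4h16331")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual retrieval, which requires network access.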
Interoperable
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
Reusable
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
FAIR-Aware helps you assess your knowledge of the FAIR Principles, and better understand how making your data(set) FAIR can increase the potential value and impact of your data.
The tool is discipline-agnostic, making it relevant to any scientific field. You can use this tool at any point during your research before depositing your data(set) in a data repository. It is also good to keep in mind that many FAIR-related decisions can already be made in the research planning phase, so you may want to use FAIR-Aware early on to help you make those decisions. Also, if you are a trainer, you can use FAIR-Aware to assess the knowledge of FAIR of your course participants.
FAIR concepts
Machine-readability or actionability
Machine-readability or actionability enables machines (e.g. scripts, software, algorithms) to read, understand and process the data and aggregate data from different sources, types and disciplines. As such, it can allow research at a much larger scope, scale and speed, often needed in contemporary science.
For instance, if the (meta)data are machine-readable, machines will be able to locate a digital object, identify the type of digital object (is it a dataset or a publication? does it contain experimental data or simulation data?) and determine whether it is usable with respect to accessibility, license, data format or other use constraints. Ghent University
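A minimal sketch of such a machine check is given below. The record fields (`type`, `access`, `license`, `format`), the accepted license and format lists, and the function name `usability_problems` are assumptions made for illustration and do not follow any particular metadata standard.

```python
from typing import Dict, List

# Hypothetical acceptance criteria for an automated workflow.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}
READABLE_FORMATS = {"text/csv", "application/json"}


def usability_problems(record: Dict[str, str]) -> List[str]:
    """List the reasons why a script could not reuse the described object."""
    problems = []
    if record.get("type") != "dataset":
        problems.append("not a dataset")
    if record.get("access") != "open":
        problems.append("access is not open")
    if record.get("license") not in OPEN_LICENSES:
        problems.append("license not in the accepted list")
    if record.get("format") not in READABLE_FORMATS:
        problems.append("format not readable by this script")
    return problems


record = {
    "type": "dataset",
    "access": "open",
    "license": "CC-BY-4.0",
    "format": "application/pdf",
}
print(usability_problems(record))  # ['format not readable by this script']
```

An empty list means the script found no obstacle to reuse. The check only works because every property it needs is recorded in a predictable, machine-readable field.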
Persistent identifiers (PIDs) and globally unique identifiers (guid, uuid)
A PID, such as a DOI, PURL or Handle, is a long-lasting reference to a digital object. PIDs avoid broken links and make it easier to locate a dataset that, for example, underlies a journal article. A PID uniquely identifies the digital object and ensures that it can always be located, even if its web address (URL) changes. A PID can be used for data citation. PIDs also exist for entities other than digital objects, e.g. an ORCID iD identifies a researcher. Ghent University
Digital Object Identifier (DOI)
The Digital Object Identifier or DOI is a commonly used identifier for research datasets. DOIs are registered through registration agencies such as DataCite and Crossref. A DOI always comprises:
A prefix: ‘10.’ + 4 or more digits: identifies the organisation that registered the DOI
A suffix: identifies the dataset.
An example for a dataset held at the Dryad repository is: https://doi.org/10.5061/dryad.4h16331. Ghent University
Clicking on a DOI link takes you to the current URL of the resource it identifies.
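The prefix/suffix structure can be sketched in a few lines of code. The pattern below follows only the simple description given here ('10.' plus four or more digits, a slash, then the suffix) and is not a complete DOI validator. The names `ParsedDoi`, `parse_doi` and `doi_url` are hypothetical.

```python
import re
from typing import NamedTuple


class ParsedDoi(NamedTuple):
    prefix: str
    suffix: str


# Prefix: "10." followed by four or more digits; suffix: everything after the first slash.
_DOI_PATTERN = re.compile(r"^(10\.\d{4,})/(\S+)$")
# Optional leading resolver address or "doi:" label to strip before parsing.
_LEADING_PART = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def parse_doi(value: str) -> ParsedDoi:
    """Split a DOI, given bare or as a doi.org link, into prefix and suffix."""
    doi = _LEADING_PART.sub("", value.strip())
    match = _DOI_PATTERN.match(doi)
    if match is None:
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return ParsedDoi(prefix=match.group(1), suffix=match.group(2))


def doi_url(parsed: ParsedDoi) -> str:
    """Rebuild the resolvable link from the two parts."""
    return f"https://doi.org/{parsed.prefix}/{parsed.suffix}"


parsed = parse_doi("https://doi.org/10.5061/dryad.4h16331")
print(parsed.prefix)  # 10.5061
print(parsed.suffix)  # dryad.4h16331
```

The same DOI written as `10.5061/dryad.4h16331` or `doi:10.5061/dryad.4h16331` yields the same two parts, which is why the bare DOI, rather than a particular URL, is the stable part to record.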
Metadata
Metadata are data about data. Data must be formatted, described and cleaned to ensure that other researchers will find the datasets useful and understandable.
Metadata are a structured and machine-readable form of documentation and are key to making data FAIR.
Metadata are managed by data repositories to enable you to search and filter the data. Moreover, online search engines can harvest (i.e. automatically collect) and index (i.e. restructure to speed up searches) metadata to enable searches across data repositories e.g. through Google or through data portals.
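The idea of indexing harvested metadata can be sketched as a small inverted index that maps each search term to the records containing it. The records, their field names and the functions `build_index` and `search` are invented for illustration. Real search engines and data portals use far richer metadata and ranking.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical harvested metadata records (placeholder content).
records = [
    {"id": "rec-1", "title": "Soil moisture measurements", "keywords": ["soil", "hydrology"]},
    {"id": "rec-2", "title": "River discharge simulation", "keywords": ["hydrology", "simulation"]},
]


def build_index(items: List[dict]) -> Dict[str, Set[str]]:
    """Restructure the metadata so that each term points to matching record ids."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for item in items:
        terms = item["title"].lower().split() + [k.lower() for k in item["keywords"]]
        for term in terms:
            index[term].add(item["id"])
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Look a term up directly instead of scanning every record."""
    return sorted(index.get(term.lower(), set()))


index = build_index(records)
print(search(index, "hydrology"))  # ['rec-1', 'rec-2']
print(search(index, "soil"))       # ['rec-1']
```

The quality of such an index depends entirely on the metadata supplied at deposit: a dataset without keywords or a meaningful title will not be found.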
Controlled vocabulary, taxonomy, ontology
There are many different ways in which you can describe your data. Terminology might be ambiguous (e.g. the word “root” has a different meaning in biology and maths). Moreover, terminology might be highly domain-specific and therefore difficult to understand.
A controlled vocabulary can help to restrict the terminology that you are using to describe your data to previously defined terms. In taxonomies and ontologies, relations and/or semantics are added to the terms to increase the structure and expressiveness of the controlled vocabulary. For instance, GeoNames can be used for geospatial semantic information, where the country name “France” will be connected to information such as the continent it is part of, the ISO abbreviation for the country, the languages used, etc.
Using controlled vocabularies will improve the discovery (e.g. because different spelling is avoided), linking, understanding and reuse (e.g. because data can be aggregated more easily) of the data. Ghent University
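How a controlled vocabulary removes spelling variation can be sketched as follows. The tiny `COUNTRY_VOCABULARY` table and the function `normalise_country` are made up for illustration. They do not reproduce the structure or content of GeoNames or any other real vocabulary service.

```python
from typing import Optional

# Hypothetical vocabulary: ISO country code -> preferred label, relation, known variants.
COUNTRY_VOCABULARY = {
    "FR": {
        "label": "France",
        "continent": "Europe",
        "variants": {"france", "fr", "république française"},
    },
    "BE": {
        "label": "Belgium",
        "continent": "Europe",
        "variants": {"belgium", "be", "belgië", "belgique"},
    },
}


def normalise_country(term: str) -> Optional[str]:
    """Map a free-text country name to its vocabulary code, or None if unknown."""
    cleaned = term.strip().lower()
    for code, entry in COUNTRY_VOCABULARY.items():
        if cleaned in entry["variants"]:
            return code
    return None


raw_values = ["France", " FRANCE ", "Belgique", "Atlantis"]
codes = [normalise_country(value) for value in raw_values]
print(codes)  # ['FR', 'FR', 'BE', None]
```

Once values are reduced to shared codes, records from different sources can be grouped or linked reliably. Unmatched values, such as the last one here, are flagged for review instead of silently creating a new spelling.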
Authentication and authorization
Authentication: the identity of the user will be verified.
Authorization: it will be verified whether the user has access to specific data, applications or files. Ghent University
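The difference between the two checks can be sketched as two separate functions. The in-memory `API_TOKENS` and `PERMISSIONS` tables, the user and dataset names, and the function names are hypothetical. A real repository would rely on an identity provider and an access policy instead of hard-coded dictionaries.

```python
import hmac

# Hypothetical stores, for illustration only.
API_TOKENS = {"alice": "token-a", "bob": "token-b"}
PERMISSIONS = {
    "alice": {"dataset-open", "dataset-restricted"},
    "bob": {"dataset-open"},
}


def authenticate(user: str, token: str) -> bool:
    """Authentication: is this user who they claim to be?"""
    expected = API_TOKENS.get(user)
    # compare_digest avoids leaking information through comparison timing.
    return expected is not None and hmac.compare_digest(expected, token)


def authorize(user: str, dataset: str) -> bool:
    """Authorization: may this (already identified) user access this dataset?"""
    return dataset in PERMISSIONS.get(user, set())


def can_download(user: str, token: str, dataset: str) -> bool:
    """Access requires passing both checks, in this order."""
    return authenticate(user, token) and authorize(user, dataset)


print(can_download("alice", "token-a", "dataset-restricted"))  # True
print(can_download("bob", "token-b", "dataset-restricted"))    # False
print(can_download("bob", "wrong-token", "dataset-open"))      # False
```

The second call fails on authorization (a known user without permission) and the third on authentication (the identity cannot be verified), which shows why the two steps are kept distinct.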
Licensing data
Many kinds of data created as part of a research project are subject to the same rights as literary or artistic work. Such items acquire rights like copyright or more general Intellectual Property rights when they are created. This gives the rights owner control over the exploitation of their work, such as the right to copy and adapt the work, the right to rent or lend it, the right to communicate it to the public and the right to license and distribute. These rights need to be taken into account when creating, using and sharing data. UK Data Service
What is copyright, who owns it and how long does it last? Copyright is an intellectual property right assigned automatically to the creator. It prevents unauthorised copying and publishing of an original work. Copyright applies to research data and plays a role when creating, sharing and reusing data. UK Data Service
Most research outputs, such as spreadsheets, publications, reports and computer programs, fall under literary work and are therefore protected by copyright. Facts, however, cannot be copyrighted. UK Data Service
When making research data publicly available, it is important to let potential users know in advance what they are allowed to do with those data. Licensing is an effective way to communicate such permissions.
A trusted data repository will normally apply a license to any dataset it holds, which you typically select (from a list of options) when depositing data. Ghent University
Open research data
Good practice is to apply a standard and open license for open research data, as it ensures legal interoperability and the widest possible reuse.
Among the standard licenses commonly used for research data is the suite of Creative Commons (CC) licenses, which offer different levels of permission. Ghent University
Restricted data
For data requiring access restrictions, a standard license is usually not appropriate. In such cases a bespoke license will be needed instead (e.g. an ‘end user license’ or ‘user agreement’ as implemented by a trusted data repository) to make the data available. Ghent University
Citing data
Citing a dataset correctly is just as important as citing articles, books, images and websites – each dataset is a source of evidence to support your argument. UK Data Service
Research data can be cited in the same way as publications.
A data citation should contain the following minimum elements:
Author (creator of the dataset)
Publication date
Title
Version (if applicable)
Publisher (the organisation hosting/distributing the dataset, i.e. the repository)
Identifier
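The minimum elements listed above can be assembled into a citation string as in the sketch below. The `DataCitation` class, the ordering and punctuation of the output, and all example values (including the placeholder identifier) are assumptions made for illustration. Follow the citation style required by your publisher or repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Minimum elements of a data citation; version is optional."""
    author: str
    publication_date: str
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None

    def render(self) -> str:
        # Include the version only when one is given.
        version = f" (Version {self.version})" if self.version else ""
        return (
            f"{self.author} ({self.publication_date}). {self.title}{version}. "
            f"{self.publisher}. {self.identifier}"
        )


# Placeholder values, not a real dataset.
citation = DataCitation(
    author="Example, A.",
    publication_date="2020",
    title="Example dataset title",
    publisher="Example Repository",
    identifier="https://doi.org/10.0000/example",
    version="1.0",
)
print(citation.render())
# Example, A. (2020). Example dataset title (Version 1.0). Example Repository. https://doi.org/10.0000/example
```

Many repositories display a ready-made citation on the dataset's landing page, so a formatter like this is mainly useful when citations have to be generated in bulk.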
This unit outlines the benefits and challenges associated with sharing research data openly.
After completing this unit you will:
▪ Be informed about the benefits and barriers to sharing research data.
▪ Understand the principles of open research data.
▪ Know about the FAIR principles and how to share your research data in a way that is FAIR.
▪ Recognize why you might choose to license your dataset, and the different types of open data licence that are available.
▪ Be aware of the role of data access statements when publishing the results of your research.