You are most likely to begin sharing and archiving your data towards the end of your project. This checklist will help you, and others, get the most out of your data and code by making the shared and archived data as FAIR (Findable, Accessible, Interoperable and Reusable) as possible.
Reasons to consider sharing your research data and code include:
To allow other researchers to verify your results and to reuse your methodology to obtain the same or similar results.
For the benefit of the research community and wider society.
Because your institution and funder require it.
So that others, including you and your colleagues in the future, can use and check the data.
To increase citations of your work.
To ensure that those who compiled the data or created the code get credit for it.
To avoid duplication of work.
To improve your research profile and encourage collaboration.
To facilitate peer review.
Items you might consider sharing include:
Data that validates your research, especially if it has potential for reuse.
Data that would be hard to recreate (e.g. because the research process was expensive or related to a one-off event). There is a particularly strong case for sharing such data.
Code created to process the data, or details of any proprietary software used.
Protocols.
Surveys (a blank version).
Participant information sheets and a blank copy of your consent form.
Tasks or other materials given to participants.
Other materials relating to your research process, to give others insight into your research and 'open it up'.
Store and share your data for a minimum of 10 years after the end of the project, or in line with your funder's requirements.
Make sure you have the required permissions and consent to share your data and code. See the Planning your research page for guidance on this.
When sharing your data and code through a repository, it is advisable to choose one that assigns a DOI (Digital Object Identifier). In some cases your funder may require you to deposit data in a specific repository. Where this is not the case, we recommend using a subject-specific repository if one exists and you have access to it, but you can also use a general repository. The University of Sheffield has a general institutional repository, ORDA, which you could use for your data and code. For details of other general repositories, see the Repositories section in this resource.
GitHub is not considered a repository for this purpose because it does not assign a DOI to your code. While GitHub is a good tool for making your code open, depositing your code in a repository alongside your other research outputs keeps it with the related materials, helps ensure it is preserved for longer, and makes it more findable. More details on depositing code can be found on our code page.
Information on some subject-specific and data-type-specific repositories can be found on our subject-specific repository pages.
For reasons of accessibility, interoperability and longevity, it is a good idea to convert (or resave) your data to an open or more widely accessible format if they are currently in a specialised or proprietary format.
A quick fact sheet on open and accessible file formats can be found in the adjacent document:
When selecting a file format, you should also consider the community standards for accessible sharing in your research area. If your chosen format deviates from these standards, explain why in the accompanying metadata.
For more comprehensive guidance on file formats, see:
Recommended Formats - UK Data Service
Recommended Formats Statement - The Library of Congress
Choosing Formats - University of Cambridge, Research Data Management
Data formats for preservation - OpenAIRE
Guidance on file formats - Digital Curation Centre (DCC)
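As one illustration of resaving data in a more widely accessible format, the Python sketch below exports every table in a SQLite database file to its own UTF-8 CSV file, using only the standard library. CSV can be opened by almost any software, whereas a database file needs database tools to read it. The function name `export_sqlite_to_csv` and the file paths are hypothetical, and spreadsheets or other proprietary formats would need a different reader (often a third-party library), so treat this as a sketch under those assumptions rather than a recommended tool.

```python
import csv
import sqlite3
from pathlib import Path


def export_sqlite_to_csv(db_path, out_dir):
    """Write each user table in a SQLite database to <out_dir>/<table>.csv (UTF-8)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    con = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal 'sqlite_*' tables.
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        for table in tables:
            quoted = table.replace('"', '""')  # escape quotes in the table name
            cur = con.execute(f'SELECT * FROM "{quoted}"')
            header = [col[0] for col in cur.description]  # column names
            out_path = out_dir / f"{table}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(cur)  # each row is a tuple; None becomes ""
            written.append(out_path)
    finally:
        con.close()
    return written
```

Whatever tool you use, it is worth checking the exported files against the original (row counts, column names, special characters) before depositing them.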
Processes like anonymisation or de-identification may be needed to ensure that your data can be safely and legally shared. For more information and links to guidance, please see Sensitive data.
Metadata refers to the information you provide alongside your data when you deposit it, for example, in a data repository. Metadata literally means 'data about data'.
When you deposit data, you'll be asked to complete a simple web form, such as the deposit form in the University of Sheffield's institutional repository, ORDA.
When you're completing this, think about how to make the information as useful as possible to another researcher, professional colleague or member of the public who is searching for a relevant dataset. This will usually mean being as precise and detailed as possible, and carefully considering the section that asks you to describe your data or code. Things you might consider including here are:
The research project the data relates to, and its aims and objectives.
When, how, and via what methodology (in brief) the data was collected.
What form the data takes.
Where more information about the context of the data can be found (e.g. in a README.txt file at the top level of your file structure).
Guidance on creating a README file is available on the During your project page.
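If you prepare several similar deposits, a small script can create a starting README.txt for you to edit by hand. The Python sketch below writes one to the top level of a folder, recording a title, creators, licence, a short description and methodology, and a list of the files present. The function `write_readme` and its field names are hypothetical, chosen to mirror the points listed above; the During your project guidance should decide what your README actually contains.

```python
from pathlib import Path


def write_readme(folder, title, creators, description, methods, licence):
    """Write a plain-text README.txt skeleton to the top level of `folder`."""
    folder = Path(folder)
    # List every file below the folder (forward slashes), excluding the README itself.
    files = sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and p.name != "README.txt"
    )
    lines = [
        f"Title: {title}",
        f"Creators: {', '.join(creators)}",
        f"Licence: {licence}",
        "",
        "Description:",
        description,
        "",
        "Methodology (in brief):",
        methods,
        "",
        "Files:",
        *[f"  {name}" for name in files],
    ]
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

The generated file is only a skeleton: the description and methodology sections still need to be written with another researcher in mind.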
When you upload data to a repository such as ORDA, you will usually be given a choice of licences under which to make your data available. A licence tells people how they can use your data, and options often include the Creative Commons licences. The licence you select for your data or code should be as open as possible but as restrictive as necessary.
You can find more information about the different licences on the Library webpages. Many researchers use the Creative Commons attribution licence (CC-BY) for their data, which allows any re-use of the work provided the original is appropriately cited.
There are also licences specifically for code, including the MIT and Apache licences. See choosealicense.com for advice on software licences. Please note that Creative Commons licences are not suitable for code.
Where possible, choose a repository that assigns a DOI to your deposited materials. Most repositories do this as standard, but you may wish to check before selecting one.
Once you have a DOI for your dataset or code, use it! Cite the DOI in publications and other outputs relating to your research project, add it to your ORCID record, and so on. Publications derived from the data should include a Data Availability Statement containing your DOI.
All publications should include a Data Availability Statement that indicates where any underlying data and code can be found and, preferably, accessed. You can find more information about Data Availability Statements, including useful examples, on the Library Research Data Management pages.
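Data Availability Statements and reference lists usually cite a DOI as a resolvable link, i.e. https://doi.org/ followed by the DOI. The Python sketch below normalises a DOI pasted in any of several common forms, rejects strings that do not look like a DOI, and assembles a simple citation in a creator (year): title. publisher. link pattern. The function names, the example DOI 10.1234/example.5678 and the exact citation layout are assumptions for illustration only; follow the style your publisher requires and the examples on the Library Research Data Management pages.

```python
import re

# Loose shape check: "10." + registrant code of 4-9 digits + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes often pasted in front of a bare DOI (compared case-insensitively).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Return the bare DOI (e.g. '10.1234/abc'), or raise ValueError."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a recognisable DOI: {doi!r}")
    return doi


def data_citation(creators, year, title, publisher, doi):
    """Build a simple dataset citation ending in a resolvable DOI link."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. https://doi.org/{normalise_doi(doi)}"
```

For example, `data_citation(["Researcher, A."], 2024, "Example survey dataset", "The University of Sheffield", "doi:10.1234/example.5678")` produces a citation ending in `https://doi.org/10.1234/example.5678`, which can be dropped into a Data Availability Statement.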
If you have used data in your research that is already publicly and permanently available, share a link to it rather than a copy of the data itself. See also the guidance on the page below: