Code/Software
Material in this section of this resource draws substantially on work completed by Denis Simsek and Zuzanna Zagrodzka for the FAIR disciplinary guidance project.
FAIR Code/Software
At the policy level, code is considered part of FAIR, with the European Commission expert group on FAIR data stating that “Central to the realisation of FAIR are FAIR Digital Objects, which may represent data, code or other research resources” (Lamprecht et al., 2020). Applying the FAIR principles to research code brings benefits similar to those gained for research data: transparency, reproducibility and reusability of research, which in turn facilitate efficient access to code-based knowledge by industry, science, education and society.
Below are some tips and guidance on making your code meet the FAIR principles where possible.
‘We take the FAIR approach because we rely on open source software in both our teaching and research activities. Therefore, we want to become part of it.’
Professor Haiping Lu, Department of Computer Science
Translating the FAIR principles to code
Findability
1. Create a description of your code with metadata and ontologies
The following tools may be useful in achieving this:
CodeMeta is a set of keywords used to describe code, together with guidance on how to structure them in a machine-readable way.
EDAM is an example of an ontology that provides terminology for describing bioinformatics code; however, there may be a more suitable one for your area of research.
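As an illustration of machine-readable metadata, the sketch below builds a minimal CodeMeta description as a Python dictionary and prints it as JSON-LD, which would normally be saved as a codemeta.json file alongside the code. The function name build_codemeta, the project name, the repository URL and the author are all hypothetical placeholders; only a handful of the available CodeMeta keywords are shown.

```python
import json


def build_codemeta(name, description, version, repository_url, licence_url, authors):
    """Return a minimal CodeMeta description as a dictionary.

    `authors` is a list of (given_name, family_name) tuples.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "version": version,
        "codeRepository": repository_url,
        "license": licence_url,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    metadata = build_codemeta(
        name="example-analysis",
        description="Scripts for an example analysis.",
        version="1.0.0",
        repository_url="https://example.org/example-analysis",
        licence_url="https://spdx.org/licenses/MIT",
        authors=[("Ada", "Example")],
    )
    print(json.dumps(metadata, indent=2))
```

Keeping this file under version control with the code means the description is updated alongside each release.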
2. Register your code in a code registry
You can place your code in a code registry and associate it with yourself and your research team. Some examples of code registries include:
3. Get and use a unique and persistent identifier for your code
Giving your code a DOI means it can be cited in publications and other communications, opening up your research to others and inviting collaboration, while also ensuring a constant link to your code. One way to obtain a DOI is to export a version of the code to a repository (such as ORDA).
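When a DOI is quoted in documentation or a README, it helps to give it as a resolvable link. The sketch below is a minimal helper, under stated assumptions, that normalises a DOI into its https://doi.org/ form. The function name doi_to_url and the example DOI are hypothetical, and the pattern is a common heuristic for the shape of modern DOIs rather than an official validation rule.

```python
import re

# Heuristic for the general shape of a DOI: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the resolvable https://doi.org/ link for a DOI string."""
    doi = doi.strip()
    # Accept DOIs that were already written with a common prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # Placeholder DOI for illustration only; it does not point to a real record.
    print(doi_to_url("doi:10.1234/example.5678"))
```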
Accessibility
In order to ensure that others can access and download your code, and that this access remains permanent over time, you should deposit your code in a repository. You can find more information about repositories using the link below or on the Libraries Research Data Management page.
Interoperability
1. Explain the functionality of your code.
Ensuring that people know what your code does, and can therefore use it correctly themselves, will enable it to be reused more often and more effectively. Where possible, use terms from a domain ontology (such as EDAM for bioscientific data) that is standard across your area. You should also ensure that your code is well documented and commented, in an easy-to-understand manner.
2. Use standard (community agreed) formats for inputs and outputs that allow data exchange between different pieces of code.
Where possible and applicable, outputs (even between pieces of code) should use open and accessible data formats, which will help if other researchers only wish to use part of your code. However, if there are other, more bespoke, inputs/outputs, these should be as standard and widely used as possible.
For example:
FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences that has become the community/universal standard (although there is currently no standard file extension; .fasta, .fna, .ffn, .faa, .frn and .fa are all in use).
NetCDF is a standard file format used for sharing of array-oriented scientific data.
You should avoid defining your own standards.
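To show what exchanging data through a community format can look like, here is a minimal sketch that reads and writes FASTA records using only the Python standard library. The function names read_fasta and write_fasta are hypothetical, and the sketch covers only the basic layout (a header line starting with ">" followed by sequence lines); established libraries handle many more edge cases and are usually preferable in real projects.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def write_fasta(records, width=60):
    """Serialise a {header: sequence} dictionary as FASTA text."""
    lines = []
    for header, sequence in records.items():
        lines.append(">" + header)
        # Wrap the sequence at a fixed width, as is conventional.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">seq1 example\nACGT\nACGT\n>seq2\nMKV\n"
    parsed = read_fasta(example.splitlines())
    print(parsed)
    print(write_fasta(parsed), end="")
```

Because both functions work on the standard format, another researcher can reuse just the reader or just the writer with their own tools.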
Reusability
1. Document your code
Documentation should not only describe the code and its purpose, but also how it should be installed and run, as well as describing any extra inputs or setup that need to happen. For further guidance, see this beginner’s guide to writing documentation.
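As a small illustration, the sketch below shows documentation kept inside the code itself: a module-level docstring that says what the file is for and how to run it, and a function docstring that describes inputs, outputs and error conditions. The file name column_mean.py and the function column_mean are hypothetical, and the layout is one reasonable convention rather than a required standard.

```python
"""Compute the arithmetic mean of a list of numbers.

Requirements: Python 3 only; no third-party packages are needed.
How to run:   python column_mean.py
Purpose:      a deliberately tiny example of documenting code in place.
"""


def column_mean(values):
    """Return the arithmetic mean of `values`.

    Parameters:
        values: an iterable of numbers (ints or floats).

    Returns:
        The mean as a float.

    Raises:
        ValueError: if `values` is empty, because the mean is undefined.
    """
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(column_mean([1.0, 2.0, 3.0]))
```

Docstrings like these can be read with Python's built-in help() function, so the documentation travels with the code.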
2. Apply an appropriate licence to your code
This makes clear to other researchers the ways in which they can reuse your code.
If you have third-party dependencies in your code, you must ensure licence compatibility.
This tool can help you if you want to choose an open source licence.
3. State how to cite your code
You can make it easier for people to give you credit for your work by clearly stating how the code should be cited. This could be done as part of the documentation, e.g. by creating a CITATION.cff file, which there are tools to help you create. Additional information is also available on why and how to cite and describe your code.
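For illustration, the sketch below generates the text of a minimal CITATION.cff file using the Citation File Format keys cff-version, message, title, version, date-released, doi and authors. The function name build_citation_cff, the title, the author and the DOI are hypothetical placeholders, and the sketch assumes that none of the values contain double quotes; dedicated tools validate the file more thoroughly.

```python
def build_citation_cff(title, version, date_released, authors, doi=None):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (given_names, family_names) tuples and
    `date_released` is a string in YYYY-MM-DD form.
    Assumes the values contain no double quotes.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {date_released}",
    ]
    if doi:
        lines.append(f'doi: "{doi}"')
    lines.append("authors:")
    for given, family in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(
        build_citation_cff(
            title="Example Analysis",
            version="1.0.0",
            date_released="2024-01-31",
            authors=[("Ada", "Example")],
            doi="10.1234/example.5678",
        ),
        end="",
    )
```

The resulting text would be saved as CITATION.cff in the top-level directory of the code.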
4. Follow best practices for code development to improve quality
Where possible, make your code modular.
Comment your code to make it as clear as possible.
Create and provide tests that others can use.
Follow code standards.
Use version control.
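Two of the practices above, modular code and tests that others can run, can be sketched together. The example below splits a small task into two single-purpose functions and provides plain assert-based tests that a test runner such as pytest would also discover. The function names parse_measurements and rescale are hypothetical and the task itself is only illustrative.

```python
def parse_measurements(text):
    """Convert whitespace-separated numbers in `text` into a list of floats."""
    return [float(token) for token in text.split()]


def rescale(values):
    """Linearly rescale `values` so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("values must not all be identical")
    return [(value - lowest) / (highest - lowest) for value in values]


# Tests: each one checks a single function in isolation.
def test_parse_measurements():
    assert parse_measurements("1 2.5\n4") == [1.0, 2.5, 4.0]


def test_rescale_maps_to_unit_interval():
    assert rescale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


if __name__ == "__main__":
    test_parse_measurements()
    test_rescale_maps_to_unit_interval()
    print("all tests passed")
```

Because each function does one thing, another researcher can reuse rescale without the parsing step, and the tests document the expected behaviour.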
There are many useful guidance documents for helping researchers to follow best practice; some of these are listed below:
The eScience Centre best practice guide.
Best Practices for Scientific Computing, published in PLOS Biology.
Alternatively, there is Good Enough Practices in Scientific Computing.
Code Repositories and Platforms
Code Repositories:
ORDA: The University of Sheffield repository, allowing researchers to deposit data, code and other materials; you can also get a DOI assigned for your code.
Zenodo: Allows the depositing of research papers, datasets, research code, and reports; and assigns a DOI for your code.
PyPI: A repository specifically for packages for the Python programming language.
The Comprehensive R Archive Network (CRAN): A network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for R.
Docker Hub: A hosted repository service for finding and sharing container images.
A note on GitHub:
An important element of making your code FAIR is to apply a unique persistent identifier (i.e. a DOI), which is different to a link to GitHub. Within ORDA, you can link data from GitHub on the MyData page.
Alternatively, there is a GitHub guide for using Zenodo to archive a GitHub repository.
Advantages of version control:
Identifiers that point to earlier or different versions of your code will still work.
Previous versions of code and data are maintained, which also helps in the exact recreation of outputs if required.
Various levels of processed data (primary, secondary, raw/clean/processed, etc.) can be shared.
Version control and collaboration platforms:
GitHub: A cloud-based Git platform that allows developers to host, monitor, and version control code changes. It has also evolved to become a development platform. It gives developers the option to implement apps and integrations freely through the GitHub marketplace.
GitLab: A cloud-based Git and DevOps platform that helps developers monitor, test, and deploy their code.
Bitbucket: A version control repository hosting platform. It allows unlimited private repositories that can be useful for prototyping projects, before they are ready to be made open.
Choosing the right code hosting platform from these may seem daunting, but all offer good features and any of them is a better choice than using none at all. There is plenty of guidance available for selecting the best one to fit your needs, including direct comparisons between GitLab and GitHub, and GitHub and Bitbucket.
Sources and useful links:
Top 10 FAIR data & code things: Biomedical Data Producers, Stewards, and Funders
Lamprecht, Anna-Lena et al. (2020) ‘Towards FAIR Principles for Research Software’
From FAIR research data toward FAIR and open research software
Four simple recommendations to encourage best practices in research software
File naming - 8-minute video with guidance on file naming and version control, especially relevant for code developers (taken from https://www.data.cam.ac.uk/support/external)
Good practice: 4 Simple recommendations for Open Source Software