Pillars of open science, according to UNESCO's 2021 Recommendation on Open Science.
For the results of computations to be meaningful, they must be reproducible—when anyone applies the same analytical techniques to the same data, they must get the same results. For other researchers to be able to verify that a result is reproducible, and build on the result and techniques used to obtain it, the data and code used must be shared openly.
In this unit you will learn ways to ensure that your results are reproducible, and how to apply the principles of open science to build on others’ work and help them build on yours.
Beyond the Essentials, this unit assumes knowledge of a programming language and uses examples specific to Python. If you don’t already know a programming language, we recommend working through Software Carpentry’s Introduction to Python before diving in.
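To make the idea of reproducibility concrete, here is a minimal sketch in Python using only the standard library. It assumes a toy analysis (a bootstrap estimate of the standard error of a mean) and made-up illustrative data; the function names `bootstrap_mean_error` and `checksum`, the seed, and the data values are all hypothetical and not taken from any particular lesson. The point it illustrates is that fixing the random seed and recording a fingerprint of the input data lets anyone rerun the same analysis on the same data and check that they get the same number.

```python
import hashlib
import random
import statistics


def bootstrap_mean_error(samples, n_resamples=200, seed=12345):
    """Estimate the standard error of the mean by bootstrap resampling.

    A local, explicitly seeded generator is used so that repeated calls
    with the same inputs give exactly the same answer.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        resample = rng.choices(samples, k=len(samples))
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


def checksum(values):
    """Fingerprint the input data so a result can be tied to exact inputs."""
    payload = ",".join(repr(v) for v in values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up illustrative data (an assumption, not real measurements).
data = [0.98, 1.02, 1.05, 0.97, 1.01, 0.99, 1.03, 1.00]

first = bootstrap_mean_error(data)
second = bootstrap_mean_error(data)

print("same seed and data give the same result:", first == second)
print("data checksum:", checksum(data))
```

Publishing the seed, the checksum, and the code alongside the result gives others what they need to repeat the calculation; it does not by itself guarantee identical results across different Python versions or platforms.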
Green and Gold open access
Creative Commons Licensing
The arXiv
Institutional policies and repositories
Rights retention
Version control
GitHub
Issue tracking
Pull requests
Automated testing
Agile techniques
Introductions to Git
Repository hosting platforms
Software licensing
Repositories for open workflows
When to combine, or separate, data and code
Disposable vs reusable tools
Types of reproducibility
Precisely specifying hardware
Environment specification – containers and alternatives
Ensuring reproducibility with random numbers
Persisting RNG state
Recording provenance of field configurations
Field configuration and configuration metadata storage formats
Tracking provenance of measurements
Data formats for measurement inputs and outputs
Tracking provenance of analyses
Data formats for analysis inputs and outputs
Code structure
Workflow and data-flow management
Removing manual steps
Workflow managers
Libraries for statistics in lattice
Outputting tables
Styling plots
Keeping data consistent
Maintaining reproducibility when working with notebook-based programs
Removing manual steps
As this is a relatively young and rapidly developing field, there are few textbooks but many online resources available.
Irving et al.’s Research Software Engineering with Python (free online, also available in print) provides a good, well-rounded introduction to many of the principles. Many of its chapters borrow from some of the lessons above.
You should also check with your institution’s library for guidance on your institution’s policies and procedures around open research.