Since the early 2000s, I have been involved in the agile software development movement.
Since 2013, I have become increasingly aware of the question of reproducible research in scientific computing (see below).
From 2015 to 2017, I coordinated a group of people around the theme of practices and tools for scientific computing [wiki] [Poster].
Reproducible research in scientific computing
Posters on the question
In 2016, a collective effort gathered practices and tools useful for scientific computing in a wiki.
I presented a poster at the Journées de math-info de l'INRA, Pont Royal, France, Oct. 3-7, 2016.
In July 2015, I updated my point of view and shared it in a poster at the JDEV 15 conference.
Awareness that change is necessary seems more widespread, even if practices change quite slowly. The number of available tools (someone spoke of a 'huge tech soup', see Database of 400+ tools) surprises me: how can one get to know them and choose among them?
Ensuring reproducibility requires work and resources. More and more, I think that individual initiatives are necessary, but that a change in research policies (evaluation of work, of individuals ...) also has to occur.
I had the opportunity to synthesize my readings and point of view in a poster for the 2014 general meeting of the MIA (Applied Mathematics and Computer Science) division at INRA.
Individual improvements in practices (which depend on the scientific environment and on the abilities of the individual and of their colleagues...), combined with changes in the research environment, can increase reproducibility in scientific computing. Practices can rely on several tools (generic or specific to a domain).
Introduction for a workshop
Comment être plus reproductible ? (How to be more reproducible?) Journées bioinformatiques de l'INRA, Toulouse, France, March 22-24, 2016.
A list of available tools (last update in 2016)
While presenting the posters, I was asked for links to the project sites of the tools mentioned; here they are.
Revision control software
git is a free and open source distributed version control system.
It is the most widely used version control system.
GNU Bazaar (formerly Bazaar-NG, command line tool bzr) is a distributed revision control system sponsored by Canonical.
Subversion (svn) is an open source version control system.
It is not distributed, unlike git and Mercurial, but is still widely used.
Code repository
Literate programming
emacs Org mode is for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.
IPython is a command shell for interactive computing in multiple programming languages, especially focused on the Python programming language, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.
Sage is a free open-source mathematics software system licensed under the GPL. It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers.
Workflow management system and provenance tracker
VisTrails: An open-source scientific workflow and provenance management system that supports data exploration and visualization.
Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and clouds.
The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source scientific workflow application, Kepler.
Galaxy is an open, web-based platform for data intensive biomedical research.
Sumatra is a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research. It can be thought of as an "automated electronic lab notebook" for simulation/analysis projects.
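To give a concrete idea of what a provenance tracker records, here is a minimal sketch in plain Python. It does not use the API of Sumatra or of any other tool listed above; the function names (file_sha256, make_run_record, append_record) and the record layout are assumptions chosen for illustration only.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_run_record(script_path, parameters, outcome):
    """Describe one run: which code, which parameters, when, and where.

    The field names are hypothetical, not those of an existing tool.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "script": script_path,
        "script_sha256": file_sha256(script_path),
        "parameters": parameters,
        "outcome": outcome,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


def append_record(log_path, record):
    """Append the record as one JSON line to a plain-text log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
```

Calling append_record("runs.jsonl", make_run_record("simulate.py", {"seed": 1}, "ok")) after each run (both file names are placeholders) yields a log that can be kept under version control next to the code. A dedicated tool goes further, for example by also recording the revision of the code and the input and output data.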
Environment capture
A Virtual machine (VM) is a software implementation of a machine (e.g., a computer) that executes programs like a physical machine.
Linux package: In Linux distributions, a package refers to a compressed file archive containing all of the files that come with a particular application. Most packages also contain installation instructions for the OS, as well as a list of any other packages that are dependencies (prerequisites required for installation).
docker is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux.
Vagrant is computer software that creates and configures virtual development environments. It can be seen as a higher-level wrapper around virtualization software such as VirtualBox, VMware, KVM and Linux Containers (LXC), and around configuration management software such as Ansible, Chef, Salt and Puppet.
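Virtual machines, packages and containers capture a whole environment. As a much lighter first step, a script can at least record which interpreter and which package versions were used for a computation. The sketch below relies only on the Python standard library (importlib.metadata, available from Python 3.8); the function name capture_environment and the layout of the snapshot are assumptions made for illustration, and this is not a substitute for the tools above.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Return a snapshot of the interpreter, OS, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved alongside the results.
    print(json.dumps(capture_environment(), indent=2))
```

Saving this output next to the results makes it easier to rebuild a similar environment later, but it does not capture system libraries or compilers, which is where a virtual machine or a container becomes useful.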
Publication site
zenodo is a new simple and innovative service that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories.
DRYAD is an international disciplinary repository of data underlying scientific and medical publications. Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable.
Research Compendia: A web service allowing people to share the research software and data associated with a scientific publication (articles and working papers).
Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. The purpose of the software is to support the scientist's workflow.
Run&Share was a web service allowing people to run computer codes associated with a scientific publication (articles and working papers) using their own data and parameter values. It was a fork of RunMyCode.
myExperiment makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities.