Since the early 2000s, I have been involved in the agile software development movement.

Since 2013, I have been concerned with the question of reproducible research in scientific computing (see below).

From 2015 to 2017, I coordinated a group of people around the theme of practices and tools for scientific computing [wiki] [Poster].


Reproducible research in scientific computing

Posters on the question

poster2016

In 2016, a group of us collected in a wiki the practices and tools that are useful for scientific computing.

I presented a poster at the Journées de math-info de l'INRA, Pont Royal, France, Oct. 3-7, 2016.

poster2014

In July 2015, I updated my point of view and shared it in a poster at the JDEV 15 conference.

Awareness that change is necessary seems more widespread, even if practices change quite slowly. The number of available tools surprises me (someone spoke of a 'huge tech soup', see Database of 400+ tools): how can one get to know them and choose among them?

Ensuring reproducibility requires work and resources. More and more, I think that individual initiatives are necessary, but that a change in research policies (evaluation of work, of individuals ...) also has to occur.

poster2015

I had the opportunity to synthesize my readings and point of view in a poster for the 2014 general meeting of the MIA (Applied Mathematics and Computer Science) division at INRA.

Individual improvements in practices (which depend on the scientific environment and on the abilities of individuals and their colleagues...), combined with changes in the research environment, can increase reproducibility in scientific computing. Practices can rely on several tools (generic or domain-specific).

Introduction for a workshop

How to be more reproducible? (Comment être plus reproductible ?) Journées bioinformatiques de l'INRA, Toulouse, France, March 22-24, 2016.


A list of available tools (last update in 2016)

When presenting the posters, I was asked for links to the project sites of the tools mentioned; here they are.

Revision control software

git is a free and open source distributed version control system.

The most used version control system.

Mercurial is a free, distributed source control management tool.

An alternative to git.

GNU Bazaar (formerly Bazaar-NG, command line tool bzr) is a distributed revision control system sponsored by Canonical.

Subversion (svn) is an open source version control system.

Not distributed, unlike git and Mercurial, but still widely used.
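
One way a revision control system helps reproducibility is by letting every result be tied to the exact version of the code that produced it. The following is only a minimal sketch, assuming git is installed and the script runs inside a git working copy; the helper names current_git_commit and tag_result are hypothetical and not part of any tool listed here.

```python
import subprocess
from typing import Optional


def current_git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the hash of the HEAD commit in repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or repo_dir does not exist
        return None
    if result.returncode != 0:
        # not a git repository, or no commit has been made yet
        return None
    return result.stdout.strip()


def tag_result(value: float, repo_dir: str = ".") -> dict:
    """Attach the code version to a computed value (hypothetical helper)."""
    return {"value": value, "code_version": current_git_commit(repo_dir)}


if __name__ == "__main__":
    print(tag_result(3.14))
```

Storing the hash next to each output makes it possible to check out the same code later; it says nothing about uncommitted changes, which a fuller version would also have to detect.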

Code repository

GitHub is a web-based hosting service for software development projects that use the git revision control system.

SourceForge: Find, Create, and Publish Open Source software for free.

BitBucket: Free source code hosting for Git and Mercurial.

SourceSup: French Forge for Training and Research Public Institutions.

Literate programming

Sweave is a tool that allows you to embed the R code for complete data analyses in LaTeX documents. The purpose is to create dynamic reports, which can be updated automatically if the data or the analysis change.

knitr: Elegant, flexible and fast dynamic report generation with R. It runs the code chunks embedded in a document and weaves their output, including plots, into the final report.

Emacs Org mode is for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.

IPython is a command shell for interactive computing in multiple programming languages, especially focused on the Python programming language, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.

Sage is a free open-source mathematics software system licensed under the GPL. It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers.
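
The principle shared by the literate programming tools above is that the numbers in a report are computed when the document is built rather than pasted in by hand. The toy sketch below illustrates only that principle; the {{ ... }} marker and the weave function are invented for the example and do not correspond to the syntax of Sweave, knitr, Org mode or IPython.

```python
import re

# Hypothetical chunk marker: {{ expression }}
CHUNK = re.compile(r"\{\{(.+?)\}\}")


def weave(template: str, namespace: dict) -> str:
    """Replace each {{ expression }} with its value computed from namespace."""

    def run_chunk(match):
        expression = match.group(1).strip()
        # eval is tolerable in a toy example; never use it on untrusted text
        return str(eval(expression, {"__builtins__": {}}, namespace))

    return CHUNK.sub(run_chunk, template)


data = [2.0, 4.0, 6.0]
report = weave(
    "We analysed {{ n }} samples; the mean is {{ total / n }}.",
    {"n": len(data), "total": sum(data)},
)
print(report)  # We analysed 3 samples; the mean is 4.0.
```

If the data change, rebuilding the report updates the text automatically, which is the property that makes such documents reproducible.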

Workflow management system and provenance tracker

VisTrails: An open-source scientific workflow and provenance management system that supports data exploration and visualization.

Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.

The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and clouds.

The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source scientific workflow application, Kepler.

Galaxy is an open, web-based platform for data intensive biomedical research.

Sumatra is a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research. It can be thought of as an "automated electronic lab notebook" for simulation/analysis projects.
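
To give an idea of what a provenance tracker records, here is a minimal hand-rolled sketch of an "electronic lab notebook" entry: a hash of the input file, the parameters, the result and a timestamp, appended to a log as one JSON line per run. It does not use Sumatra or any of the systems above; the function names and the log format are assumptions made for the illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def file_sha256(path: str) -> str:
    """Hash a file in blocks so that large inputs do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def make_record(input_path: str, parameters: dict, result) -> dict:
    """Describe one run: when it happened, on which input, with which settings."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_path": input_path,
        "input_sha256": file_sha256(input_path),
        "parameters": parameters,
        "result": result,
    }


def append_record(record: dict, log_path: str) -> None:
    """Append the record as one JSON line (hypothetical log format)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
```

A typical use would be to call make_record right after a simulation finishes and append_record to keep the history; the dedicated tools automate this and also capture the code version and the software environment.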

Environment capture

A virtual machine (VM) is a software implementation of a machine (e.g., a computer) that executes programs like a physical machine.

Linux package: In Linux distributions, a package refers to a compressed file archive containing all of the files that come with a particular application. Most packages also contain installation instructions for the OS, as well as a list of any other packages that are dependencies (prerequisites required for installation).

Docker is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux.

Vagrant is computer software that creates and configures virtual development environments. It can be seen as a higher-level wrapper around virtualization software such as VirtualBox, VMware, KVM and Linux Containers (LXC), and around configuration management software such as Ansible, Chef, Salt and Puppet.
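
The tools in this section capture a whole system. A much lighter first step, sketched below for a Python-based analysis only, is to write down the interpreter version, the platform and the versions of the installed packages alongside the results. The function name snapshot_environment is hypothetical, and such a snapshot describes the environment without being able to rebuild it, which is what virtual machines and containers add.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment() -> dict:
    """Collect the interpreter, platform and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installations that have no name
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored with the results
    print(json.dumps(snapshot_environment(), indent=2))
```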

Publication site

figshare: Manage your research in the cloud and control who you share it with or make it publicly available and citable.

Zenodo is a simple and innovative service that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories.

DRYAD is an international disciplinary repository of data underlying scientific and medical publications. Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable.

Research Compendia: A web service allowing people to share the research software and data associated with a scientific publication (articles and working papers).

Dataverse: A web site dedicated to sharing, archiving and citing research data.

Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. The purpose of the software is to support the scientist's workflow.

Run&Share was a web service allowing people to run computer codes associated with a scientific publication (articles and working papers) using their own data and parameter values. It was a fork of RunMyCode.

myExperiment makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities.

recomputation.org was a repository for experiments in computational science.