
Open-Source Science Opens Doors

Posted Apr 6, 2012, 11:54 AM [updated Feb 7, 2013, 8:25 AM]
A recent article in Nature (Ince et al. 2012) reignited my passion for open-source science. As science becomes increasingly data- and model-driven, the standards established during the last century need to be revisited, revised, or rejected. Those standards, namely detailed methods presented in the peer-reviewed literature, are no longer adequate in the age of data- and model-driven science. The problems are that (1) our models and data-processing algorithms have become extremely large and complex, and (2) it has become increasingly difficult to publish longer-form articles in high-quality journals. The second problem is well beyond my capacity to change (or even fathom), so I will focus on the first.

The first and foremost question is: Is it science if it isn't reproducible? Many would argue no. But is that always the case? Often, ecologists (and sociologists, evolutionary biologists, economists, etc.) have only single events with which to test their hypotheses. Take the example of the Biscuit Fire in southern Oregon. It was a rare occurrence from which much useful knowledge was derived, but it is not reproducible. Such data will always reflect special circumstances, and one hopes that, over the years, enough of them will be collected to derive general principles of forest and fire ecology.

Increasingly, however, hypotheses are tested using large and complex simulation models. (In fact, the two are not unrelated: hypotheses that are difficult to test in the real world, e.g., that increased atmospheric CO2 will warm the planet, encourage the development of models that can reasonably represent those dynamics.) The results from such models are not reproducible if the code used to generate them is not publicly shared. And this lack of reproducibility is completely unnecessary. The open-source model of sharing knowledge makes such models reproducible, and few barriers remain to building (and maintaining) models under the open-source paradigm. As Ince et al. point out, the issues of intellectual property rights, access, publishing procedure, logistics, and packaging have largely been solved.

In our experience, an open-source approach has been tremendously successful. We upload code and documentation revisions throughout the day as we work on a particular problem. There has been absolutely no financial cost, and the time and effort required are minimal. No one has ‘scooped’ us. Meanwhile, the rewards have been substantial. Our communication is highly efficient, and we have cultivated a large and vibrant community of scientists and developers who constantly test and improve our shared code. In summary, the size and complexity of our environmental problems require big data and complex models, and these in turn require a community that matches the challenge in size, talent, and effort. Open source opens those doors.

- Robert Scheller

Ince, D.C., L. Hatton, and J. Graham-Cumming. 2012. The case for open computer programs. Nature 482: 485-488.