A solution for Scientific Computations Reproducibility

Jérôme Roy

Abstract

Computational reproducibility [1] is an increasingly sought-after goal in modern science, driven by several converging factors. The Open Science movement considers sharing source code a necessary step toward improving scientific research, and journal publishers increasingly require access to raw data and code so that the scientific community can reliably scrutinize published results. The growing size of scientific teams also makes it desirable to share experimental data, code and results in a centralized way.

Reproducibility, especially for long-term work, cannot be achieved without full control of the hardware and software used to obtain the results. A natural way of ensuring this control is to use a version control system for the scientists' own code, and virtualization for operating systems and programming language libraries.

We present here a framework that addresses these issues in one place and helps scientists organize a shared computing environment. Simulagora [2] uses virtual machines to launch computations and run the related post-processing steps. These virtual machines can be located in a local or a remote cloud, and rely on a controlled software environment (e.g. with specific versions of compilers and scientific libraries). Simulagora also manages software repositories (versioned with Mercurial) and result files. The full history of launched computations lets the user easily retrieve past results together with all the information about the environment in which they were produced. We will also show how a collaboration setup can be used to share code (allowing others to launch computations in the same environment) as well as the results themselves.
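To make the idea of a computation history concrete, here is a minimal sketch of the kind of record that could be stored for each run. It is not Simulagora's API: the function names (`package_versions`, `build_run_record`, `record_id`) and the record fields are hypothetical, chosen for illustration only, and the sketch assumes that a code revision identifier (for example a Mercurial changeset hash) is supplied by the caller.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_run_record(code_revision, package_names, input_bytes):
    """Collect the information needed to identify one computation run.

    All field names are hypothetical; a real system would likely store more
    (compiler versions, virtual machine image, hardware description).
    """
    return {
        "code_revision": code_revision,  # e.g. a Mercurial changeset hash
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": package_versions(package_names),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


def record_id(record):
    """Return a stable identifier: the hash of the record with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two runs that share the same code revision, software environment and input data yield the same identifier, so a mismatch between identifiers signals that something in the setup has changed.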

[1] Computational Reproducibility: State-of-the-Art, Challenges, and Database Research Opportunities
[2] Simulagora