Saturday 10:30 a.m.–11 a.m.

ReciPy: Effortless provenance tracking for Python

Robin Wilson

Audience level:
Novice

Description

Have you ever run a Python script to produce some outputs and then forgotten exactly how you created them? For example, you created plot.png a few weeks ago and now you want to use it in a publication. By adding a single line of code to your script, ReciPy will log your inputs, outputs and code each time you run it. Come and find out how to use it, how it hooks into Python, and how you can help.

Abstract

Imagine the situation: You’ve written some wonderful Python code which produces a beautiful graph as an output. You save that graph, naturally enough, as graph.png. You run the code a couple of times, each time making minor modifications. You come back to it the next week/month/year. Do you know how you created that graph? What input data? What version of your code? If you’re anything like me then the answer will often, frustratingly, be “no”. Of course, you then waste lots of time trying to work out how you created it, or even give up and never use it in that journal paper that will win you a Nobel Prize…

This talk will introduce ReciPy, a Python module that will save you from this situation! (Although it can’t guarantee that your resulting paper will win a Nobel Prize.) With the addition of a single line of code to the top of your Python files, ReciPy will log each run of your code to a database, keeping track of the input files, output files and the version of your code.

ReciPy was originally developed at the Collaborations Workshop 2015 Hack Day, run by the Software Sustainability Institute, a UK-based institute focused on improving the way that computational science is carried out. ReciPy won the Hack Day competition, and has been developed further to make it fully usable, even by novice programmers.

ReciPy is built from three separate components: a Python module that hooks into the Python import system so that it can ‘monkey patch’ input/output functions to write to a log before actually doing the input/output; a database stored in MongoDB; and a range of interfaces to allow you to find out exactly how you did produce that graph.png file.
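To give a feel for the ‘monkey patch’ idea, here is a minimal sketch using only the standard library. It is not ReciPy’s actual code: the names RUN_LOG, log_before, patch_module and the fakeplot module are hypothetical, and a plain in-memory list stands in for the database. It shows only the core trick of swapping a library’s output function for a wrapper that writes a log entry before doing the real work.

```python
import functools
import types
from datetime import datetime, timezone

# Stand-in for the database: a plain in-memory list (an assumption for this sketch).
RUN_LOG = []


def log_before(kind, func):
    """Wrap func so the path it receives is logged before the real call runs."""
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        RUN_LOG.append({
            "kind": kind,                    # "input" or "output"
            "function": func.__name__,
            "path": str(path),
            "time": datetime.now(timezone.utc).isoformat(),
        })
        return func(path, *args, **kwargs)   # then do the actual input/output
    return wrapper


def patch_module(module, inputs=(), outputs=()):
    """Replace the named functions on module with logging wrappers."""
    for name in inputs:
        setattr(module, name, log_before("input", getattr(module, name)))
    for name in outputs:
        setattr(module, name, log_before("output", getattr(module, name)))


# Hypothetical plotting library, standing in for a real one.
fakeplot = types.ModuleType("fakeplot")
SAVED = []


def savefig(path, dpi=100):
    """Pretend to write a figure; just remember what was asked for."""
    SAVED.append((path, dpi))
    return path


fakeplot.savefig = savefig

# Patch the library once, then use it exactly as before.
patch_module(fakeplot, outputs=["savefig"])
fakeplot.savefig("graph.png", dpi=200)
```

After the call, RUN_LOG holds one entry recording that savefig wrote graph.png, while the script itself calls fakeplot.savefig just as it did before the patch. A real tool would apply the patch automatically when the library is imported and would store the entries somewhere more permanent than a list.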

This talk will be suitable for programmers at all levels: you will hear how to install and use ReciPy and how it will help you (novice), how it hooks into Python (intermediate/advanced), and how you can help with further development (intermediate/advanced).
