Biggus is a lightweight pure-Python package which implements lazy operations on numpy array-like objects. This provides dramatically improved efficiency in analysing large datasets, for minimal additional effort in the client code.
See : https://github.com/SciTools/biggus
As scientific datasets continue to grow exponentially in size, the resource requirements of even simple analyses can quickly grow to become a problem -- e.g. the job takes an unreasonably long time, or simply runs out of space.
Existing solutions to this may be powerful, but can also come with a large complexity overhead, especially for the non-expert user. This makes adapting an existing analysis to the needs of larger datasets potentially very costly.
Biggus provides simple abstractions of data access and calculations which provide lazy evaluation. It exposes this as simple virtual array object which mimics a numpy array. Thus, it does not require the user to re-cast an operation in unfamiliar terms, or specify unrelated details of data storage or concurrency factors.
The lazy evaluation approach allows optimised resource usage for both storage accesses and the parallelisation of calculations. Pure Python is well suited to describing and implementing these techniques, and the resulting implementation is easily accessible to the average user.
At the Met Office, we develop data analysis tools for a large community of research scientists. We developed Biggus as a resource for the Iris project, our next-generation analysis library for meteorological and oceanographic data (see: http://scitools.org.uk/iris/). While Biggus is still work-in-progress, within Iris it is already delivering significant benefit, in tasks such as catalogueing large datasets and accelerating statistical calculations. Here, performance already exceeds that of other standard software toolsets.