dimarray: a package to manipulate numpy arrays with dimensions

Mahe Perrette

Abstract

dimarray is a package to handle numpy arrays with labelled dimensions and axes. Inspired by pandas, it includes advanced alignment and reshaping features, as well as nan handling.

The main difference with pandas is that it is generalized to N dimensions and behaves more like a numpy array. The axes do not have fixed names ('index', 'columns', etc.) but are given a meaningful name by the user (e.g. 'time', 'items', 'lon'). This is especially useful for high-dimensional problems such as sensitivity analyses. Axis names can be passed to the axis= parameter in numpy transformations:

a.mean(axis='time')
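To illustrate the idea behind a named axis argument, here is a minimal numpy-only sketch. It is not dimarray's implementation: the helper mean_along and the sample data are hypothetical, and the sketch only shows how a dimension name can be translated into an integer axis position.

```python
import numpy as np

# Hypothetical dimension names and data, for illustration only.
dims = ("time", "item")
values = np.arange(6.0).reshape(3, 2)    # 3 time steps, 2 items

def mean_along(values, dims, axis):
    """Average over the dimension called `axis`.

    Returns the reduced array and the names of the remaining dimensions.
    """
    pos = dims.index(axis)                # translate the name into a position
    remaining = dims[:pos] + dims[pos + 1:]
    return values.mean(axis=pos), remaining

result, remaining = mean_along(values, dims, "time")
print(result, remaining)
```

The name lookup is the only extra step compared to plain numpy; the reduction itself is an ordinary mean over an integer axis.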

Axis names are also used to automatically align arrays during operations, by transposing, repeating (broadcasting) and re-indexing them as needed. As a result, most operations are defined, even in situations where they would fail in numpy or pandas:

a + b

where a.dims == ('time',) and b.dims == ('item','time') will first re-index the time axis to make the two arrays match if necessary, and then repeat (or broadcast) the first array as many times as there are items in the second array, before performing the operation.
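The two steps (re-indexing, then broadcasting) can be sketched in plain numpy as follows. This is an illustration under stated assumptions, not dimarray's code: the helper reindex_last and the labels are hypothetical, and aligning on the union of time labels with nan filling is an assumption borrowed from pandas-style alignment.

```python
import numpy as np

# a has dims ('time',); b has dims ('item', 'time'). Labels are made up.
a_time = [1950, 1960, 1970]
a = np.array([1.0, 2.0, 3.0])
b_time = [1960, 1970, 1980]
b = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])            # 2 items x 3 time steps

def reindex_last(values, labels, new_labels):
    """Place values on new_labels along the last axis, filling gaps with nan."""
    out = np.full(values.shape[:-1] + (len(new_labels),), np.nan)
    for j, lab in enumerate(new_labels):
        if lab in labels:
            out[..., j] = values[..., labels.index(lab)]
    return out

# Step 1: re-index both arrays onto a common time axis (assumed: union of labels).
time = sorted(set(a_time) | set(b_time))
a_aligned = reindex_last(a, a_time, time)       # shape (4,)
b_aligned = reindex_last(b, b_time, time)       # shape (2, 4)

# Step 2: repeat a along the 'item' axis; numpy broadcasting does this implicitly.
total = a_aligned[np.newaxis, :] + b_aligned    # shape (2, 4)
print(time)
print(total)
```

Time labels present in only one of the two arrays end up as nan in the sum, which is where nan handling becomes relevant.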

Indexing is in the same spirit as pandas, but works on axis values by default, with integer position indexing available via the ix attribute:

a['temperature', 1950]    # indexing on axis values

a.ix[3, -1]  # standard integer position index
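The difference between the two indexing modes can be shown with a small numpy-only sketch. The axis labels and values below are hypothetical, and the label lookup via list.index stands in for whatever lookup the package performs internally.

```python
import numpy as np

# Hypothetical axis values for a 2-D array with dims ('variable', 'year').
variables = ["temperature", "precipitation"]
years = [1950, 1960, 1970]
values = np.array([[14.1, 14.3, 14.6],
                   [800.0, 790.0, 810.0]])

# Indexing on axis values: look up each label's position first.
by_label = values[variables.index("temperature"), years.index(1950)]

# Standard integer position indexing, as in plain numpy.
by_position = values[1, -1]

print(by_label, by_position)
```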

Arrays and axes can have metadata, and can be written to / read from the convenient netCDF format (HDF5-based), widely used in geophysics. This also includes reading files from multiple experiments:

import dimarray as da
da.read_nc('experiment_*/output.nc', 'myvariable', axis='experiment')

will read the variable 'myvariable' from the output.nc file in each experiment folder, and aggregate the results along a new axis 'experiment'.

A number of other features are included, such as a few 'convenience' plotting functions (plot, contourf, pcolor, contour) and most compatible numpy methods. The standard behaviour of some transformations can be modified with axis attributes such as weights (e.g. to compute a weighted mean, std or var). The code, with more examples, can be found on github. The project is already functional, but no extensive optimization has been attempted so far.
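As an illustration of what axis weights change in a transformation, the sketch below compares a plain and a weighted mean in numpy. The cosine-of-latitude weights and the sample data are assumptions chosen for the example; the sketch uses numpy.average directly rather than the package's weights attribute.

```python
import numpy as np

# Hypothetical data with dims ('time', 'lat').
lat = np.array([0.0, 60.0])
values = np.array([[10.0, 40.0],
                   [20.0, 50.0]])

# Assumed weighting: proportional to cos(latitude), i.e. about [1.0, 0.5].
weights = np.cos(np.deg2rad(lat))

plain = values.mean(axis=1)                              # every latitude counts equally
weighted = np.average(values, axis=1, weights=weights)   # low latitudes count more

print(plain)
print(weighted)
```

With these numbers the plain mean gives 25 and 35, while the weighted mean is pulled towards the equatorial column and gives approximately 20 and 30.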