NIXPy - Consistent Data Organization made easy: Versatile format for data and metadata

Christian Kellner

Audience level:
Novice

Description

Storing scientific data and metadata consistently and accessing it efficiently is an essential part of research and depends crucially on available file formats. NIXPy is a Python package that allows reading and writing of scientific data and metadata in the NIX format, a general, versatile, and open data model and file format that is based on HDF5.

Abstract

Managing scientific data requires integration of information from multiple sources. Different types of recorded data may be combined with elaborate stimulation, requiring background information or metadata to interpret them correctly. Storing such information consistently is an essential part of experimental research and depends crucially on available file formats. Many existing formats are vendor- or domain-specific, or provide only limited support for storing metadata along with the data. Here we present the NIX format [1] and accompanying software to work with it from Python. The format itself is an open file format that was developed in the field of neuroscience but is versatile enough to represent various kinds of scientific data in conjunction with metadata, to facilitate data organization and data retrieval in the lab as well as data sharing. It enables storing recorded or derived data as well as all the meta-information about the experimental context, accounting for the relationships between data items. Data arrays are defined with units and dimension descriptors, so that the stored data can be readily interpreted as recorded quantities. The format further allows specifying the relationships between data arrays and describing points or regions of interest, such as areas in an image or events in a recorded signal. The NIX Python library, NIXPy [2], supports direct access to these targeted parts of the data and the linked metadata, and provides an easy-to-use, Pythonic API. Reading and writing data works much like with NumPy’s ndarrays and supports slicing for easy subset selection. NIX stores data and metadata using the HDF5 format [3]. Packages and installers for different platforms are provided, as well as detailed documentation, examples, and tutorials [4]. The codebase is mature and unit tested. Besides Python, there are also libraries for C++, MATLAB, and Java.
The NIX file format supports comprehensive annotation and efficient organization of scientific data, and the variety of libraries makes it easy to integrate access to data and metadata in the lab’s data collection and analysis workflow.
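
The idea of data arrays with units and dimension descriptors, plus tagged regions of interest, can be illustrated with a small toy model. The sketch below is not the NIXPy API: the class names, fields, and all sample values are hypothetical and chosen only to show how a dimension descriptor lets a region given in physical units (here milliseconds) be turned into a slice of the stored data. It uses the Python standard library only.

```python
from dataclasses import dataclass, field


@dataclass
class SampledDimension:
    """Descriptor for a regularly sampled axis (hypothetical, not a NIXPy class)."""
    sampling_interval: float
    unit: str
    label: str
    offset: float = 0.0

    def index_of(self, position: float) -> int:
        """Convert a position in physical units to the nearest sample index."""
        return round((position - self.offset) / self.sampling_interval)


@dataclass
class DataArray:
    """Data plus the unit and dimension descriptor needed to interpret it."""
    name: str
    data: list
    unit: str
    dimension: SampledDimension
    metadata: dict = field(default_factory=dict)


@dataclass
class Tag:
    """Region of interest, given in the physical units of the dimension."""
    name: str
    position: float
    extent: float

    def retrieve(self, array: DataArray) -> list:
        """Return the samples covered by the region (end point included)."""
        start = array.dimension.index_of(self.position)
        stop = array.dimension.index_of(self.position + self.extent)
        return array.data[start:stop + 1]


# Made-up example: a voltage trace sampled every 0.5 ms.
signal = DataArray(
    name="membrane_voltage",
    data=[-65.0, -64.8, -60.1, 12.3, 30.5, -20.0, -70.2, -66.0, -65.1, -65.0],
    unit="mV",
    dimension=SampledDimension(sampling_interval=0.5, unit="ms", label="time"),
    metadata={"subject": "cell 1"},
)

# Tag an event that starts at 1.0 ms and lasts 1.5 ms.
spike = Tag(name="spike", position=1.0, extent=1.5)
segment = spike.retrieve(signal)
print(segment, signal.unit)
```

Because the region is expressed in the units of the dimension rather than as raw indices, the same tag stays meaningful if the data are resampled; only the descriptor changes.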

Acknowledgments

Supported by the German Federal Ministry of Education and Research (Grant 01GQ1302).

References

[1] https://github.com/G-Node/nix
[2] https://github.com/G-Node/nixpy
[3] http://hdfgroup.org/HDF5/
[4] http://g-node.github.io/nixpy
