Cambridge, UK - 27-30 August 2014

HyperSpy, a Python package for multidimensional data analysis

Francisco de la Peña

Fri 29 11:30 a.m.–11:50 a.m. in William Gates Building


In many cases the dimensions of a multidimensional dataset can be grouped into two categories: signal dimensions and navigation dimensions. For example, the data contained in a three-dimensional NumPy array could be viewed as an image stack, i.e. a multidimensional dataset with signal dimension 2 and navigation dimension 1. Alternatively, the same array could be viewed as a spectrum image, i.e. a multidimensional dataset with signal dimension 1 and navigation dimension 2. Once the dimensions have been grouped into these two categories, many operations that were previously ill-defined become well-defined. For example, matrix decomposition, an operation that only accepts two-dimensional data as input, can be applied to a multidimensional dataset by unfolding the navigation dimensions and the signal dimensions separately to obtain a matrix.
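The unfolding idea can be sketched in plain NumPy. This is only an illustration, not HyperSpy's own implementation: the helper `unfold` is hypothetical, and it assumes the signal dimensions are the trailing axes of the array. The same three-dimensional array is unfolded once as a spectrum image and once as an image stack, and the spectrum-image matrix is then decomposed with a singular value decomposition.

```python
import numpy as np


def unfold(data, signal_ndim):
    """Hypothetical helper: flatten navigation axes into rows and
    signal axes into columns (signal axes assumed to be trailing)."""
    split = data.ndim - signal_ndim
    nav_shape = data.shape[:split]
    sig_shape = data.shape[split:]
    n_nav = int(np.prod(nav_shape))
    n_sig = int(np.prod(sig_shape))
    return data.reshape(n_nav, n_sig), nav_shape, sig_shape


rng = np.random.default_rng(0)
data = rng.random((4, 5, 6))

# Spectrum image: navigation shape (4, 5), signal shape (6,)
matrix_si, nav_si, sig_si = unfold(data, signal_ndim=1)

# Image stack: navigation shape (4,), signal shape (5, 6)
matrix_stack, nav_stack, sig_stack = unfold(data, signal_ndim=2)

# Matrix decomposition now applies to the unfolded spectrum image.
u, s, vt = np.linalg.svd(matrix_si, full_matrices=False)

# Fold the results back: one loading map per component over the
# navigation axes, one factor per component over the signal axes.
loadings = (u * s).reshape(nav_si + (-1,))
factors = vt.reshape((-1,) + sig_si)

# Keeping all components reproduces the original data.
reconstructed = ((u * s) @ vt).reshape(data.shape)
```

The choice of `signal_ndim` is the only thing that changes between the two views; the decomposition code itself never needs to know how many navigation dimensions there were.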

HyperSpy is a Python package that aims to make it easy and natural to apply analytical procedures that operate on an individual signal to multidimensional datasets, and to provide easy access to analytical tools that exploit the multidimensionality of the dataset. It does so by classifying the dimensions of a NumPy array into the signal and navigation categories and by adding scaled and named axes. Among other things, it provides tools for interactive data visualization, multidimensional curve fitting and machine learning, as well as iterators and an extension of NumPy's fancy indexing. HyperSpy builds upon the functionality provided by NumPy, SciPy, Matplotlib and scikit-learn and integrates nicely into the scientific Python ecosystem. For example, one could use HyperSpy to map any function from scikit-image to multidimensional image data using the same syntax regardless of the number of navigation dimensions.
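What mapping a per-signal function over the navigation dimensions involves can be sketched in plain NumPy. This is a minimal sketch under stated assumptions and does not use or reproduce HyperSpy's API: `map_signals` and `remove_mean` are hypothetical names, the signal dimensions are assumed to be the trailing axes, and `remove_mean` merely stands in for an image-processing function such as one from scikit-image.

```python
import numpy as np


def map_signals(func, data, signal_ndim):
    """Hypothetical helper: apply func to every signal in data.

    The trailing signal_ndim axes are treated as signal axes and all
    leading axes as navigation axes.
    """
    nav_shape = data.shape[:data.ndim - signal_ndim]
    # Visit every navigation position and process the signal found there.
    results = [np.asarray(func(data[idx])) for idx in np.ndindex(*nav_shape)]
    stacked = np.stack(results)
    # Restore the navigation axes in front of the per-signal result shape.
    return stacked.reshape(nav_shape + stacked.shape[1:])


def remove_mean(image):
    """Stand-in for any function that takes one 2D image."""
    return image - image.mean()


rng = np.random.default_rng(1)
data = rng.random((2, 3, 4, 5))

# Two navigation dimensions (2, 3), images of shape (4, 5).
centred = map_signals(remove_mean, data, signal_ndim=2)

# Same call, different grouping: three navigation dimensions (2, 3, 4),
# spectra of length 5, each reduced to a single number.
totals = map_signals(np.sum, data, signal_ndim=1)
```

The call looks the same however many navigation dimensions the array has, which is the property the abstract describes; only `signal_ndim` states how the axes are grouped.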