Galry: high performance interactive data visualization in Python

Cyrille Rossant

Abstract

Cyrille Rossant, Kenneth D. Harris; UCL Institute of Neurology, UCL Department of Neuroscience, Physiology, and Pharmacology, 21 University Street, London WC1E 6DE, UK

The amount of data obtained in science is growing exponentially. Advances in computing have allowed automatic analyses to scale to very large data sets, but human involvement is still a critical component of the scientific process. Interactive data visualization is an effective way for an experimenter to gain an intuitive understanding of large-scale data sets and to uncover experimental artefacts and unexpected patterns. Hence, there is a critical need for efficient visualization tools that can handle large, high-dimensional data sets.

Python is increasingly used as an open source platform for scientific computing and visualization. Although scientific plotting features are available, its interactive visualization capabilities have not been designed to scale to very large data sets. Existing libraries such as Matplotlib are generally unable to smoothly display more than one million points, whereas 3D libraries such as MayaVi are typically poorly suited to 2D plotting.

We developed a new interactive visualization library designed specifically to handle very large data sets. The library, named Galry, is open source software, written in pure Python on top of standard external libraries. Galry is based on OpenGL, and exploits the hardware acceleration of the graphics processing unit (GPU) to smoothly display up to one hundred million points on a standard desktop computer. The library supports common 2D and 3D plots, including graphs, images, polygon meshes and 2D surfaces, and can be extended for entirely customized visualizations.

High performance is achieved by storing data on the GPU and avoiding unnecessary transfers between CPU and GPU memory. Extensive use of vertex and fragment shaders enables smooth interactive navigation through parallel computations on the GPU. In addition, the cost of Python interpretation is reduced thanks to fast vectorization techniques implemented in NumPy and PyOpenGL. These techniques result in loading times up to 20 times faster and frame rates up to 500 times higher than Matplotlib on plots containing ten million points.
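The vectorization idea can be illustrated with a minimal sketch. It is not taken from Galry's source code: the function names `vertices_loop` and `vertices_vectorized` are hypothetical, and the choice of a contiguous (N, 2) float32 array as the vertex layout is an assumption made for illustration. The sketch only shows how a per-point Python loop can be replaced by a single NumPy operation when preparing data that would later be handed to the GPU.

```python
import numpy as np


def vertices_loop(x, y):
    """Build an (N, 2) float32 array one point at a time (slow reference)."""
    out = np.empty((len(x), 2), dtype=np.float32)
    for i in range(len(x)):
        # Each iteration pays the cost of Python interpretation.
        out[i, 0] = x[i]
        out[i, 1] = y[i]
    return out


def vertices_vectorized(x, y):
    """Build the same array with NumPy calls that loop in compiled code."""
    stacked = np.column_stack((x, y))
    # Assumed layout: contiguous float32, a common format for vertex data.
    return np.ascontiguousarray(stacked, dtype=np.float32)


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 100000)
    y = np.sin(10.0 * x)
    vertices = vertices_vectorized(x, y)
    print(vertices.shape, vertices.dtype)
```

Both functions produce the same array; the vectorized version avoids executing Python bytecode once per point, which is the kind of overhead the paragraph above refers to.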

This library was developed for the particular application of spike sorting for large-scale extracellular recordings, a common pre-processing technique for isolating single neuron spiking activity from extracellular multielectrode recordings. The software built for this application, written in Python and Qt, provides an ergonomic interface for the manual stage of spike sorting. It smoothly displays a large number of spikes across tens of channels. We also report another application of Galry: the visualization of long intracellular and extracellular recordings.

Although originally developed for neuroscience applications, Galry will be useful more generally in any scientific or engineering area where there is an increasing need for lightweight and fast visualization of big data.