Learn to segment n-dimensional images with GALA

Juan Nunez-Iglesias

Fri 23 11:30 a.m.–11:50 a.m. in Dupreel

Abstract

One of the principal goals of HHMI's Janelia Farm Research Campus is the reconstruction of complete neuronal circuits. This involves 3D electron microscopy (EM) volumes many micrometres across with 10 nm resolution, resulting in gigavoxel-scale images. From these, neurons must be segmented out. Automatic image segmentation is a well-studied problem, but these data present unique challenges in addition to scale. First, neurons have an elongated, irregular branching structure, with processes as thin as 50 nm but hundreds of micrometres long. This means small errors in segmentation can lead to large errors in the inferred neuronal structure and connectivity. Second, the internal texture of different neurons is very similar (to a computer vision algorithm, at least), and only a thin cellular boundary separates densely packed neurons. Third, some internal cellular structures within the neurons can look similar to the cellular boundary.

We follow a common computational paradigm for automated segmentation:

1. Generate a pixel-level boundary probability map (for which we use Ilastik [1]).
2. Generate superpixels from this map (for which we use watershed).
3. Agglomerate the superpixels.
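
As an illustration of the superpixel step, here is a minimal sketch of a seeded watershed on an n-dimensional boundary probability map. It is not the GALA implementation: the function name `seeded_watershed`, the hand-placed seeds, and the synthetic map are assumptions made for the example, and a production pipeline would more likely call a library routine such as the watershed in scikit-image.

```python
import heapq
import numpy as np

def seeded_watershed(elevation, seeds):
    """Flood an n-d elevation map from labeled seeds, lowest voxels first.

    `seeds` holds a positive integer label at each seed voxel and 0 elsewhere.
    Hypothetical helper for illustration only.
    """
    labels = np.array(seeds, dtype=int)
    # Start the flood from every seed voxel, ordered by elevation.
    heap = [(float(elevation[idx]), idx)
            for idx in map(tuple, np.argwhere(labels > 0).tolist())]
    heapq.heapify(heap)
    while heap:
        _, idx = heapq.heappop(heap)
        # Visit the 2 * ndim face-connected neighbours of the current voxel.
        for axis in range(labels.ndim):
            for step in (-1, 1):
                nb = idx[:axis] + (idx[axis] + step,) + idx[axis + 1:]
                if 0 <= nb[axis] < labels.shape[axis] and labels[nb] == 0:
                    labels[nb] = labels[idx]
                    heapq.heappush(heap, (float(elevation[nb]), nb))
    return labels

# Synthetic 2D boundary probability map: two cells split by a vertical membrane.
prob = np.zeros((5, 9))
prob[:, 4] = 1.0
prob[:, 3] = prob[:, 5] = 0.5

# One hand-placed seed per cell (an assumption; real seeds would be derived
# from the probability map itself).
seeds = np.zeros(prob.shape, dtype=int)
seeds[2, 1] = 1
seeds[2, 7] = 2

labels = seeded_watershed(prob, seeds)
```

Because the flood loops over axes rather than hard-coding two dimensions, the same function accepts 3D volumes unchanged. Placing more seeds than there are cells yields an oversegmentation, which is what the agglomeration step expects as input.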

We developed an active learning algorithm for the last step, called GALA (Graph-based Active Learning of Agglomeration) [2]. In GALA, we map each pair of segments to a feature vector, and then train a classifier on these pairs. The current best classifier then attempts an agglomeration on a training volume, checking its merges against a ground truth, thereby actively growing its training set and improving its estimates. This is repeated several times, until the estimates stop improving.
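
The loop described above can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the published algorithm: segments form a 1D chain, the only feature is the gap between mean intensities, a nearest-centroid rule stands in for the real classifier, and "estimates stop improving" is approximated by "an epoch adds no new training examples". All function names are hypothetical.

```python
import itertools

def pair_features(a, b):
    """Feature vector for a pair of regions (here a single feature)."""
    return (abs(a["mean"] - b["mean"]),)

def merge(a, b):
    """Merge two regions, combining their sizes and size-weighted means."""
    size = a["size"] + b["size"]
    mean = (a["mean"] * a["size"] + b["mean"] * b["size"]) / size
    return {"mean": mean, "size": size, "truth": a.get("truth")}

def train(examples):
    """Nearest-centroid stand-in for a classifier; higher score = merge."""
    pos = [f for f, y in examples if y]
    neg = [f for f, y in examples if not y]
    if not pos or not neg:
        return lambda f: 0.5  # no information yet: every pair scores the same
    centroid = lambda fs: tuple(sum(c) / len(fs) for c in zip(*fs))
    cp, cn = centroid(pos), centroid(neg)
    dist = lambda f, c: sum((x - y) ** 2 for x, y in zip(f, c))
    return lambda f: dist(f, cn) - dist(f, cp)

def run_epoch(segments, score, examples):
    """Agglomerate with the current classifier, consulting the ground truth."""
    ids = itertools.count()
    regions = [dict(s, id=next(ids)) for s in segments]
    rejected = set()
    while True:
        candidates = [i for i in range(len(regions) - 1)
                      if (regions[i]["id"], regions[i + 1]["id"]) not in rejected]
        if not candidates:
            return regions
        i = max(candidates,
                key=lambda k: score(pair_features(regions[k], regions[k + 1])))
        a, b = regions[i], regions[i + 1]
        should_merge = a["truth"] == b["truth"]
        examples.add((pair_features(a, b), should_merge))  # grow training set
        if should_merge:
            regions[i:i + 2] = [dict(merge(a, b), id=next(ids))]
        else:
            rejected.add((a["id"], b["id"]))

def learn_agglomeration(segments, max_epochs=10):
    examples, score = set(), train(set())
    for _ in range(max_epochs):
        n_before = len(examples)
        run_epoch(segments, score, examples)
        score = train(examples)
        if len(examples) == n_before:  # nothing new was learned this epoch
            break
    return score, examples

def agglomerate(segments, score, threshold=0.0):
    """Apply a learned policy without ground truth."""
    regions = [dict(s) for s in segments]
    while len(regions) > 1:
        i = max(range(len(regions) - 1),
                key=lambda k: score(pair_features(regions[k], regions[k + 1])))
        if score(pair_features(regions[i], regions[i + 1])) <= threshold:
            break
        regions[i:i + 2] = [merge(regions[i], regions[i + 1])]
    return regions

# Six superpixels belonging to two true objects (labels 0 and 1).
means = [0.10, 0.12, 0.11, 0.90, 0.88, 0.92]
segments = [{"mean": m, "size": 10, "truth": t}
            for m, t in zip(means, [0, 0, 0, 1, 1, 1])]
score, examples = learn_agglomeration(segments)
result = agglomerate(segments, score)
```

The point of the sketch is the training-set construction: examples are collected along the merge trajectory that the current classifier itself chooses, so later epochs see region pairs (and feature values) that a single pass over the initial superpixels would never produce.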

This algorithm achieves state-of-the-art segmentation accuracy while being arbitrarily scalable [2]. I will present the GALA algorithm, followed by some of the software design aspects of the GALA library and command-line tool, which make use of leading Python scientific libraries, including NumPy, SciPy, NetworkX, scikit-learn, scikit-image, and vigra. In particular, I will highlight the design of our flexible feature computation and caching module, which makes it extremely easy to add new and efficient feature maps for the machine learning step.
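
To give a flavour of why caching helps here, the following is a hypothetical feature map, not the actual gala API. It assumes each segment stores additive sufficient statistics (count, sum, sum of squares), so that the cache of a merged segment is obtained without revisiting any voxels. The class and method names are invented for this sketch.

```python
import numpy as np

class MomentsFeature:
    """Hypothetical feature map caching [count, sum, sum of squares]."""

    def create_cache(self, values):
        # Computed once per superpixel from its voxel intensities.
        values = np.asarray(values, dtype=float)
        return np.array([values.size, values.sum(), (values ** 2).sum()])

    def merge_caches(self, cache_a, cache_b):
        # The statistics are additive, so merging is a constant-time update.
        return cache_a + cache_b

    def compute(self, cache):
        # Derive the per-segment features (mean, variance) from the cache.
        n, s1, s2 = cache
        mean = s1 / n
        return np.array([mean, s2 / n - mean ** 2])

    def compute_pair(self, cache_a, cache_b):
        # Feature vector for a candidate merge: both segments plus their gap.
        fa, fb = self.compute(cache_a), self.compute(cache_b)
        return np.concatenate([fa, fb, np.abs(fa - fb)])

feature = MomentsFeature()
cache_a = feature.create_cache([1, 2, 3])
cache_b = feature.create_cache([4, 5])
merged = feature.merge_caches(cache_a, cache_b)
direct = feature.create_cache([1, 2, 3, 4, 5])
pair_vector = feature.compute_pair(cache_a, cache_b)
```

Under this design, adding a new feature map would amount to writing another class with the same four methods, while the agglomeration code only ever touches caches.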

The gala library is available on GitHub [3].