
Design Choices

This page explains the decisions we made, alternatives considered, trade-offs, and future work.

Problem / Goal

We want to robustly segment neurons in large Electron Microscopy (EM) volumes, and do so in a way that:

  • scales from “toy” ROIs (small subvolumes) to multi-terabyte datasets,
  • works across different microscopes, preparations, and organisms,
  • stays reproducible and re-runnable on normal lab infrastructure (HPC / workstations),
  • plugs cleanly into downstream connectomics steps (synapses, mitochondria, neurotransmitters, proofreading).

In practical terms: given raw EM, we want a bottom-up pipeline that predicts local fields (affinities + LSDs) and then turns those into instances via watershed and agglomeration.
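Concretely, the affinity half of those targets can be derived from a label volume in a few lines of NumPy. This is a minimal sketch (nearest-neighbour affinities only, not the full LSD target stack), with the convention that channel d links each voxel to its predecessor along axis d:

```python
import numpy as np

def labels_to_affinities(labels):
    """Nearest-neighbour affinities for a 3D label volume.

    Returns a (3, D, H, W) float array: channel d is 1 where a voxel and
    its predecessor along axis d carry the same non-zero label.
    """
    affs = np.zeros((3,) + labels.shape, dtype=np.float32)
    for d in range(3):
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[d] = slice(1, None)   # each voxel ...
        hi[d] = slice(None, -1)  # ... and its predecessor along axis d
        same = (labels[tuple(lo)] == labels[tuple(hi)]) & (labels[tuple(lo)] > 0)
        affs[(d,) + tuple(lo)] = same.astype(np.float32)
    return affs

# Toy 1x1x5 volume: two neurons (1, 2) and background (0).
labels = np.array([[[1, 1, 2, 2, 0]]], dtype=np.uint64)
affs = labels_to_affinities(labels)  # affs[2] is 1 inside objects, 0 at boundaries
```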

Alternatives Considered

  • Affinities-only U-Net
  • Pros: simpler targets, lighter bookkeeping, fewer outputs. Our neuron segmentation setup supports this: you can forgo generating LSDs and train an affinities-only model.
  • Cons: less shape-awareness, more brittleness in thin processes, more sensitivity to hyperparameters during agglomeration. In our experience, LSDs give consistently better or more stable behaviour for the same backbone.

  • Boundary-only / membrane U-Nets (torchEM / BiaPy-style)

  • Pros: conceptually straightforward (predict a membrane probability map), widely used, and well-supported in libraries like torchEM and BiaPy. Easy to plug into standard watershed + agglomeration toolchains from scikit-image (or to take inspiration from PlantSeg) and to reuse existing code/configs.
  • Cons: all the instance information is pushed into a single boundary channel (though this can be extended with contour and distance channels). This tends to make training and post-processing more sensitive to class imbalance, noise, and local artefacts, and you often need dataset-specific tuning of thresholds and post-processing to get robust neuron instances. LSDs explicitly inject local shape statistics, which we found stabilises training and makes the affinities less brittle.

  • Flood-Filling Networks (FFNs)

  • Pros: strong instance segmentation story, conceptually close to how humans might trace neurons.
  • Cons: heavier runtime, more complex code-paths, harder to integrate with our Zarr/Daisy/Gunpowder stack, and less friendly to small teams wanting to run and extend the pipeline.

  • Different storage formats (HDF5-only, N5, TIFF)

  • Pros: sometimes simpler to inspect (TIFF), widely used (HDF5), compatible with some existing tools. We do read N5 (Zarr can load N5 stores), and we also use HDF5, though predominantly for synapse detection.
  • Cons: TIFFs can be less ergonomic for distributed processing at scale; more friction with our chosen ecosystem; harder to maintain a single, consistent story across modules.

Decision

We standardised on the following:

  • Model family: multi-task U-Nets that jointly predict LSDs and affinities (and optionally other channels such as mitochondria).
  • I/O format: Zarr, with standardised dataset names under volumes/raw for EM; volumes/labels/neuron_ids for neuron labels. Note: We follow the CREMI format for laying out our data.
  • Scaling: Gunpowder for data loading/augmentation + Daisy for block-wise prediction and large-volume instance segmentation.
  • Downstream segmentation: watershed + agglomeration on affinities, using LSDs as an auxiliary signal during training to make those affinities “better behaved”.

This gives us:

  • a reasonably small and understandable codebase,
  • a single storage + processing story that generalises to other Catena modules,
  • and a path that remains open, reproducible, and modifiable by other labs.
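As a toy illustration of the fragment-extraction step in that downstream stack: the real pipeline runs a seeded watershed on the affinities, but a simplified stand-in — connected components over strongly connected affinity edges, via union-find — conveys the idea (no background handling, and thresholds here are illustrative):

```python
import numpy as np

def fragments_from_affinities(affs, threshold=0.5):
    """Toy fragment extraction: connected components over affinity edges
    above `threshold` (a simplified stand-in for the seeded watershed
    used in the real pipeline)."""
    shape = affs.shape[1:]
    ids = np.arange(np.prod(shape)).reshape(shape)
    parent = {int(i): int(i) for i in ids.ravel()}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Channel d links each voxel to its predecessor along axis d.
    for d in range(len(shape)):
        for idx in np.ndindex(*shape):
            if idx[d] == 0:
                continue
            if affs[(d,) + idx] > threshold:
                nbr = list(idx)
                nbr[d] -= 1
                union(int(ids[idx]), int(ids[tuple(nbr)]))

    # Relabel roots to compact fragment ids starting at 1.
    frags = np.zeros(shape, dtype=np.int64)
    roots = {}
    for idx in np.ndindex(*shape):
        r = find(int(ids[idx]))
        frags[idx] = roots.setdefault(r, len(roots) + 1)
    return frags
```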

Why Gunpowder?

We build the training and augmentation pipeline on top of Gunpowder.

One key reason is that Gunpowder represents every array with an explicit voxel size and ROI. All crops, fields-of-view, and augmentations are defined in this voxel-resolution space and then applied relative to the data’s native resolution. In practice, this means the same pipeline configuration naturally adapts across datasets with different voxel sizes: a “certain amount of context” really means a certain amount of physical context, not just an arbitrary number of pixels.

This makes the training setup more robust and resolution-aware than pixel-based scripts.
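As a sketch with assumed numbers (a hypothetical serial-section dataset at (z, y, x) = (40, 4, 4) nm per voxel), converting a fixed amount of physical context into per-axis voxel counts looks like:

```python
# Hypothetical numbers for illustration: an anisotropic dataset at
# (z, y, x) = (40, 4, 4) nm per voxel, and a desired 600 nm of context.
voxel_size = (40, 4, 4)       # nm per voxel, anisotropic in z
context_nm = (600, 600, 600)  # same physical context in every direction

# The same physical context translates to very different voxel counts
# per axis (integer division; real data is aligned to voxel boundaries).
context_voxels = tuple(c // v for c, v in zip(context_nm, voxel_size))
print(context_voxels)  # (15, 150, 150)
```

The point is that the pipeline reasons in nanometres; the voxel counts fall out of the data's native resolution rather than being hand-set per dataset.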

Trade-offs

  • More targets to generate
    LSDs require additional pre-processing to create per-voxel shape descriptors. Though this happens on-the-fly as part of the data augmentation pipeline, it adds a bit of complexity, and training is somewhat slower overall than for an affinities-only model.

  • Slightly larger models / outputs
    Predicting both LSDs and affinities means more output channels and slightly higher memory usage than a pure-affinity U-Net.

  • Zarr-first means conversion upfront
    If you start from HDF5 or raw TIFFs, there is an upfront conversion step. For tiny test volumes this can feel like overkill, but it pays off once you move beyond toy/ROI scales.

  • Watershed + agglomeration stack
    Instance segmentation is a multi-step process (prediction → fragments → agglomeration → LUT → final segmentation), and some steps rely on MongoDB and batch jobs. This is more moving parts than a toy example using small subvolumes, but it’s also what lets us scale to big volumes and explore different thresholds without re-running everything.
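The LUT step is what makes threshold exploration cheap: fragments are computed once, and each threshold only re-derives a fragment → segment lookup table from the stored agglomeration scores. A minimal sketch with made-up fragment ids and scores (not the pipeline's actual data model):

```python
import numpy as np

def lut_at_threshold(num_fragments, edges, threshold):
    """Build a fragment -> segment lookup table by merging every fragment
    pair whose agglomeration score is below `threshold`.
    `edges` is a list of (score, u, v) with fragment ids in [0, n)."""
    parent = list(range(num_fragments))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for score, u, v in sorted(edges):
        if score >= threshold:
            break  # edges are sorted, so no later edge can merge either
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru

    return np.array([find(i) for i in range(num_fragments)], dtype=np.uint64)

# Exploring thresholds only re-derives the LUT; fragments stay fixed:
fragments = np.array([[0, 0, 1, 2]])          # toy fragment volume
edges = [(0.1, 0, 1), (0.8, 1, 2)]            # made-up merge scores
seg_low = lut_at_threshold(3, edges, 0.5)[fragments]   # merges 0 and 1 only
seg_high = lut_at_threshold(3, edges, 0.9)[fragments]  # merges everything
```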

Implementation Notes

Some relevant components in this module:

  • Data preparation
  • create_dir_organisation.py: builds the directory structure for 2D/3D, train/test, and domains.
  • hdf_to_zarr.py: converts HDF5 datasets into Zarr and adds required label/mask datasets. We always use Zarr during model training and inference.

  • Config modification and management

  • config_{datavolume}.py: adapt and modify a config file. The shared config files are examples we have used ourselves to train our models; each is named after the dataset it was trained on. Input shapes, downsampling factors, and kernel sizes are guided by the EM resolution. In lay terms, anisotropic models (for anisotropic datasets) are trained differently from isotropic ones: the downsampling is generally chosen so that the data becomes isotropic as it passes through the model's convolutional layers towards the bottleneck, and augmentations are likewise applied anisotropically for anisotropic datasets.
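A small worked example of that downsampling logic, with assumed numbers (a hypothetical (40, 4, 4) nm dataset and illustrative factors; real configs are dataset-specific):

```python
# Assumed starting resolution: (z, y, x) = (40, 4, 4) nm per voxel.
# Downsample only in-plane first, then in all axes, so the effective
# resolution becomes progressively more isotropic towards the bottleneck.
voxel_size = (40, 4, 4)
downsample_factors = [(1, 2, 2), (1, 2, 2), (2, 2, 2)]  # illustrative only

for f in downsample_factors:
    voxel_size = tuple(v * fd for v, fd in zip(voxel_size, f))
    print(voxel_size)
# (40, 8, 8) -> (40, 16, 16) -> (80, 32, 32):
# the z/x aspect ratio drops from 10:1 to 2.5:1.
```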

  • Models

  • MtlsdModel, MtlsdMitoModel, LsdModel, AffModel under local_shape_descriptors/models: PyTorch implementations of multi-task U-Nets for LSDs, affinities, and (optionally) mitochondria. We strongly recommend the MTLSD model.

  • Training / prediction

  • trainer.py: multi-task LSD + affinity training using Gunpowder.
  • predicter.py: single-process prediction on smaller volumes.
  • super_predicter_daisy.py: Daisy-driven, block-wise prediction for large volumes.

  • Instance segmentation

  • instance_segmenter.py: watershed + agglomeration for small-ish volumes.
  • 02_extract_fragments_blockwise.py, 03_agglomerate_blockwise.py, 04_find_segments_full.py, 05_extract_segmentation_from_lut.py: Daisy + MongoDB-based large-volume fragment extraction, agglomeration, LUT generation, and final segmentation writing.

Outputs are also written into Zarrs.

Operational Guidance

  • Use Zarr from the beginning if you know the dataset will grow beyond small subvolumes (ROIs).
  • For quick experiments / small ROIs, predicter.py + instance_segmenter.py are the path of least resistance.
  • For large volumes, always:
  • Convert to Zarr,
  • Use super_predicter_daisy_{chunkskipping}.py for affinities (and LSDs),
  • Run the block-wise watershed + agglomeration scripts.

  • Start with the provided configs for public datasets (CREMI, SNEMI, etc.) and only deviate once you have a baseline.
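The block-wise scripts all rest on the same read-ROI/write-ROI idea that Daisy provides: each block's prediction reads extra spatial context around the region it writes, so adjacent blocks agree at their seams. A self-contained sketch of that idea (not Daisy's actual API; shapes and context are made up):

```python
import itertools

def blocks_with_context(total_shape, block_shape, context):
    """Enumerate (read_slices, write_slices) pairs covering a volume,
    where each read region extends the write region by `context` voxels,
    clipped at the volume boundary -- the read/write-ROI idea that
    Daisy-style block-wise processing is built on."""
    ranges = [range(0, t, b) for t, b in zip(total_shape, block_shape)]
    for starts in itertools.product(*ranges):
        write = tuple(
            slice(s, min(s + b, t))
            for s, b, t in zip(starts, block_shape, total_shape))
        read = tuple(
            slice(max(w.start - c, 0), min(w.stop + c, t))
            for w, c, t in zip(write, context, total_shape))
        yield read, write

# A 100x100 volume in 50x50 blocks with 10 voxels of context:
blocks = list(blocks_with_context((100, 100), (50, 50), (10, 10)))
# 4 blocks; interior edges get 10 voxels of extra context on each side.
```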

Future Work / Open Questions

  • Better, more automatic hyperparameter selection for agglomeration thresholds.
  • Tighter integration with other Catena modules (e.g. reusing LSDs as features for other segmentation tasks). We can already couple mitochondria and neuron segmentation within the same LSD framework.
  • Exploring more top-down approaches on top of LSDs (e.g. graph-based refinement, morphology-aware merges/splits).
  • Support for alternative storage backends (CloudVolume / NGFF) while keeping (OME)-Zarr as the main path.