# Design Choices
This page explains the decisions we made for mitochondria segmentation in Catena, alternatives considered, trade-offs, and where we'd like to go next.
## Problem / Goal
We want a mitochondria segmentation pipeline that:
- works reliably on FIBSEM EM volumes (at least to start with, since our in-house datasets are currently all FIBSEM volumes),
- fits naturally into Catena's broader connectomics workflow,
- can provide both semantic masks (mitochondria vs non-mito) and instance labels when needed,
- and is practical to train and run on typical lab infrastructure (GPU workstation / HPC node).
Conceptually, we follow a two-stage setup:
- Semantic segmentation - predict a voxel-wise mitochondria mask.
- Instance segmentation - convert that mask into uniquely labelled mitochondria (connected components).
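The two stages can be sketched end-to-end in a few lines. This is an illustrative numpy/scipy sketch, not Catena's actual code; `semantic_to_instances` is a hypothetical helper:

```python
import numpy as np
from scipy import ndimage

def semantic_to_instances(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Stage 2: binarise a semantic probability map and give each
    connected component its own instance label."""
    mask = prob > threshold
    labels, _num = ndimage.label(mask)  # default 6-connectivity in 3D
    return labels

# Toy 3D probability map with two well-separated blobs.
prob = np.zeros((1, 8, 8), dtype=np.float32)
prob[0, 1:3, 1:3] = 0.9
prob[0, 5:7, 5:7] = 0.8
instances = semantic_to_instances(prob)  # two instances, labelled 1 and 2
```

The real pipeline adds patch-wise handling and size filtering on top of this basic shape.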
We also want to reuse public tools and models wherever possible rather than reinventing everything. For ground-truth, we therefore lean on Empanada / MitoNet and then train our own models in a way that is easy to adapt and extend.
## Data curation
For training data, we use Empanada to generate mitochondria labels:
- Empanada provides MitoNet, a panoptic segmentation model for mitochondria.
- We fine-tune MitoNet on our FIBSEM data and use its predictions as a strong starting point for ground-truth curation.
This avoids fully manual voxel-wise annotation while still giving us high-quality masks to train on.
Data are stored in Zarr stores with a consistent layout:
- raw EM under `volumes/raw`,
- labels under:
  - `volumes/labels/neuron_ids` (when treating neurons as labels), or
  - `volumes/labels/mito_ids` (for mitochondria).

The `label_type` flag (`'neuron'` / `'mito'`) lets the same code path handle either case.
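In schematic form, the `label_type` switch amounts to a path lookup inside the store. The dataset paths are the ones above; the helper itself is illustrative, not Catena's actual code:

```python
# Dataset paths follow the Catena Zarr layout convention.
LABEL_DATASETS = {
    "neuron": "volumes/labels/neuron_ids",
    "mito": "volumes/labels/mito_ids",
}

def dataset_paths(label_type: str) -> tuple:
    """Return (raw, labels) dataset paths for a Catena-style Zarr store."""
    if label_type not in LABEL_DATASETS:
        raise ValueError(f"label_type must be 'neuron' or 'mito', got {label_type!r}")
    return "volumes/raw", LABEL_DATASETS[label_type]

raw_path, label_path = dataset_paths("mito")
```

A training script would then open these datasets from each entry in `train_zarr_dirs`.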
## Alternatives Considered
### 1. Use Empanada / MitoNet directly at inference
Option: rely entirely on fine-tuned MitoNet via Empanada for both training and inference.
- Pros
  - Reuses a well-tested panoptic segmentation model.
  - Minimal extra modelling work in Catena.
- Cons
  - Ties inference tightly to the Empanada stack; Empanada only exposes MitoNet within napari, which is not suitable for running segmentations on large datasets.
  - Less flexibility to experiment with architectures or training schemes.
  - Harder to integrate with Catena's generalised Zarr-based, patch-wise training/inference pattern.
We still use Empanada/MitoNet for ground-truth generation, but not as Catena's main inference engine.
### 2. Single "best" model vs multiple architectures
Option: pick one model (e.g. a single 3D U-Net) and optimise only that.
- Pros
  - Simpler codebase.
  - Less configuration branching.
- Cons
  - Harder to compare architectures on the same data and pipeline.
  - Less flexibility when data or requirements change.
Instead, we support two architectures side by side:
- MONAI 3D U-Net baseline (`model_type = "monai_unet"`).
- Residual U-Net (RS-UNet) adapted from Xie et al. (`model_type = "rs_unet"`).
Both share the same training and inference scaffolding (Zarr IO, patch-based training, sliding-window prediction), but differ in backbone details.
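A minimal sketch of how shared scaffolding might dispatch on `model_type`. The two config values come from this page; the constructors hinted at in the comments are stand-ins, not Catena's exact code:

```python
def build_model(model_type: str):
    """Select the backbone; everything else (Zarr IO, patch-based training,
    sliding-window prediction) is shared between the two architectures."""
    if model_type == "monai_unet":
        return "MONAI 3D U-Net"   # e.g. a monai.networks.nets UNet instance
    if model_type == "rs_unet":
        return "RS-UNet"          # Xie et al.-style residual U-Net
    raise ValueError(f"unknown model_type: {model_type!r}")

backbone = build_model("rs_unet")
```

Keeping the branch this small is what makes it cheap to compare architectures on identical data and pipelines.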
### 3. Semantic + connected components vs full panoptic model
Option: train a full panoptic instance segmentation model (e.g. directly at MitoNet's level of complexity).
- Pros
  - Instances are explicit; no need for post-hoc connected components.
  - Closer to Empanada's internal model.
- Cons
  - More complex models and training pipelines.
  - Heavier to run and tune for new datasets.
  - Harder to integrate with generic segmentation tools in Catena.
We instead adopt a semantic-first approach (sometimes a semantic mask is all that is required) and do instance segmentation via connected components:
- semantic prediction is produced by either MONAI U-Net or RS-UNet,
- instance labels are generated by a small, explicit post-processing step.
This keeps the pipeline simpler, more transparent, and easier to adapt.
## Decision
We settled on the following design:
- Ground-truth generation:
  - Use Empanada / MitoNet to obtain strong panoptic predictions.
  - Curate these predictions into training labels for mitochondria.
  - For public datasets, we use Seg2Link-3D for ground-truth curation (see proofreading section).
- Two semantic segmentation backbones:
  - MONAI 3D U-Net as a straightforward baseline.
  - Residual U-Net (RS-UNet) adapted from Xie et al., with residual blocks in the encoder-decoder backbone to improve feature reuse and gradient flow while keeping the same semantic-to-instance pipeline.
- Zarr-based patch-wise training and inference:
  - Patch size and stride controlled via config (`patch_size`, `stride`).
  - All EM data under `volumes/raw`; labels under `volumes/labels/*`.
- Instance segmentation as a separate, simple step:
  - Use connected components on binarised semantic masks, with configurable thresholds and size filters.
This gives us a flexible, modular setup that fits with how Catena handles other modalities (neurons, synapses, EM masks). We can adapt this framework to explore affinity-based methods for mitochondria segmentation (for example, LSDs could give joint neuron + mitochondria predictions) and to explore contemporary graph-cut methods for better instance segmentation results.
## Trade-offs
- Model complexity vs ease of use
  - MONAI U-Net is simple and easy to use; RS-UNet is more expressive but slightly more complex.
  - Supporting both adds some code complexity but gives users flexibility.
- Panoptic vs semantic -> instances
  - Using Empanada only for ground truth and not for inference means we duplicate some functionality, but we gain a unified Catena-style training/inference story.
  - Doing instances via connected components is naive compared to full panoptic models, but it is explicit, tunable, and easy to inspect.
- Patch-wise processing vs full-volume
  - Patch-based training/inference is necessary for memory reasons, but it introduces tiling and overlap hyperparameters that need to be set carefully.
- Format conventions
  - Insisting on `volumes/raw` and `volumes/labels/...` plus the Zarr structure may require users to convert/import their data, but it standardises everything downstream.
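The tiling hyperparameters interact in a simple way along each axis. Here is a sketch of one common convention (snapping the last patch to the volume edge so nothing is missed); it is illustrative, not Catena's exact scheme, and assumes `patch <= length`:

```python
def tile_starts(length: int, patch: int, stride: int) -> list:
    """Start offsets of overlapping patches covering [0, length)."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # snap the final patch to the edge
    return starts

# A 100-voxel axis, 32-voxel patches, stride 24 (8 voxels of overlap):
offsets = tile_starts(100, 32, 24)  # [0, 24, 48, 68]
```

Smaller strides mean more overlap (and smoother stitching) at the cost of more patches to predict.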
## Implementation Notes
### Training configuration
Training is driven by an `Args` class (see `monai_unet_train.py` / `rsunet_train.py`), with key fields such as:
- Experiment & data:
  - `exp_name`: experiment identifier (used for checkpoints, logs).
  - `train_zarr_dirs`, `test_zarr_dirs`: one or more Zarr directories for train/test.
  - `label_type`: `'neuron'` or `'mito'`, which determines which label dataset to read (`volumes/labels/neuron_ids` vs `volumes/labels/mito_ids`).
Note

Following Xie et al., this flag lets us test whether a model trained on neuron segmentation yields better results when fine-tuned for mitochondria segmentation, even when the mitochondrial datasets are limited.
- Training schedule:
  - `epochs`: max training epochs.
  - `batch_size`: patch batch size (often `1` for 3D patches when running on a 12 GB GPU).
  - `eval_interval`: validation frequency.
  - `ckpt_interval`: checkpoint save interval.
- Patch & resolution:
  - `patch_size`: e.g. `[128, 128, 128]`.
  - `stride`: e.g. `[64, 64, 64]` (controls overlap).
  - `original_res`, `target_res`: physical resolution metadata (often equal for now).
- Preprocessing & sampling:
  - `clahe`: whether to apply CLAHE contrast normalisation.
  - `subsample_frac`, `subsample_number`, `subsample_seed`: control patch subsampling.
  - `balance_patches`: if `True`, enforce a balance between positive/negative patches.
  - `min_positive_pixels`: minimum number of positive pixels per patch to consider it "positive".
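For example, the positivity test behind `balance_patches` / `min_positive_pixels` can be sketched as follows (an illustrative helper, not the actual implementation):

```python
import numpy as np

def is_positive_patch(label_patch: np.ndarray, min_positive_pixels: int) -> bool:
    """A patch counts as 'positive' if it contains at least
    `min_positive_pixels` foreground (mitochondria) voxels."""
    return int((label_patch > 0).sum()) >= min_positive_pixels

# Toy label patch with 3 foreground voxels.
patch = np.zeros((4, 4, 4), dtype=np.uint8)
patch[0, 0, :3] = 1
is_positive_patch(patch, 2)   # True
is_positive_patch(patch, 10)  # False
```

With `balance_patches = True`, a sampler would draw roughly equal numbers of patches from the positive and negative pools defined by this test.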
- Optimisation & loss:
  - `learning_rate`, `learning_rate_after_hotstart_50`.
  - `loss_type`: `'DiceLoss'` or `'DiceCE'` (Dice + cross-entropy).
  - `loss_weights`: weight factor to compensate for class imbalance.
- Augmentation & model init:
  - `rotation_augs`, `contrast_augs`.
  - `model_loc` / `resume_checkpoint`: path to a checkpoint (for fine-tuning) or `None` for training from scratch. This will be consolidated and cleaned up in future releases.
  - `freeze_encoder`, `hotstart`: options to partially freeze or warm-start the model.
Both MONAI U-Net and RS-UNet use the same high-level config; the difference is which script you call and what `model_type` you specify.
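Condensed into one place, the shape of such a config might look like this. Field names come from this page; the dataclass form and the default values are illustrative only:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Args:
    # Experiment & data
    exp_name: str = "mito_baseline"
    train_zarr_dirs: list = field(default_factory=list)
    test_zarr_dirs: list = field(default_factory=list)
    label_type: str = "mito"              # 'neuron' or 'mito'
    # Training schedule
    epochs: int = 100
    batch_size: int = 1                   # 3D patches on a 12 GB GPU
    # Patch & resolution
    patch_size: tuple = (128, 128, 128)
    stride: tuple = (64, 64, 64)
    # Preprocessing & sampling
    clahe: bool = False
    balance_patches: bool = True
    min_positive_pixels: int = 100
    # Optimisation & loss
    learning_rate: float = 1e-4
    loss_type: str = "DiceCE"             # 'DiceLoss' or 'DiceCE'
    # Model init
    model_loc: Optional[str] = None       # checkpoint path, or None for scratch

args = Args(train_zarr_dirs=["data/train.zarr"])
```

The same object would be consumed by either training script, with `model_type` deciding which backbone is built.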
### Inference configuration
Inference uses an `InferenceArgs` class (e.g. in `predict.py` and `rsunet_predict.py`), with:
- `model_path`: path to the `.pth` checkpoint.
- `test_zarr_dirs`: list of Zarr directories to run inference on.
- `label_type`: must match training.
- `patch_size`, `stride`, `original_res`, `target_res`, `clahe`: must be consistent with training.
- `batch_size`, `num_workers`: runtime performance controls.
- `output_dir`, `output_filename`, `output_format` (`"tiff"` or `"zarr"`).
- `model_type`: `"rs_unet"` or `"monai_unet"` to select the architecture.
The inference script writes out a semantic prediction volume at the target resolution. If all you need is semantic mitochondria labels, you can stop here.
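The sliding-window pattern behind this prediction step can be sketched in plain numpy. This is illustrative: the real scripts run the trained network over Zarr data; here `model` is any callable on a patch, and the sketch assumes `(dim - patch) % stride == 0` along each axis:

```python
import numpy as np

def sliding_window_predict(volume, model, patch_size, stride):
    """Run `model` on overlapping patches and average overlapping outputs."""
    out = np.zeros(volume.shape, dtype=np.float32)
    counts = np.zeros(volume.shape, dtype=np.float32)
    pz, py, px = patch_size
    sz, sy, sx = stride
    Z, Y, X = volume.shape
    for z in range(0, Z - pz + 1, sz):
        for y in range(0, Y - py + 1, sy):
            for x in range(0, X - px + 1, sx):
                patch = volume[z:z+pz, y:y+py, x:x+px]
                out[z:z+pz, y:y+py, x:x+px] += model(patch)
                counts[z:z+pz, y:y+py, x:x+px] += 1
    return out / np.maximum(counts, 1)

# With an identity "model", averaging overlapping copies returns the input.
vol = np.random.rand(8, 8, 8).astype(np.float32)
pred = sliding_window_predict(vol, lambda p: p, (4, 4, 4), (2, 2, 2))
```

The overlap averaging is why `stride` must match between training-time expectations and inference.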
### Instance segmentation
Instance labels are generated by `instance_segmenter.py`, driven by a `ConversionArgs` class:
- `input_prediction_path`: path to the saved semantic prediction file (TIFF or Zarr).
- `output_instance_dir`, `output_format`: where and how to save instances.
- `chunk_size`, `overlap`: control chunked processing for large volumes.
- `thres_foreground`: probability threshold for binarising semantic predictions (0-1).
- `thres_small_instances`: minimum size for keeping an instance.
- `scale_factors`: optional scaling (usually `(1, 1, 1)`).
- `remove_small_mode`: how to treat small objects (e.g. `'background'`).
The script:

- Loads the semantic prediction volume.
- Binarises it using `thres_foreground`.
- Runs connected components per chunk (with overlap stitching).
- Removes small instances below `thres_small_instances`.
- Writes an instance-labelled volume.
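The size-filtering step can be sketched with `np.bincount`. This is illustrative only; the actual script also handles chunking and the other `remove_small_mode` options:

```python
import numpy as np

def remove_small_instances(labels: np.ndarray, thres_small_instances: int) -> np.ndarray:
    """Map instances with fewer voxels than the threshold to background (0),
    i.e. the 'background' remove_small_mode."""
    sizes = np.bincount(labels.ravel())                 # voxel count per label id
    small_ids = np.flatnonzero(sizes < thres_small_instances)
    small_ids = small_ids[small_ids > 0]                # never touch background
    out = labels.copy()
    out[np.isin(out, small_ids)] = 0
    return out

# Instance 1 has 2 voxels, instance 2 has 1 voxel; threshold of 2 drops instance 2.
labels = np.array([[0, 1, 1, 2]])
filtered = remove_small_instances(labels, 2)  # [[0, 1, 1, 0]]
```

Tuning `thres_small_instances` is the main lever against speckle-like false positives (see Operational Guidance).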
## Operational Guidance
### Choosing a backbone
- Start with the MONAI U-Net if you want a straightforward baseline and easier debugging.
- Try RS-UNet if you:
  - already have a working baseline, and
  - want to see whether residual blocks improve IoU / Dice for your dataset.
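When comparing backbones it helps to compute the overlap metrics the same way for both; standard binary Dice and IoU are:

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray) -> tuple:
    """Dice coefficient and intersection-over-union for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    dice = 2.0 * inter / total if total else 1.0   # both empty -> perfect match
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
dice, iou = dice_and_iou(a, b)  # dice = 0.5, iou = 1/3
```

Comparing the two architectures on identical patches with identical metrics is the point of sharing the scaffolding.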
### Tuning training
- If mitochondria are rare in the volume, ensure:
  - `balance_patches = True`,
  - `min_positive_pixels` is set to something sensible (inspect a few visualisations).
- If training is unstable or overfitting:
  - reduce `learning_rate`,
  - consider lowering `loss_weights`,
  - start with fewer augmentations and add them back gradually.
### Tuning instance segmentation
- If you see many tiny, spurious instances:
  - increase `thres_small_instances`.
- If you lose small mitochondria you care about:
  - decrease `thres_small_instances`,
  - and possibly lower `thres_foreground` slightly (but watch for noise).
- For large volumes, adjust `chunk_size` and `overlap` so chunks fit into memory but still allow smooth stitching.
Always visually inspect:
- raw EM,
- semantic prediction,
- instance labels
for a few representative subvolumes before trusting metrics alone.
## Future Work / Open Questions
- Better instance segmentation
  - Explore more advanced instance-labelling methods (beyond plain connected components) while keeping the semantic-first philosophy.
- Tighter integration with Empanada
  - Streamline workflows for transferring models and labels between Empanada and Catena.
- Cross-dataset generalisation
  - Systematically test how well models trained on one FIBSEM dataset transfer to others, and what minimal fine-tuning is required.
- Joint modelling with other modalities
  - Use shared backbones or multi-task setups (e.g. neurons + mitochondria) to exploit shared structure in EM data, while keeping the current pipelines as a simple, reliable baseline.
The current design aims to be practical, transparent, and composable: good enough to use today, and flexible enough to evolve as we gather more data and experience.