
Design Choices

This page explains the decisions we made, alternatives considered, trade-offs, and future work.

Problem / Goal

What outcome are we optimising for?

We want a reliable, lightweight way to generate brain vs non-brain masks for large EM volumes. These masks sit upstream of the other Catena modules (LSDs, synapses, mitochondria, neurotransmitters) and are used to:

  • restrict heavy models to brain / VNC tissue only,
  • avoid wasting GPU/CPU on resin and artefacts,
  • reduce false positives (e.g. spurious neuron segments or synapses detected on resin),
  • standardise which voxels are “in play” across different pipelines.

The goal is not to solve fine semantic segmentation of all tissues, but to provide a clean, binary mask that says:

  • “this voxel belongs to brain / ventral nerve cord”, versus
  • “this is resin, trachea, or something we don’t plan to segment”.

We also want the solution to be:

  • easy to run on a single workstation,
  • minimally dependent on exotic libraries,
  • and robust enough to generalise to new EM volumes with minimal hyper-parameter re-tuning.

Alternatives Considered

1. Global intensity thresholding

Idea: Use simple thresholds on intensity (or a histogram-based method like Otsu) to separate bright resin from darker tissue.

  • Pros

    • Extremely simple to implement.
    • No training data required.
    • Very fast.
  • Cons

    • Fails when tissue and resin have overlapping intensity ranges.
    • Sensitive to staining / imaging differences across datasets.
    • Blind to local texture; easily confuses trachea, artefacts, and brain.

We found this too brittle for practical use beyond tiny toy examples.
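For reference, this kind of global cut can be sketched in a few lines of NumPy; the toy Otsu implementation and synthetic volume below are illustrative, not part of the module:

```python
import numpy as np

def otsu_threshold(volume, bins=256):
    """Classic Otsu: choose the cut that maximises between-class variance."""
    hist, edges = np.histogram(volume, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                      # weight of the darker class
    w1 = 1.0 - w0                          # weight of the brighter class
    cum_mean = np.cumsum(p * centers)
    mu0 = cum_mean / np.clip(w0, 1e-12, None)
    mu1 = (cum_mean[-1] - cum_mean) / np.clip(w1, 1e-12, None)
    between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance per cut
    return centers[np.argmax(between)]

# Toy volume: bright "resin" with a darker "tissue" cube inside
rng = np.random.default_rng(0)
vol = rng.normal(200, 10, size=(32, 32, 32))
vol[8:24, 8:24, 8:24] = rng.normal(100, 10, size=(16, 16, 16))

t = otsu_threshold(vol)
mask = vol < t  # tissue assumed darker than resin
```

This works only while the two intensity modes stay separated; once the tissue and resin histograms overlap, no single global cut exists, which is exactly the brittleness noted above.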

2. DoG-only / edge-based masking

Idea: Use a Difference of Gaussians (DoG) and gradient magnitude to highlight “structured” regions and suppress smooth resin; threshold the result.

  • Pros

    • Still classical CV; no labels needed.
    • Picks up fine structure better than raw intensity.
    • Works as a quick sanity-check baseline.
  • Cons

    • DoG alone produces noisy and fragmented masks.
    • Struggles in regions where resin has structure (e.g. artefacts, cracks).
    • Cannot explicitly distinguish trachea and other non-brain structures.

DoG on its own was not sufficient; it motivated the texture-based approach instead.
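A minimal DoG baseline of this kind can be sketched with `scipy.ndimage`; the function name and the mean-plus-std cutoff are illustrative choices, not the module's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(volume, sigma_fine=1.0, sigma_coarse=4.0):
    """Difference of Gaussians: band-pass response to structure between the two scales."""
    v = volume.astype(np.float32)
    return np.abs(gaussian_filter(v, sigma_fine) - gaussian_filter(v, sigma_coarse))

# Toy volume: textured "tissue" half vs perfectly smooth "resin" half
rng = np.random.default_rng(0)
vol = np.full((32, 32, 32), 150.0)
vol[:, :, :16] += rng.normal(0, 30, size=(32, 32, 16))

resp = dog_response(vol)
thr = resp.mean() + resp.std()   # crude automatic cutoff
mask = resp > thr
```

Even on this toy, only a fraction of voxels in the textured half exceed the cutoff, so the mask comes out fragmented, which is the noisiness noted above.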

3. Texture / structure analysis (current CV baseline)

Idea: Combine DoG with local texture and structure analysis (e.g. variance, structure tensor) on small blocks to separate “biological-looking” texture from smoother background.

  • Pros

    • Much better than DoG alone for separating tissue from pure resin.
    • No training labels required.
    • Gives a quick, interpretable initial mask and confidence map.
  • Cons

    • Still purely local; no notion of semantics like “brain vs trachea”.
    • Requires some parameter tuning per dataset (block size, sigmas, region size).
    • Will include non-brain tissues that have similar texture to brain.

We keep this as a baseline and backup path, but not as the primary mask generator.
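One representative texture statistic behind this approach is local variance, computable with two box filters. This is a sketch of the principle only, not the actual internals of `em_texture_masking`:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(volume, size=8):
    """Per-voxel variance over a size^3 window: Var = E[x^2] - E[x]^2."""
    v = volume.astype(np.float32)
    mean = uniform_filter(v, size)
    mean_sq = uniform_filter(v * v, size)
    return np.clip(mean_sq - mean * mean, 0.0, None)  # clip float round-off

# Textured "tissue" (noisy) vs smooth "resin" (constant)
rng = np.random.default_rng(0)
vol = np.full((32, 32, 32), 150.0)
vol[:, :, :16] += rng.normal(0, 30, size=(32, 32, 16))

lv = local_variance(vol)
```

Thresholding `lv` separates the two regions cleanly here; on real EM it is combined with DoG and structure measures, and inherits the cons above (any tissue with brain-like variance passes).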

4. Heavier ML models (large U-Nets / transformers)

Idea: Use a deep 3D U-Net (or even transformers) with extensive augmentation, multi-scale context, voxel-size encoding, etc.

  • Pros

    • Potentially very strong performance.
    • Can model complex tissue variation and imaging artefacts.
    • Easier to extend to multi-class labelling (brain, trachea, resin, etc.).
  • Cons

    • More complex to train and tune.
    • Requires a lot of labeled data for training.
    • Higher compute and memory requirements.
    • Harder to document and maintain as a “simple module” for general users.

This felt like overkill for a binary, upstream mask whose main role is to gate other modules.

5. No dedicated mask (let every module handle resin itself)

Idea: Skip this module; let each downstream model learn to ignore resin on its own.

  • Pros

    • Fewer components and no extra training step.
    • Simplifies the pipeline on paper.
  • Cons

    • Every module must re-learn the same “ignore resin” behaviour (i.e. suppressing segments or synapses on resin regions).
    • Wasted computation on non-brain regions in every stage.
    • Harder to debug what each model is actually seeing.

Given Catena’s modular design, a shared mask is the more efficient and interpretable choice.

Decision

What we chose and why.

We adopted a two-path design:

  1. A classical CV path based on texture-based masking, which:

     • is cheap and label-free,
     • provides an initial “biological vs background” mask,
     • outputs a confidence map that can be combined with learned models.

  2. A simple 3D U-Net (MONAI) trained for brain vs non-brain semantic segmentation on downsampled volumes, which:

     • uses minimal but targeted augmentation to generalise across EM volumes,
     • is trained in pixel space (no voxel-size encoding) to keep things simple,
     • produces masks that transfer reasonably well to unseen EM data.

The classical path is treated as a baseline and diagnostic tool; the U-Net path is the recommended default for production masks.

Trade-offs

  • Accuracy vs complexity

  • The 3D U-Net is intentionally small and simple. We trade a bit of peak accuracy for ease of training and reproducibility.
  • The texture-based method is less accurate but more explainable and robust when labels are scarce.

  • Data / label requirements

  • Texture-based masking needs no labels.
  • The U-Net does require semantic labels for at least one volume, but only at a coarse brain vs non-brain level. We use a downsampled version of the volume to speed up training (usually at scale 5, 50 µm isotropic, from FIBSEM).

  • Generalisation

  • Texture-based masking is sensitive to resolution and noise, but not to annotation style.
  • The U-Net generalises surprisingly well to unseen volumes but will eventually drift if the imaging domain is very different (different species, staining, etc.).

  • Performance / compute

  • Texture-based masking is CPU-friendly and works fine on modest hardware; masks are often generated on a Mac M1 Pro with 16 GB RAM in a few minutes.
  • The 3D U-Net benefits from a GPU but is still modest in size; it can run on a single workstation without heroic resources (and even on CPU-only machines like a Mac M1 Pro). Training typically takes a few hours on a single GPU.

Implementation Notes

The module currently has two main components:

  1. Texture-based EM masking (classical CV)
  2. 3D U-Net brain vs non-brain model (MONAI)

1. Texture-based EM masking

Core function:

mask, confidence = em_texture_masking(
    volume,
    block_size=64,
    texture_sigma=1.0,
    edge_sigma=0.5,
    min_region_size=250
)

Inputs

volume

3D EM volume (or 2D slice stack) in which we want to separate “brain-like texture” from background resin / junk.

block_size

Size of the local window (in pixels/voxels) used for computing texture statistics.

  • Smaller blocks → more detailed, sensitive to small variations but also to noise.
  • Larger blocks → smoother, more robust, but can miss tiny islands of tissue.

In practice, 64 is a good starting point for downsampled FIBSEM; scale it roughly with your in-plane resolution.

texture_sigma

Standard deviation of the Gaussian used in the texture filter (e.g. for smoothing features such as local variance or structure tensor).

  • Lower values → focus on very fine texture details (good for high-res, noisy EM).
  • Higher values → capture coarser patterns, ignore tiny fluctuations.

If the mask looks very noisy, try increasing this slightly.

edge_sigma

Standard deviation of the Gaussian used in the edge / gradient filter. This controls the spatial scale of edges that contribute to the mask.

  • Small values (e.g. 0.5) → respond to fine edges and thin structures.
  • Larger values → emphasise broader transitions, ignore tiny speckles.

If you’re missing thin tissue at boundaries, decrease this a bit; if you’re getting a lot of edge noise, increase it.

min_region_size

Minimum size (in pixels/voxels) of a connected component to keep in the final mask. Everything smaller is dropped as noise.

  • Increase this if you see lots of tiny specks of “tissue” in resin.
  • Decrease it if you actually care about very small islands of brain tissue.

This should be tuned relative to your image size and downsampling; 250 is a reasonable starting point for typical subvolumes.
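The effect of `min_region_size` is plain connected-component filtering; a sketch with `scipy.ndimage` (the helper name is illustrative, not the module's API):

```python
import numpy as np
from scipy import ndimage

def drop_small_regions(mask, min_region_size=250):
    """Keep only connected components of at least min_region_size voxels."""
    labels, n = ndimage.label(mask)          # label 3D connected components
    if n == 0:
        return mask.copy()
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.arange(1, n + 1)[sizes >= min_region_size]
    return np.isin(labels, keep)

# One 1000-voxel blob (kept) and one 8-voxel speck (dropped at the default 250)
mask = np.zeros((32, 32, 32), dtype=bool)
mask[2:12, 2:12, 2:12] = True
mask[20:22, 20:22, 20:22] = True
clean = drop_small_regions(mask, min_region_size=250)
```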

Outputs

mask

Binary mask (same shape as volume): 1 for “texture consistent with tissue we care about”, 0 for background / non-brain.

confidence

Float array with a continuous score per pixel/voxel (e.g. how strongly the local texture looks like “brain”).

Useful if you want to:

  • tweak thresholds later without recomputing everything,
  • visualise “soft” tissue likelihood,
  • combine this with other signals (e.g. U-Net logits) before making a hard mask.
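Combining the confidence map with U-Net probabilities can be as simple as a weighted blend before the final cut; the weight and threshold below are illustrative defaults for the sketch, not module settings:

```python
import numpy as np

def fuse_masks(confidence, unet_prob, w_texture=0.3, threshold=0.5):
    """Blend texture confidence with U-Net foreground probability, then binarise."""
    fused = w_texture * confidence + (1.0 - w_texture) * unet_prob
    return fused >= threshold

# Texture and U-Net agree on tissue -> kept; both weak -> dropped
hard = fuse_masks(np.array([0.9, 0.0]), np.array([0.8, 0.4]))
```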

3D U-Net (MONAI) for brain vs non-brain

Key design points

The ML path uses a 3D U-Net implemented in MONAI to predict a single-channel foreground mask:

  • 1 ≈ brain / VNC tissue
  • 0 ≈ non-brain (resin, trachea, artefacts, etc.)

Everything is trained in voxel / pixel space (we do not encode voxel size explicitly).


Architecture

The model is a fairly standard, compact 3D U-Net:

from monai.networks.nets import UNet

model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
)
  • spatial_dims=3 → full 3D convolutions.

  • Input: single-channel EM volume (grayscale).

  • Output: single-channel logit volume (brain foreground probability after sigmoid).

Depth and channels are deliberately modest so the model can train and infer on a single GPU / workstation.

Training

Data and transforms:

  • EM volumes and masks are loaded from TIFF/Zarr.

  • A custom Dataset yields patches with keys like 'image' and 'mask'.

  • MONAI / TorchIO transforms handle:

    • channel handling and intensity scaling,

    • simple spatial augmentation (flips, 90° rotations, zoom),

    • optional Gaussian noise.

  • Data is explicitly split into train, validation, and test sets.

Loss: DiceCELoss with sigmoid=True, suitable for binary foreground prediction from a single logit channel.

Optimiser: Adam with a conservative learning rate 1e-4.
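The combined objective is conceptually soft Dice plus binary cross-entropy on the sigmoided logit. A NumPy sketch of that sum follows (not MONAI's actual `DiceCELoss` implementation, which adds batching and weighting options):

```python
import numpy as np

def dice_ce_loss(logits, target, eps=1e-7):
    """Soft Dice + binary cross-entropy from a single logit channel."""
    prob = 1.0 / (1.0 + np.exp(-logits))            # sigmoid, as in sigmoid=True
    inter = (prob * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)
    ce = -(target * np.log(prob + eps)
           + (1.0 - target) * np.log(1.0 - prob + eps)).mean()
    return dice + ce

# Confident, correct logits give a near-zero loss; inverted logits a large one
target = np.zeros((4, 4, 4)); target[:2] = 1.0
good = dice_ce_loss(np.where(target == 1, 10.0, -10.0), target)
bad = dice_ce_loss(np.where(target == 1, -10.0, 10.0), target)
```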

Inference

Patch-based test inference

test_model(model, test_loader, save_dir="./outputs/runX", device=device)
  • Loops over a test_loader of patches.

  • Saves predicted patches and corresponding ground-truth masks (TIFF).

  • Reports average Dice and Jaccard across the test set.
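For reference, the two reported metrics on binary masks reduce to simple overlap ratios; a NumPy sketch (helper name illustrative):

```python
import numpy as np

def dice_jaccard(pred, target, eps=1e-7):
    """Binary Dice = 2|A∩B|/(|A|+|B|); Jaccard (IoU) = |A∩B|/|A∪B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    jacc = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, jacc

# 4-voxel prediction vs 4-voxel target with 2 voxels of overlap
pred = np.zeros(8, dtype=bool); pred[:4] = True
target = np.zeros(8, dtype=bool); target[2:6] = True
dice, jacc = dice_jaccard(pred, target)
```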

Whole-volume sliding-window inference

run_inference_whole_volume(
    model,
    data_path,
    output_path,
    patch_size=(64, 64, 64),
    overlap=16,
    device="cuda:0",
)
  • Loads the full 3D EM volume (TIFF or Zarr).

  • Applies simple MONAI intensity preprocessing (scaling / normalisation).

  • Tiles the volume into overlapping 3D patches (patch_size), runs the model on each, and stitches them back together with overlap-handling.

  • Writes the predicted mask volume back to disk as TIFF or Zarr.

A variant with a smaller patch size (e.g. (32, 32, 32)) and a larger overlap is also provided, to trade off memory against smoothness.
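The tiling-and-stitching step can be sketched in pure NumPy with uniform averaging in overlap zones; this is a simplification of run_inference_whole_volume (real sliding-window inference often uses weighted blending rather than a flat average):

```python
import numpy as np

def sliding_window_apply(volume, fn, patch=(64, 64, 64), overlap=16):
    """Apply fn to overlapping patches; average contributions where they overlap."""
    out = np.zeros(volume.shape, dtype=np.float32)
    weight = np.zeros(volume.shape, dtype=np.float32)
    steps = [max(p - overlap, 1) for p in patch]
    for z in range(0, volume.shape[0], steps[0]):
        for y in range(0, volume.shape[1], steps[1]):
            for x in range(0, volume.shape[2], steps[2]):
                sl = tuple(slice(o, min(o + p, s))
                           for o, p, s in zip((z, y, x), patch, volume.shape))
                out[sl] += fn(volume[sl])   # fn must accept edge-clipped patches
                weight[sl] += 1.0
    return out / np.clip(weight, 1.0, None)

# With an identity "model", stitching reproduces the input exactly
vol = np.random.default_rng(0).random((16, 16, 16), dtype=np.float32)
stitched = sliding_window_apply(vol, lambda p: p, patch=(8, 8, 8), overlap=4)
```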

Operational Guidance

Tuning texture-based masking

  • Start with the default parameters:

    • block_size = 64

    • texture_sigma = 1.0

    • edge_sigma = 0.5

    • min_region_size = 250

  • If the mask is too noisy:

    • increase texture_sigma

    • increase min_region_size

  • If you are missing thin tissue:

    • decrease edge_sigma

    • slightly decrease min_region_size

Always inspect a few slices of the volume with mask and confidence overlays to make sure the behaviour makes sense.

Using the ML model

  • Use the texture-based mask as a quick check: if it completely fails, you likely have a very different domain and should re-examine your input.

  • For new datasets:

    • Run the trained U-Net on a subset.

    • Post-process the predictions.

  • Overlay EM + mask and visually check:

    • tissue boundaries,

    • trachea / artefacts,

    • any systematic failure modes (e.g. over-masking near edges).

  • If performance degrades significantly:

    • annotate a small ROI in the new dataset,

    • fine-tune the existing model rather than training from scratch.

Future Work / Open Questions

Multi-class tissue labelling
Extend beyond binary brain vs non-brain to explicitly label trachea, neuropil vs soma, etc., for more fine-grained control of downstream modules.

Domain adaptation
Explore semi-supervised or unsupervised domain adaptation to handle new EM modalities with minimal additional labels.