
Design Choices

This page explains the decisions we made, alternatives considered, trade-offs, and future work.

Problem / Goal

What outcome are we optimising for?

We want a reliable, lightweight way to generate brain vs non-brain masks for large EM volumes. These masks sit upstream of the other Catena modules (LSDs, synapses, mitochondria, neurotransmitters) and are used to:

  • restrict heavy models to brain / VNC tissue only,
  • avoid wasting GPU/CPU on resin and artefacts,
  • reduce false positives (e.g. spurious neuron segments or synapses detected on resin),
  • standardise which voxels are “in play” across different pipelines.

The goal is not to solve fine semantic segmentation of all tissues, but to provide a clean, binary mask that says:

  • “this voxel belongs to brain / ventral nerve cord”, versus
  • “this is resin, trachea, or something we don’t plan to segment”.

We also want the solution to be:

  • easy to run on a single workstation,
  • minimally dependent on exotic libraries,
  • and robust enough to generalise to new EM volumes with minimal hyper-parameter re-tuning.

Alternatives Considered

1. Global intensity thresholding

Idea: Use simple thresholds on intensity (or a histogram-based method like Otsu) to separate bright resin from darker tissue.

  • Pros

    • Extremely simple to implement.
    • No training data required.
    • Very fast.
  • Cons

    • Fails when tissue and resin have overlapping intensity ranges.
    • Sensitive to staining / imaging differences across datasets.
    • Blind to local texture; easily confuses trachea, artefacts, and brain.

We found this too brittle for practical use beyond tiny toy examples.
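For reference, this kind of global cut can be sketched in a few lines of NumPy; the toy Otsu implementation and synthetic volume below are illustrative, not part of the module:

```python
import numpy as np

def otsu_threshold(volume, bins=256):
    """Classic Otsu: choose the cut that maximises between-class variance."""
    hist, edges = np.histogram(volume, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                      # weight of the darker class
    w1 = 1.0 - w0                          # weight of the brighter class
    cum_mean = np.cumsum(p * centers)
    mu0 = cum_mean / np.clip(w0, 1e-12, None)
    mu1 = (cum_mean[-1] - cum_mean) / np.clip(w1, 1e-12, None)
    between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance per cut
    return centers[np.argmax(between)]

# Toy volume: bright "resin" with a darker "tissue" cube inside
rng = np.random.default_rng(0)
vol = rng.normal(200, 10, size=(32, 32, 32))
vol[8:24, 8:24, 8:24] = rng.normal(100, 10, size=(16, 16, 16))

t = otsu_threshold(vol)
mask = vol < t  # tissue assumed darker than resin
```

This works only while the two intensity modes stay separated; once the tissue and resin histograms overlap, no single global cut exists, which is exactly the brittleness noted above.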

2. DoG-only / edge-based masking

Idea: Use a Difference of Gaussians (DoG) and gradient magnitude to highlight “structured” regions and suppress smooth resin; threshold the result.

  • Pros

    • Still classical CV; no labels needed.
    • Picks up fine structure better than raw intensity.
    • Works as a quick sanity-check baseline.
  • Cons

    • DoG alone produces noisy and fragmented masks.
    • Struggles in regions where resin has structure (e.g. artefacts, cracks).
    • Cannot explicitly distinguish trachea and other non-brain structures.

DoG on its own was not sufficient; it motivated the texture-based approach instead.
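A minimal DoG baseline of this kind can be sketched with `scipy.ndimage`; the function name and the mean-plus-std cutoff are illustrative choices, not the module's:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(volume, sigma_fine=1.0, sigma_coarse=4.0):
    """Difference of Gaussians: band-pass response to structure between the two scales."""
    v = volume.astype(np.float32)
    return np.abs(gaussian_filter(v, sigma_fine) - gaussian_filter(v, sigma_coarse))

# Toy volume: textured "tissue" half vs perfectly smooth "resin" half
rng = np.random.default_rng(0)
vol = np.full((32, 32, 32), 150.0)
vol[:, :, :16] += rng.normal(0, 30, size=(32, 32, 16))

resp = dog_response(vol)
thr = resp.mean() + resp.std()   # crude automatic cutoff
mask = resp > thr
```

Even on this toy, only a fraction of voxels in the textured half exceed the cutoff, so the mask comes out fragmented, which is the noisiness noted above.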

3. Texture / structure analysis (current CV baseline)

Idea: Combine DoG with local texture and structure analysis (e.g. variance, structure tensor) on small blocks to separate “biological-looking” texture from smoother background.

  • Pros

    • Much better than DoG alone for separating tissue from pure resin.
    • No training labels required.
    • Gives a quick, interpretable initial mask and confidence map.
  • Cons

    • Still purely local; no notion of semantics like “brain vs trachea”.
    • Requires some parameter tuning per dataset (block size, sigmas, region size).
    • Will include non-brain tissues that have similar texture to brain.

We keep this as a baseline and backup path, but not as the primary mask generator.
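One representative texture statistic behind this approach is local variance, computable with two box filters. This is a sketch of the principle only, not the actual internals of `em_texture_masking`:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(volume, size=8):
    """Per-voxel variance over a size^3 window: Var = E[x^2] - E[x]^2."""
    v = volume.astype(np.float32)
    mean = uniform_filter(v, size)
    mean_sq = uniform_filter(v * v, size)
    return np.clip(mean_sq - mean * mean, 0.0, None)  # clip float round-off

# Textured "tissue" (noisy) vs smooth "resin" (constant)
rng = np.random.default_rng(0)
vol = np.full((32, 32, 32), 150.0)
vol[:, :, :16] += rng.normal(0, 30, size=(32, 32, 16))

lv = local_variance(vol)
```

Thresholding `lv` separates the two regions cleanly here; on real EM it is combined with DoG and structure measures, and inherits the cons above (any tissue with brain-like variance passes).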

4. Heavier ML models (large U-Nets / transformers)

Idea: Use a deep 3D U-Net (or even transformers) with extensive augmentation, multi-scale context, voxel-size encoding, etc.

  • Pros

    • Potentially very strong performance.
    • Can model complex tissue variation and imaging artefacts.
    • Easier to extend to multi-class labelling (brain, trachea, resin, etc.).
  • Cons

    • More complex to train and tune.
    • Requires a lot of labeled data for training.
    • Higher compute and memory requirements.
    • Harder to document and maintain as a “simple module” for general users.

This felt like overkill for a binary, upstream mask whose main role is to gate other modules.

5. No dedicated mask (let every module handle resin itself)

Idea: Skip this module; let each downstream model learn to ignore resin on its own.

  • Pros

    • Fewer components and no extra training step.
    • Simplifies the pipeline on paper.
  • Cons

    • Every module must re-learn the same “ignore resin” behaviour (i.e. suppressing segments or synapses on resin regions).
    • Wasted computation on non-brain regions in every stage.
    • Harder to debug what each model is actually seeing.

Given Catena’s modular design, a shared mask is the more efficient and interpretable choice.

Decision

What we chose and why.

We adopted a two-path design:

  1. A classical CV path based on texture-based masking, which:

     • is cheap and label-free,
     • provides an initial “biological vs background” mask,
     • outputs a confidence map that can be combined with learned models.

  2. A simple 3D U-Net (MONAI) trained for brain vs non-brain semantic segmentation on downsampled volumes, which:

     • uses minimal but targeted augmentation to generalise across EM volumes,
     • is trained in pixel space (no voxel-size encoding) to keep things simple,
     • produces masks that transfer reasonably well to unseen EM data.

The classical path is treated as a baseline and diagnostic tool; the U-Net path is the recommended default for production masks.

Trade-offs

  • Accuracy vs complexity

  • The 3D U-Net is intentionally small and simple. We trade a bit of peak accuracy for ease of training and reproducibility.
  • The texture-based method is less accurate but more explainable and robust when labels are scarce.

  • Data / label requirements

  • Texture-based masking needs no labels.
  • The U-Net does require semantic labels for at least one volume, but only at a coarse brain vs non-brain level. We use a downsampled version of the volume to speed up training (usually at scale 5, 50 µm isotropic, from FIBSEM).

  • Generalisation

  • Texture-based masking is sensitive to resolution and noise, but not to annotation style.
  • The U-Net generalises surprisingly well to unseen volumes but will eventually drift if the imaging domain is very different (different species, staining, etc.).

  • Performance / compute

  • Texture-based masking is CPU-friendly and works fine on modest hardware; masks are often generated on a Mac M1 Pro with 16 GB RAM in a few minutes.
  • The 3D U-Net benefits from a GPU but is still modest in size; it can run on a single workstation without heroic resources (and even on CPU-only machines like a Mac M1 Pro). Training typically takes a few hours on a single GPU.

Implementation Notes

The module currently has two main components:

  1. Texture-based EM masking (classical CV)
  2. 3D U-Net brain vs non-brain model (MONAI)

1. Texture-based EM masking

Core function:

mask, confidence = em_texture_masking(
    volume,
    block_size=64,
    texture_sigma=1.0,
    edge_sigma=0.5,
    min_region_size=250
)

Inputs

volume

3D EM volume (or 2D slice stack) in which we want to separate “brain-like texture” from background resin / junk.

block_size

Size of the local window (in pixels/voxels) used for computing texture statistics.

  • Smaller blocks → more detailed, sensitive to small variations but also to noise.
  • Larger blocks → smoother, more robust, but can miss tiny islands of tissue.

In practice, 64 is a good starting point for downsampled FIBSEM; scale it roughly with your in-plane resolution.

texture_sigma

Standard deviation of the Gaussian used in the texture filter (e.g. for smoothing features such as local variance or structure tensor).

  • Lower values → focus on very fine texture details (good for high-res, noisy EM).
  • Higher values → capture coarser patterns, ignore tiny fluctuations.

If the mask looks very noisy, try increasing this slightly.

edge_sigma

Standard deviation of the Gaussian used in the edge / gradient filter. This controls the spatial scale of edges that contribute to the mask.

  • Small values (e.g. 0.5) → respond to fine edges and thin structures.
  • Larger values → emphasise broader transitions, ignore tiny speckles.

If you’re missing thin tissue at boundaries, decrease this a bit; if you’re getting a lot of edge noise, increase it.

min_region_size

Minimum size (in pixels/voxels) of a connected component to keep in the final mask. Everything smaller is dropped as noise.

  • Increase this if you see lots of tiny specks of “tissue” in resin.
  • Decrease it if you actually care about very small islands of brain tissue.

This should be tuned relative to your image size and downsampling; 250 is a reasonable starting point for typical subvolumes.
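The effect of `min_region_size` is plain connected-component filtering; a sketch with `scipy.ndimage` (the helper name is illustrative, not the module's API):

```python
import numpy as np
from scipy import ndimage

def drop_small_regions(mask, min_region_size=250):
    """Keep only connected components of at least min_region_size voxels."""
    labels, n = ndimage.label(mask)          # label 3D connected components
    if n == 0:
        return mask.copy()
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.arange(1, n + 1)[sizes >= min_region_size]
    return np.isin(labels, keep)

# One 1000-voxel blob (kept) and one 8-voxel speck (dropped at the default 250)
mask = np.zeros((32, 32, 32), dtype=bool)
mask[2:12, 2:12, 2:12] = True
mask[20:22, 20:22, 20:22] = True
clean = drop_small_regions(mask, min_region_size=250)
```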

Outputs

mask

Binary mask (same shape as volume): 1 for “texture consistent with tissue we care about”, 0 for background / non-brain.

confidence

Float array with a continuous score per pixel/voxel (e.g. how strongly the local texture looks like “brain”).

Useful if you want to:

  • tweak thresholds later without recomputing everything,
  • visualise “soft” tissue likelihood,
  • combine this with other signals (e.g. U-Net logits) before making a hard mask.
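Combining the confidence map with U-Net probabilities can be as simple as a weighted blend before the final cut; the weight and threshold below are illustrative defaults for the sketch, not module settings:

```python
import numpy as np

def fuse_masks(confidence, unet_prob, w_texture=0.3, threshold=0.5):
    """Blend texture confidence with U-Net foreground probability, then binarise."""
    fused = w_texture * confidence + (1.0 - w_texture) * unet_prob
    return fused >= threshold

# Texture and U-Net agree on tissue -> kept; both weak -> dropped
hard = fuse_masks(np.array([0.9, 0.0]), np.array([0.8, 0.4]))
```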

3D U-Net (MONAI) for brain vs non-brain

Key design points

The ML path uses a 3D U-Net implemented in MONAI to predict a single-channel foreground mask:

  • 1 ≈ brain / VNC tissue
  • 0 ≈ non-brain (resin, trachea, artefacts, etc.)

Everything is trained in voxel / pixel space (we do not encode voxel size explicitly).


Architecture

The model is a fairly standard, compact 3D U-Net:

from monai.networks.nets import UNet

model = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
)
  • spatial_dims=3 → full 3D convolutions.

  • Input: single-channel EM volume (grayscale).

  • Output: single-channel logit volume (brain foreground probability after sigmoid).

Depth and channels are deliberately modest so the model can train and infer on a single GPU / workstation.

Training

Data and transforms:

  • EM volumes and masks are loaded from TIFF/Zarr.

  • A custom Dataset yields patches with keys like 'image' and 'mask'.

  • MONAI / TorchIO transforms handle:

    • channel handling and intensity scaling,

    • simple spatial augmentation (flips, 90° rotations, zoom),

    • optional Gaussian noise.

  • Data is explicitly split into train, validation, and test sets.

Loss: DiceCELoss with sigmoid=True, suitable for binary foreground prediction from a single logit channel.

Optimiser: Adam with a conservative learning rate 1e-4.
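The combined objective is conceptually soft Dice plus binary cross-entropy on the sigmoided logit. A NumPy sketch of that sum follows (not MONAI's actual `DiceCELoss` implementation, which adds batching and weighting options):

```python
import numpy as np

def dice_ce_loss(logits, target, eps=1e-7):
    """Soft Dice + binary cross-entropy from a single logit channel."""
    prob = 1.0 / (1.0 + np.exp(-logits))            # sigmoid, as in sigmoid=True
    inter = (prob * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)
    ce = -(target * np.log(prob + eps)
           + (1.0 - target) * np.log(1.0 - prob + eps)).mean()
    return dice + ce

# Confident, correct logits give a near-zero loss; inverted logits a large one
target = np.zeros((4, 4, 4)); target[:2] = 1.0
good = dice_ce_loss(np.where(target == 1, 10.0, -10.0), target)
bad = dice_ce_loss(np.where(target == 1, -10.0, 10.0), target)
```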

Inference

Patch-based test inference

test_model(model, test_loader, save_dir="./outputs/runX", device=device)
  • Loops over a test_loader of patches.

  • Saves predicted patches and corresponding ground-truth masks (TIFF).

  • Reports average Dice and Jaccard across the test set.
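For reference, the two reported metrics on binary masks reduce to simple overlap ratios; a NumPy sketch (helper name illustrative):

```python
import numpy as np

def dice_jaccard(pred, target, eps=1e-7):
    """Binary Dice = 2|A∩B|/(|A|+|B|); Jaccard (IoU) = |A∩B|/|A∪B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    jacc = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, jacc

# 4-voxel prediction vs 4-voxel target with 2 voxels of overlap
pred = np.zeros(8, dtype=bool); pred[:4] = True
target = np.zeros(8, dtype=bool); target[2:6] = True
dice, jacc = dice_jaccard(pred, target)
```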

Whole-volume sliding-window inference

run_inference_whole_volume(
    model,
    data_path,
    output_path,
    patch_size=(64, 64, 64),
    overlap=16,
    device="cuda:0",
)
  • Loads the full 3D EM volume (TIFF or Zarr).

  • Applies simple MONAI intensity preprocessing (scaling / normalisation).

  • Tiles the volume into overlapping 3D patches (patch_size), runs the model on each, and stitches them back together with overlap-handling.

  • Writes the predicted mask volume back to disk as TIFF or Zarr.

A variant with a smaller patch size (e.g. (32, 32, 32)) and a larger overlap is also provided, to trade off memory against smoothness.
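The tiling-and-stitching step can be sketched in pure NumPy with uniform averaging in overlap zones; this is a simplification of run_inference_whole_volume (real sliding-window inference often uses weighted blending rather than a flat average):

```python
import numpy as np

def sliding_window_apply(volume, fn, patch=(64, 64, 64), overlap=16):
    """Apply fn to overlapping patches; average contributions where they overlap."""
    out = np.zeros(volume.shape, dtype=np.float32)
    weight = np.zeros(volume.shape, dtype=np.float32)
    steps = [max(p - overlap, 1) for p in patch]
    for z in range(0, volume.shape[0], steps[0]):
        for y in range(0, volume.shape[1], steps[1]):
            for x in range(0, volume.shape[2], steps[2]):
                sl = tuple(slice(o, min(o + p, s))
                           for o, p, s in zip((z, y, x), patch, volume.shape))
                out[sl] += fn(volume[sl])   # fn must accept edge-clipped patches
                weight[sl] += 1.0
    return out / np.clip(weight, 1.0, None)

# With an identity "model", stitching reproduces the input exactly
vol = np.random.default_rng(0).random((16, 16, 16), dtype=np.float32)
stitched = sliding_window_apply(vol, lambda p: p, patch=(8, 8, 8), overlap=4)
```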

Operational Guidance

Tuning texture-based masking

  • Start with the default parameters:

    • block_size = 64

    • texture_sigma = 1.0

    • edge_sigma = 0.5

    • min_region_size = 250

  • If the mask is too noisy:

    • increase texture_sigma

    • increase min_region_size

  • If you are missing thin tissue:

    • decrease edge_sigma

    • slightly decrease min_region_size

Always inspect a few slices of the volume with mask and confidence overlays to make sure the behaviour makes sense.

Using the ML model

  • Use the texture-based mask as a quick check: if it completely fails, you likely have a very different domain and should re-examine your input.

  • For new datasets:

    • Run the trained U-Net on a subset.

    • Post-process the predictions.

  • Overlay EM + mask and visually check:

    • tissue boundaries,

    • trachea / artefacts,

    • any systematic failure modes (e.g. over-masking near edges).

  • If performance degrades significantly:

    • annotate a small ROI in the new dataset,

    • fine-tune the existing model rather than training from scratch.

Future Work / Open Questions

Multi-class tissue labelling
Extend beyond binary brain vs non-brain to explicitly label trachea, neuropil vs soma, etc., for more fine-grained control of downstream modules.

Domain adaptation
Explore semi-supervised or unsupervised domain adaptation to handle new EM modalities with minimal additional labels.