Design Choices¶
This page explains the decisions we made, alternatives considered, trade-offs, and future work.
Problem / Goal¶
What outcome are we optimising for?
We want a reliable, lightweight way to generate brain vs non-brain masks for large EM volumes. These masks sit upstream of the other Catena modules (LSDs, synapses, mitochondria, neurotransmitters) and are used to:
- restrict heavy models to brain / VNC tissue only,
- avoid wasting GPU/CPU on resin and artefacts,
- reduce false positives (e.g. spurious neuron segments or synapses detected on resin),
- standardise which voxels are “in play” across different pipelines.
The goal is not to solve fine semantic segmentation of all tissues, but to provide a clean, binary mask that says:
- “this voxel belongs to brain / ventral nerve cord”, versus
- “this is resin, trachea, or something we don’t plan to segment”.
We also want the solution to be:
- easy to run on a single workstation,
- minimally dependent on exotic libraries,
- and robust enough to generalise to new EM volumes with limited re-tuning (minimal hyper-parameter optimisation).
Alternatives Considered¶
1. Global intensity thresholding¶
Idea: Use simple thresholds on intensity (or a histogram-based method like Otsu) to separate bright resin from darker tissue.
- Pros:
  - Extremely simple to implement.
  - No training data required.
  - Very fast.
- Cons:
  - Fails when tissue and resin have overlapping intensity ranges.
  - Sensitive to staining / imaging differences across datasets.
  - Blind to local texture; easily confuses trachea, artefacts, and brain.
We found this too brittle for practical use beyond tiny toy examples.
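For concreteness, here is a minimal numpy sketch of Otsu's method on synthetic bimodal data (illustrative only; `em_slice` is made-up data, not from any real volume):

```python
import numpy as np

def otsu_threshold(img, nbins=256):
    """Return the intensity threshold that maximises between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    hist = hist.astype(float)
    w0 = np.cumsum(hist)                # cumulative weight of the "dark" class
    w1 = w0[-1] - w0                    # weight of the "bright" class
    mu0 = np.cumsum(hist * centers)     # unnormalised cumulative class means
    mu1 = mu0[-1] - mu0
    valid = (w0 > 0) & (w1 > 0)
    var_between = np.zeros_like(w0)
    var_between[valid] = (w0[valid] * w1[valid]
                          * (mu0[valid] / w0[valid] - mu1[valid] / w1[valid]) ** 2)
    return centers[np.argmax(var_between)]

# Synthetic bimodal histogram: dark "tissue" mode and bright "resin" mode.
rng = np.random.default_rng(0)
em_slice = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
t = otsu_threshold(em_slice)
tissue_mask = em_slice < t
```

This works exactly when the two modes are well separated, which is the assumption that breaks down on real EM data with overlapping intensity ranges.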
2. DoG-only / edge-based masking¶
Idea: Use a Difference of Gaussians (DoG) and gradient magnitude to highlight “structured” regions and suppress smooth resin; threshold the result.
- Pros:
  - Still classical CV; no labels needed.
  - Picks up fine structure better than raw intensity.
  - Works as a quick sanity-check baseline.
- Cons:
  - DoG alone produces noisy and fragmented masks.
  - Struggles in regions where resin has structure (e.g. artefacts, cracks).
  - Cannot explicitly distinguish trachea and other non-brain structures.
DoG on its own was not sufficient; it motivated the texture-based approach instead.
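A minimal sketch of the DoG idea using scipy (the sigmas and threshold here are illustrative, not the module's defaults):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(img, sigma_fine=1.0, sigma_coarse=4.0):
    """Difference of Gaussians: a band-pass response that is ~0 in smooth resin."""
    return gaussian_filter(img, sigma_fine) - gaussian_filter(img, sigma_coarse)

rng = np.random.default_rng(0)
smooth = np.full((64, 64), 128.0)                  # resin-like: flat intensity
textured = 128.0 + rng.normal(0, 30, (64, 64))     # tissue-like: fine texture
mask_smooth = np.abs(dog_response(smooth)) > 5     # empty: DoG of a constant is 0
mask_textured = np.abs(dog_response(textured)) > 5 # fires on textured regions
```

Note how the textured mask is speckled rather than solid, which is exactly the fragmentation problem listed above.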
3. Texture / structure analysis (current CV baseline)¶
Idea: Combine DoG with local texture and structure analysis (e.g. variance, structure tensor) on small blocks to separate “biological-looking” texture from smoother background.
- Pros:
  - Much better than DoG alone for separating tissue from pure resin.
  - No training labels required.
  - Gives a quick, interpretable initial mask and confidence map.
- Cons:
  - Still purely local; no notion of semantics like “brain vs trachea”.
  - Requires some parameter tuning per dataset (block size, sigmas, region size).
  - Will include non-brain tissues that have similar texture to brain.
We keep this as a baseline and backup path, but not as the primary mask generator.
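The local-variance component of this idea can be sketched as follows (window size and threshold are illustrative, not the module's defaults):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(img, size=16):
    """Per-pixel variance over a size x size window:
    high in textured tissue, near zero in smooth resin."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    return mean_sq - mean * mean

rng = np.random.default_rng(1)
img = np.full((128, 128), 120.0)
img[:, :64] += rng.normal(0, 25, (128, 64))  # left half: tissue-like texture
var = local_variance(img)
texture_mask = var > 100.0                   # resin half has variance ~0
```

Unlike a raw intensity threshold, this stays correct even when tissue and resin share the same mean intensity, as in the example above.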
4. Heavier ML models (large U-Nets / transformers)¶
Idea: Use a deep 3D U-Net (or even transformers) with extensive augmentation, multi-scale context, voxel-size encoding, etc.
- Pros:
  - Potentially very strong performance.
  - Can model complex tissue variation and imaging artefacts.
  - Easier to extend to multi-class labelling (brain, trachea, resin, etc.).
- Cons:
  - More complex to train and tune.
  - Requires a lot of labeled data for training.
  - Higher compute and memory requirements.
  - Harder to document and maintain as a “simple module” for general users.
This felt overkill for a binary, upstream mask whose main role is to gate other modules.
5. No dedicated mask (let every module handle resin itself)¶
Idea: Skip this module; let each downstream model learn to ignore resin on its own.
- Pros:
  - Fewer components and no extra training step.
  - Simplifies the pipeline on paper.
- Cons:
  - Every module has to re-learn the same “ignore resin” behaviour just to avoid generating segments or synapses on resin.
  - Wasted computation on non-brain regions in every stage.
  - Harder to debug what each model is actually seeing.
Given Catena’s modular design, a shared mask is the more efficient and interpretable choice.
Decision¶
What we chose and why.
We adopted a two-path design:
- A classical CV path based on texture-based masking, which:
  - is cheap and label-free,
  - provides an initial “biological vs background” mask,
  - outputs a confidence map that can be combined with learned models.
- A simple 3D U-Net (MONAI) trained for brain vs non-brain semantic segmentation on downsampled volumes, which:
  - uses minimal but targeted augmentation to generalise across EM volumes,
  - is trained in pixel space (no voxel-size encoding) to keep things simple,
  - produces masks that transfer reasonably well to unseen EM data.
The classical path is treated as a baseline and diagnostic tool; the U-Net path is the recommended default for production masks.
Trade-offs¶
- Accuracy vs complexity
  - The 3D U-Net is intentionally small and simple. We trade a bit of peak accuracy for ease of training and reproducibility.
  - The texture-based method is less accurate but more explainable and robust when labels are scarce.
- Data / label requirements
  - Texture-based masking needs no labels.
  - The U-Net does require semantic labels for at least one volume, but only at a coarse brain vs non-brain level. We use a downsampled version of the volume to speed up training (usually at 50um isotropic, scale 5, from FIBSEM).
- Generalisation
  - Texture-based masking is sensitive to resolution and noise, but not to annotation style.
  - The U-Net generalises surprisingly well to unseen volumes but will eventually drift if the imaging domain is very different (different species, staining, etc.).
- Performance / compute
  - Texture-based masking is CPU-friendly and works fine on modest hardware; masks are often generated on a Mac M1 Pro with 16GB RAM in a few minutes.
  - The 3D U-Net benefits from a GPU but is still modest in size; it can run on a single workstation without heroic resources (even on CPU-only machines like a Mac M1 Pro). Training typically takes a few hours on a single GPU.
Implementation Notes¶
The module currently has two main components:
- Texture-based EM masking (classical CV)
- 3D U-Net brain vs non-brain model (MONAI)
1. Texture-based EM masking¶
Core function:
mask, confidence = em_texture_masking(
volume,
block_size=64,
texture_sigma=1.0,
edge_sigma=0.5,
min_region_size=250
)
Inputs¶
volume¶
3D EM volume (or 2D slice stack) in which we want to separate “brain-like texture” from background resin / junk.
block_size¶
Size of the local window (in pixels/voxels) used for computing texture statistics.
- Smaller blocks → more detailed, sensitive to small variations but also to noise.
- Larger blocks → smoother, more robust, but can miss tiny islands of tissue.
In practice, 64 is a good starting point for downsampled FIBSEM; scale it roughly with your in-plane resolution.
texture_sigma¶
Standard deviation of the Gaussian used in the texture filter (e.g. for smoothing features such as local variance or structure tensor).
- Lower values → focus on very fine texture details (good for high-res, noisy EM).
- Higher values → capture coarser patterns, ignore tiny fluctuations.
If the mask looks very noisy, try increasing this slightly.
edge_sigma¶
Standard deviation of the Gaussian used in the edge / gradient filter. This controls the spatial scale of edges that contribute to the mask.
- Small values (e.g. 0.5) → respond to fine edges and thin structures.
- Larger values → emphasise broader transitions, ignore tiny speckles.
If you’re missing thin tissue at boundaries, decrease this a bit; if you’re getting a lot of edge noise, increase it.
min_region_size¶
Minimum size (in pixels/voxels) of a connected component to keep in the final mask. Everything smaller is dropped as noise.
- Increase this if you see lots of tiny specks of “tissue” in resin.
- Decrease it if you actually care about very small islands of brain tissue.
This should be tuned relative to your image size and downsampling; 250 is a reasonable starting point for typical subvolumes.
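The component filtering that `min_region_size` controls can be sketched with scipy (an illustrative helper, not the module's internal code):

```python
import numpy as np
from scipy import ndimage

def drop_small_regions(mask, min_region_size=250):
    """Remove connected components smaller than min_region_size pixels/voxels."""
    labels, _ = ndimage.label(mask)
    sizes = np.bincount(labels.ravel())   # pixel count per component label
    keep = sizes >= min_region_size
    keep[0] = False                       # background (label 0) stays off
    return keep[labels]

mask = np.zeros((64, 64), dtype=bool)
mask[10:40, 10:40] = True                 # 900-pixel "tissue" block: kept
mask[50, 50] = True                       # 1-pixel speck: dropped as noise
clean = drop_small_regions(mask, 250)
```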
Outputs¶
mask¶
Binary mask (same shape as volume): 1 for “texture consistent with tissue we care about”, 0 for background / non-brain.
confidence¶
Float array with a continuous score per pixel/voxel (e.g. how strongly the local texture looks like “brain”).
Useful if you want to:
- tweak thresholds later without recomputing everything,
- visualise “soft” tissue likelihood,
- combine this with other signals (e.g. U-Net logits) before making a hard mask.
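As an illustration of the last point, one way to blend the confidence map with U-Net logits before thresholding (the `w_cv` weighting and threshold here are hypothetical, not values the module ships with):

```python
import numpy as np

def fuse_masks(confidence, logits, w_cv=0.3, threshold=0.5):
    """Blend the CV confidence map with U-Net probabilities, then threshold."""
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid on raw logits
    fused = w_cv * confidence + (1.0 - w_cv) * probs
    return fused > threshold

confidence = np.array([0.9, 0.2, 0.8])   # texture-based confidence per voxel
logits = np.array([2.0, -3.0, 0.0])      # U-Net logits for the same voxels
mask = fuse_masks(confidence, logits)    # [True, False, True]
```

Because the fusion happens on soft scores, the final threshold can be re-tuned without recomputing either model.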
2. 3D U-Net (MONAI) for brain vs non-brain¶
Key design points¶
The ML path uses a 3D U-Net implemented in MONAI to predict a single-channel foreground mask:
- 1 ≈ brain / VNC tissue
- 0 ≈ non-brain (resin, trachea, artefacts, etc.)
Everything is trained in voxel / pixel space (we do not encode voxel size explicitly).
Architecture¶
The model is a fairly standard, compact 3D U-Net:
from monai.networks.nets import UNet
model = UNet(
spatial_dims=3,
in_channels=1,
out_channels=1,
channels=(16, 32, 64, 128, 256),
strides=(2, 2, 2, 2),
num_res_units=2,
)
- spatial_dims=3 → full 3D convolutions.
- Input: single-channel EM volume (grayscale).
- Output: single-channel logit volume (brain foreground probability after sigmoid).
Depth and channels are deliberately modest so the model can train and infer on a single GPU / workstation.
Training¶
Data and transforms:¶
- EM volumes and masks are loaded from TIFF/Zarr.
- A custom Dataset yields patches with keys like 'image' and 'mask'.
- MONAI / TorchIO transforms handle:
  - channel handling and intensity scaling,
  - simple spatial augmentation (flips, 90° rotations, zoom),
  - optional Gaussian noise.
- Data is explicitly split into train, validation, and test sets.
Loss: DiceCELoss with sigmoid=True, suitable for binary foreground prediction from a single logit channel.
Optimiser: Adam with a conservative learning rate of 1e-4.
Inference¶
Patch-based test inference¶
test_model(model, test_loader, save_dir="./outputs/runX", device=device)
- Loops over a test_loader of patches.
- Saves predicted patches and corresponding ground-truth masks (TIFF).
- Reports average Dice and Jaccard across the test set.
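For reference, the two reported metrics can be computed from binary masks as follows (a numpy sketch, not the module's actual implementation):

```python
import numpy as np

def dice_jaccard(pred, gt):
    """Hard Dice and Jaccard (IoU) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())   # 2|A∩B| / (|A| + |B|)
    jaccard = inter / union                        # |A∩B| / |A∪B|
    return dice, jaccard

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
dice, jacc = dice_jaccard(pred, gt)   # intersection 2, union 4 → dice 2/3, jaccard 1/2
```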
Whole-volume sliding-window inference¶
run_inference_whole_volume(
model,
data_path,
output_path,
patch_size=(64, 64, 64),
overlap=16,
device="cuda:0",
)
- Loads the full 3D EM volume (TIFF or Zarr).
- Applies simple MONAI intensity preprocessing (scaling / normalisation).
- Tiles the volume into overlapping 3D patches (patch_size), runs the model on each, and stitches them back together with overlap handling.
- Writes the predicted mask volume back to disk as TIFF or Zarr.
A variant with a smaller patch size (e.g. (32, 32, 32)) and larger overlap is also provided, trading memory against mask smoothness.
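The tiling-and-stitching logic can be sketched in plain numpy (the real code uses MONAI; the `predict` callable here stands in for the model's forward pass):

```python
import numpy as np

def sliding_window_predict(volume, predict, patch=64, overlap=16):
    """Tile a 3D volume into overlapping cubes, run `predict` on each,
    and stitch by averaging predictions in the overlap regions."""
    step = patch - overlap
    out = np.zeros(volume.shape, dtype=float)
    count = np.zeros(volume.shape, dtype=float)

    def starts(n):
        # Clamp the last window so it ends exactly at the volume edge.
        return sorted({min(s, n - patch) for s in range(0, n, step)})

    for z in starts(volume.shape[0]):
        for y in starts(volume.shape[1]):
            for x in starts(volume.shape[2]):
                sl = (slice(z, z + patch), slice(y, y + patch), slice(x, x + patch))
                out[sl] += predict(volume[sl])
                count[sl] += 1.0
    return out / count   # every voxel is covered at least once

rng = np.random.default_rng(0)
volume = rng.random((80, 80, 80))
# With an identity "model", stitching must reproduce the input exactly.
stitched = sliding_window_predict(volume, predict=lambda p: p)
```

Averaging in the overlaps is what suppresses seam artefacts at patch boundaries; larger overlap means smoother seams at the cost of more compute.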
Operational Guidance¶
Tuning texture-based masking¶
- Start with the default parameters:
  - block_size = 64
  - texture_sigma = 1.0
  - edge_sigma = 0.5
  - min_region_size = 250
- If the mask is too noisy:
  - increase texture_sigma
  - increase min_region_size
- If you are missing thin tissue:
  - decrease edge_sigma
  - slightly decrease min_region_size
- Always inspect a few slices of volume, mask, and confidence overlays to make sure the behaviour makes sense.
Using the ML model¶
- Use the texture-based mask as a quick check: if it completely fails, you likely have a very different domain and should re-examine your input.
- For new datasets:
  - Run the trained U-Net on a subset.
  - Post-process the predictions.
- Overlay EM + mask and visually check:
  - tissue boundaries,
  - trachea / artefacts,
  - any systematic failure modes (e.g. over-masking near edges).
- If performance degrades significantly:
  - annotate a small ROI in the new dataset,
  - fine-tune the existing model rather than training from scratch.
Future Work / Open Questions¶
- Multi-class tissue labelling: Extend beyond binary brain vs non-brain to explicitly label trachea, neuropil vs soma, etc., for more fine-grained control of downstream modules.
- Domain adaptation: Explore semi-supervised or unsupervised domain adaptation to handle new EM modalities with minimal additional labels.