Design Choices

This page explains the design decisions behind SimpSyn for synapse detection, the alternatives considered, the trade-offs involved, and how to use it in practice.

Problem / Goal

Synapse detection is about more than finding "something dark" in EM; we want a method that:

works with point annotations (pre/post sites) rather than dense cleft masks,
is lightweight and simple to train,
runs well on 3D vEM volumes (often anisotropic),
and generalises across invertebrate/vertebrate datasets (e.g. different species / preparations).

SimpSyn is our answer to this: an ultra-lightweight synapse detector that reformulates synapse detection as a two-channel segmentation task (pre and post masks) followed by simple geometric post-processing.

Alternatives Considered

1. Cleft segmentation with voxel-wise masks

Idea: Annotate full synaptic clefts voxel-wise and train a network to segment cleft voxels.

Pros
Very detailed supervision.
Well-established in the literature for synapse detection.
Cons
Cleft masks are tedious and expensive to annotate.
Voxel-wise labels do not scale well across multiple datasets or species.

Given Catena's emphasis on reproducible and scalable ground-truth generation, relying only on dense cleft masks is not attractive.

2. Synful-style joint mask + regression models

Synful (another module in Catena) jointly learns:

a mask segmentation and
a regression task (e.g. partner direction),

from point annotations. This is powerful but more complex.

Pros
Richer outputs (joint learning).
Encodes more geometric information in a single network.
Cons
Heavier models and training.
More complicated objective and post-processing.
Higher compute cost at inference.

SimpSyn explicitly deviates from this: it aims to be simpler and more computationally efficient, especially when you want something easy to train and deploy on multiple datasets.

3. Detection-style point predictors (no masks)

Another option is to train pure point detectors (e.g. heatmaps or object centres) and skip masks entirely.

Pros
Very lightweight outputs.
Easy to store and combine with partner assignment.
Cons
No shape information; harder to reason about ambiguous synapses.
More sensitive to thresholding and NMS heuristics.

Because small, local masks are still cheap to predict (and can be generated on the fly from the annotated point coordinates), SimpSyn keeps a segmentation view but makes it as light as possible.

Decision

SimpSyn makes three key design choices:

Reformulate synapse detection as a 3D segmentation task
Predict two output channels:
- Channel 0: pre-synaptic mask
- Channel 1: post-synaptic mask
Each channel is trained to segment local 3D regions around annotated pre/post points.
Use a 3D Residual U-Net backbone
A relatively small 3D ResUNet is used as the core model.
This gives:
- strong performance on 3D EM volumes,
- good gradient flow (residual connections),
- and manageable compute cost.
Simple, geometry-based post-processing
From the two predicted masks:
- Connected components are used to isolate individual pre and post objects.
- Each post-synaptic component is paired with its nearest pre-synaptic component (nearest neighbour in 3D space).
The result is:
- a set of pre and post masks, and
- a partner assignment (pre->post) per synapse.
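
The post-processing step above can be sketched as follows. This is a minimal illustration, not SimpSyn's actual code: the function name `pair_post_to_pre`, the `voxel_size` parameter, and the choice of component centroids as the distance measure are assumptions made for the example. It relies on `scipy.ndimage.label` for connected components and `scipy.spatial.cKDTree` for the nearest-neighbour query.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def pair_post_to_pre(pre_mask, post_mask, voxel_size=(1.0, 1.0, 1.0)):
    """Pair each post component with its nearest pre component.

    Hypothetical helper: returns a list of (pre_label, post_label) tuples,
    where labels are the 1-based component ids from scipy.ndimage.label.
    """
    pre = np.asarray(pre_mask) > 0
    post = np.asarray(post_mask) > 0

    # Connected components isolate individual pre and post objects.
    pre_labels, n_pre = ndimage.label(pre)
    post_labels, n_post = ndimage.label(post)
    if n_pre == 0 or n_post == 0:
        return []

    pre_ids = np.arange(1, n_pre + 1)
    post_ids = np.arange(1, n_post + 1)

    # Centroids in physical units, so anisotropic voxels are handled sensibly.
    scale = np.asarray(voxel_size, dtype=float)
    pre_centroids = np.array(
        ndimage.center_of_mass(pre.astype(float), pre_labels, pre_ids)
    ) * scale
    post_centroids = np.array(
        ndimage.center_of_mass(post.astype(float), post_labels, post_ids)
    ) * scale

    # For every post centroid, look up the index of the closest pre centroid.
    _, nearest = cKDTree(pre_centroids).query(post_centroids)
    return [(int(pre_ids[j]), int(post_ids[i])) for i, j in enumerate(nearest)]
```

Because the assignment is made per post component, several post components may end up sharing the same pre component.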

SimpSyn is implemented within BiaPy, so it can reuse BiaPy's data pipelines, configuration system, and training utilities.
We are also working on fully integrating it into Catena.

Trade-offs

Simplicity vs expressiveness
By dropping the regression head (as in Synful) and focusing only on segmentation + nearest-neighbour pairing, SimpSyn is easier to train and debug.
The trade-off is that some nuanced geometric relationships must be captured via simple nearest-neighbour heuristics instead of a learned partner model.
Masks vs pure points
Predicting small masks instead of points costs a bit more in memory and compute, but:
- makes training more stable,
- allows visual inspection of the ultrastructure that "triggered" the detection,
- and provides richer context for later use (e.g. feature extraction).
Residual U-Net vs heavier architectures
The chosen 3D Residual U-Net is strong enough for synapse detection but intentionally not huge.
This keeps training and inference practical on typical lab GPUs, at the cost of not exploring very large backbones or transformer-based models.
Nearest-neighbour pairing vs learned partner assignment
Nearest-neighbour matching is simple and deterministic.
It can fail in very dense synaptic regions where pre/post separation is hard and multiple candidates are nearby, but is usually sufficient given good masks.

Implementation Notes

Important

The details below may change, as SimpSyn is still under active development.

From the user's perspective, SimpSyn has three conceptual stages:

Target generation from point annotations
For each annotated synapse:
- a 3D spherical region is drawn around the pre-synaptic point,
- and another around each post-synaptic point.
These spheres become local binary masks for the pre and post channels.
3D Residual U-Net training (via BiaPy)
Input: EM patches (3D) cropped around synapse regions.
Output: two-channel prediction (pre mask, post mask).
Loss: segmentation loss (e.g. Dice/CE combination as configured in BiaPy; see module README for details).
All training logic (patch sampling, augmentations, logging) is handled in the BiaPy framework; SimpSyn mainly provides:
- the custom dataset/label definition,
- the ResUNet model configuration,
- and post-processing scripts.
Post-processing and partner assignment
For each prediction volume:
- apply thresholding to obtain binary masks for pre and post channels,
- run connected component labelling to extract individual pre and post components,
- for each post component, find the nearest pre component (e.g. centroid distance) and assign them as partners.
The final output is:
- a set of pre + post objects,
- and a table / graph of synaptic connections.
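
The first stage (target generation from point annotations) can be sketched as below. This is an illustrative sketch under stated assumptions, not the conversion script shipped with SimpSyn: the names `draw_sphere` and `make_targets`, the `(pre_point, [post_points])` input layout, and the `voxel_size` handling for anisotropic volumes are all hypothetical.

```python
import numpy as np


def draw_sphere(mask, center, radius, voxel_size=(1.0, 1.0, 1.0)):
    """Set voxels within `radius` (physical units) of `center` to 1, in place.

    `center` is a voxel coordinate in the same axis order as `mask`.
    """
    center = np.asarray(center, dtype=float)
    scale = np.asarray(voxel_size, dtype=float)

    # Bounding box of the sphere in voxels, clipped to the volume.
    half = np.ceil(radius / scale).astype(int)
    lo = np.maximum(np.floor(center).astype(int) - half, 0)
    hi = np.minimum(np.ceil(center).astype(int) + half + 1, mask.shape)
    box = tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))

    # Squared physical distance of every voxel in the box to the centre.
    grids = np.ogrid[box]
    dist2 = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, center, scale))

    sub = mask[box]  # basic slicing returns a view, so this edits `mask`
    sub[dist2 <= radius ** 2] = 1


def make_targets(shape, synapses, radius, voxel_size=(1.0, 1.0, 1.0)):
    """Build a (2, *shape) uint8 target: channel 0 = pre, channel 1 = post.

    `synapses` is an iterable of (pre_point, [post_point, ...]) pairs.
    """
    target = np.zeros((2, *shape), dtype=np.uint8)
    for pre_point, post_points in synapses:
        draw_sphere(target[0], pre_point, radius, voxel_size)
        for post_point in post_points:
            draw_sphere(target[1], post_point, radius, voxel_size)
    return target
```

For example, `make_targets((11, 11, 11), [((5, 5, 5), [(5, 5, 8)])], radius=2)` yields one sphere in the pre channel and one in the post channel. Passing the voxel size keeps the spheres round in physical space even when the z-resolution is coarser than x/y.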

Install & usage are inherited from BiaPy:

Follow BiaPy's installation docs:
https://biapy.readthedocs.io/en/latest/get_started/installation.html
Then use the SimpSyn-specific configs and scripts in Catena's synapse_detection/simpsyn directory for training and prediction. This part is still a work in progress; a full release is pending.

Operational Guidance

When to use SimpSyn (vs Synful)
- Use SimpSyn when:
  - you want a simple, segmentation-based detector with point annotations,
  - you care about compute efficiency and easier deployment,
  - and nearest-neighbour partner assignment is sufficient.
- Use Synful when:
  - you need richer joint outputs (mask + regression),
  - and you're willing to pay higher compute/training complexity.
Data preparation
- Ensure you have pre/post point annotations (not just cleft masks). You can use our conversion scripts to convert segments to points.
- Use the provided scripts to convert these into spherical masks for training.
- Start with conservative sphere radii; over-large spheres may cause overlaps and reduce specificity.
Training
- Start from the default BiaPy config for 3D segmentation and adapt the following to your GPU memory:
  - patch size,
  - batch size,
  - learning rate.
- Monitor:
  - per-channel Dice/IoU for pre vs post,
  - and visual overlays of masks on EM slices to sanity-check.
Inference & post-processing
- Use sliding-window inference (as in BiaPy) for large volumes.
- Carefully tune:
  - probability thresholds for binarisation,
  - connected component size filters,
  - and the maximum distance allowed for pre-post pairing.
- Inspect a few regions with:
  - EM + predicted masks,
  - pre/post centroids,
  - and final partner assignments.
Validation & curation
- For high-value datasets, run critical regions through CATMAID or your preferred proofreading tool to:
  - tag correct/incorrect/uncertain,
  - and create a curated benchmark for comparing SimpSyn, Synful, or other methods.
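
The tuning knobs listed under "Inference & post-processing" (probability threshold, component size filter, maximum pairing distance) can be sketched as below. This is a rough illustration only: `extract_components`, `pair_with_cutoff`, and all default values are hypothetical and are not taken from SimpSyn or BiaPy. The distance cutoff uses the `distance_upper_bound` argument of `scipy.spatial.cKDTree.query`, which reports an infinite distance when no neighbour lies within the bound.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def extract_components(prob, threshold=0.5, min_voxels=10):
    """Binarise a probability map, drop small components, return centroids.

    Returns an (N, 3) float array of centroids in voxel coordinates.
    """
    labels, n = ndimage.label(np.asarray(prob) > threshold)
    if n == 0:
        return np.empty((0, 3))
    ids = np.arange(1, n + 1)
    # Component sizes in voxels; index 0 of bincount is the background.
    sizes = np.bincount(labels.ravel(), minlength=n + 1)[1:]
    keep = ids[sizes >= min_voxels]
    if keep.size == 0:
        return np.empty((0, 3))
    weights = (labels > 0).astype(float)
    return np.array(ndimage.center_of_mass(weights, labels, keep), dtype=float)


def pair_with_cutoff(pre_centroids, post_centroids, max_distance,
                     voxel_size=(1.0, 1.0, 1.0)):
    """Pair each post centroid with its nearest pre centroid within a cutoff.

    Returns (pre_index, post_index, distance) tuples; post centroids with no
    pre centroid within `max_distance` (physical units) stay unpaired.
    """
    if len(pre_centroids) == 0 or len(post_centroids) == 0:
        return []
    scale = np.asarray(voxel_size, dtype=float)
    tree = cKDTree(np.asarray(pre_centroids) * scale)
    dist, idx = tree.query(np.asarray(post_centroids) * scale,
                           distance_upper_bound=max_distance)
    return [(int(j), i, float(d))
            for i, (d, j) in enumerate(zip(dist, idx))
            if np.isfinite(d)]
```

Sweeping `threshold`, `min_voxels`, and `max_distance` on a small annotated region, and inspecting the resulting pairs alongside the EM, is one way to choose values before running on a full volume.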

Read more about SimpSyn in our preprint: Towards Generalized Synapse Detection Across Invertebrate Species

Future Work / Open Questions

More robust partner assignment
Explore score-based or learned matching between pre/post components without sacrificing SimpSyn's simplicity.
Better handling of dense synaptic regions
Investigate how the segmentation + connected components pipeline behaves in very dense neuropil, and whether modest model or post-processing changes improve separation.
Cross-dataset generalisation
The associated preprint, Towards Generalized Synapse Detection Across Invertebrate Species, will inform how well SimpSyn transfers across species and preparations and what minimal fine-tuning is required.
Closer integration into Catena's graph-building tools
Streamline conversion of SimpSyn outputs into Catena's standard synapse tables and connectivity graphs, and unify evaluation across SimpSyn, Synful, and other detectors.