Design Choices

This page explains the design decisions behind Synful for synapse detection: the alternatives considered, the trade-offs involved, and how the current implementation is organised.

Problem / Goal

Synful is aimed at a specific synapse problem:

Given a volume EM dataset and point annotations of synapses, automatically identify synaptic partners (pre-post pairs) with high accuracy and at scale.

In other words, we want a method that:

  • takes volume EM as input (e.g. adult Drosophila datasets),
  • works from sparse point annotations, not dense cleft masks,
  • produces directed synaptic edges (who talks to whom),
  • and can be applied to very large volumes (whole-brain FAFB-scale data).

Synful is based on the approach introduced by Buhmann et al., Nature Methods 2021, "Automatic detection of synaptic partners in a whole-brain Drosophila electron microscopy dataset".

Core idea

Synful implements a multi-task U-Net-style encoder-decoder that jointly:

  1. Localises post-synaptic sites (a segmentation-like task), and
  2. Predicts a direction vector field from each post-synaptic site toward its pre-synaptic partner (a regression-like task).

This multi-task setup allows a single model to learn:

  • where synapses are, and
  • who they connect to,

with relatively simple post-processing to convert outputs into a graph of synaptic partners.
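The post-processing step can be sketched as follows. This is an illustrative toy, not the actual Synful extraction code: the function name, the thresholding strategy, and the use of blob centroids are assumptions; the real pipeline operates on probability maps at scale with more careful candidate extraction.

```python
import numpy as np
from scipy import ndimage

def extract_partners(post_mask, direction_field, threshold=0.5):
    """Toy sketch: turn a post-synaptic probability map and a
    (3, D, H, W) post->pre direction field into directed edges.
    Names and details are illustrative, not the actual Synful API."""
    # 1. Threshold the post-synaptic map and label connected blobs.
    labels, n = ndimage.label(post_mask > threshold)
    edges = []
    for blob_id in range(1, n + 1):
        # 2. Take the blob centroid as the post-synaptic site.
        post = np.array(ndimage.center_of_mass(labels == blob_id))
        zi, yi, xi = np.round(post).astype(int)
        # 3. Follow the predicted vector to the pre-synaptic partner.
        vec = direction_field[:, zi, yi, xi]
        pre = post + vec
        edges.append((tuple(post), tuple(pre)))
    return edges
```

The key point is that partner assignment reduces to reading off a vector at each detected site, rather than running a separate matching algorithm.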

Alternatives Considered

1. Cleft-only segmentation

Idea: Train a network purely for synaptic cleft segmentation (binary cleft masks) and derive partners by matching cleft voxels to neuron segments.

  • Pros
    • Conceptually simple.
    • Many existing cleft segmentation models to build on.
  • Cons
    • Requires dense voxel-wise cleft annotation, which is slow and expensive.
    • Partner assignment becomes a separate, sometimes heuristic-heavy step.

Buhmann et al. showed that a partner-centric formulation (post + vector field) can directly learn partner assignment from point annotations, reducing reliance on full cleft masks.

2. Detection-only (points without partner vectors)

Idea: Predict only pre and post points (or small blobs) and handle partner assignment entirely via geometry or heuristics.

  • Pros
    • Simpler output space.
    • Lightweight models possible.
  • Cons
    • All partner information must be recovered in a separate post-processing step.
    • May be harder to resolve ambiguous multi-partner synapses in dense synaptic regions.

Synful instead bakes partner direction into the network output via a vector field from post to pre, so the model learns partner structure directly.

3. Unified panoptic synapse models

Idea: Train larger, panoptic-style models that output full synaptic masks + partners in one shot.

  • Pros
    • Potentially more expressive.
    • May integrate cleft, pre, and post masks simultaneously.
  • Cons
    • More complex training and architecture design.
    • Heavier to run at whole-brain scale.

Synful instead focuses on a more compact multi-task U-Net that still scales to whole-brain adult fly data.

Decision

Multi-task U-Net

We adopt Buhmann et al.'s multi-task U-Net-style network:

  • Input: EM patches (with optional neuron segmentation).
  • Outputs:
    • post-synaptic site localisation (segmentation),
    • vector field pointing from post to pre.

Important

Synful can also be trained in single-task modes that predict only pre-synaptic or only post-synaptic sites. However, we use the joint prediction model to avoid heavy post-processing.

This:

  • uses point annotations as supervision (converted to training targets),
  • allows joint learning of detection + partner assignment in a single model,
  • and keeps post-processing relatively simple (follow the vectors and attach to pre).

TensorFlow reference implementation + PyTorch reimplementation

In Catena, Synful is provided in two flavours:

  1. TensorFlow (TF) implementation

    • Based on the original Funke Lab Synful code.
    • Supports:
      • inference with the pretrained Buhmann et al. models, and
      • training from scratch on new datasets.
  2. PyTorch reimplementation

    • A refactored version for:
      • easier use on newer CUDA stacks,
      • eager execution and easier debugging.
    • Marked as under active development; currently being tested on both isotropic FIBSEM and anisotropic TEM datasets.

CREMI-style HDF5 IO

The TensorFlow implementation expects data in CREMI-style .hdf format, consistent with the original Synful setup.

Catena provides:

  • example conversion scripts that convert other invertebrate and vertebrate datasets into this format, e.g.
    convert_wasp_to_CREMI.py and convert_megalopta_bee_to_cremi.py.
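A minimal sketch of the CREMI-style layout such conversion scripts produce is shown below, using h5py. The dataset and attribute names follow the public CREMI convention; the exact fields and the toy values here are assumptions, so check the conversion scripts for what Synful actually expects.

```python
import h5py
import numpy as np

# Illustrative CREMI-style layout (names follow the public CREMI
# convention; values are toy data, not from a real dataset).
raw = np.zeros((50, 256, 256), dtype=np.uint8)       # EM volume (z, y, x)
locations = np.array([[400.0, 120.0, 80.0],          # pre site (nm)
                      [400.0, 160.0, 96.0]])         # post site (nm)

with h5py.File("example_cremi.hdf", "w") as f:
    ds = f.create_dataset("volumes/raw", data=raw, compression="gzip")
    ds.attrs["resolution"] = (40.0, 4.0, 4.0)        # nm per voxel (z, y, x)
    f.create_dataset("annotations/ids",
                     data=np.array([1, 2], dtype=np.uint64))
    f.create_dataset("annotations/locations", data=locations)
    f.create_dataset("annotations/types",
                     data=np.array([b"presynaptic_site",
                                    b"postsynaptic_site"]))
    # Directed pre -> post pairs, referencing annotation ids.
    f.create_dataset("annotations/presynaptic_site/partners",
                     data=np.array([[1, 2]], dtype=np.uint64))
```

Note that annotation locations are in nanometres (world coordinates), while the raw volume is voxel-indexed with a `resolution` attribute relating the two.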

Trade-offs

  • Accuracy vs annotation cost

    • Using a partner-centric multi-task model allows training from point annotations rather than dense clefts, which significantly lowers annotation burden.
    • The trade-off is that the model must learn more structure (both detection and partner vectors), which can make training more involved than pure cleft segmentation.
  • TensorFlow vs PyTorch

    • The TensorFlow implementation is closest to the original and ships with pretrained models, but can be less convenient on modern stacks.
    • The PyTorch reimplementation offers better debuggability and integration with modern tooling, but is currently work in progress and under active testing.
  • Anisotropic vs isotropic EM

    • The pretrained Synful models were trained on anisotropic CREMI-like data; they are sensitive to resolution and image quality.
    • They can be run on isotropic datasets, but performance may degrade if resolution/contrast differ strongly; retraining or fine-tuning is often necessary.
  • Single joint model vs simpler baselines

    • Synful's joint mask + vector formulation is powerful but more complex than lighter models such as SimpSyn (which only predicts pre/post masks and uses geometric pairing).
    • This complexity is justified when high partner accuracy is required and one model should handle both detection and assignment.

Implementation Notes

TensorFlow - pretrained models

Inference with Buhmann et al.'s pretrained networks can be run using scripts under:

  • tensorflow/pretrained/train

These networks are suitable for anisotropic datasets similar to CREMI/FAFB.

They can be applied to isotropic EM, but we explicitly warn that:

  • they are sensitive to resolution and image quality,
  • even with "matching" resolutions, performance may not be as expected.

TensorFlow - training from scratch

Training Synful from scratch on your own datasets is supported under:

  • tensorflow/train_from_scratch

Instructions are embedded in the local readme.md.

Input is expected in CREMI HDF5 format:

  • If your data is not in that format, the example script
    convert_wasp_to_CREMI.py
    shows how to convert a micro-wasp dataset into CREMI-style .hdf.

Synful can be run with or without neuron segmentation:

  • with segmentation: the model can use segment context as guidance when the spherical masks are created on top of point annotations (e.g. masks are kept from crossing neuron boundaries),
  • without segmentation: model relies purely on EM context.
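The segmentation-guided target construction can be sketched as follows. This is an illustrative reimplementation of the idea, not the actual Synful target code; the function name, signature, and radius handling are assumptions.

```python
import numpy as np

def ball_mask_targets(shape, post_points, radius, segmentation=None):
    """Illustrative sketch (not the actual Synful target code): paint a
    spherical mask around each post-synaptic point annotation and, if a
    neuron segmentation is given, clip each ball to the segment that
    contains its centre, so masks never cross neuron boundaries."""
    target = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.indices(shape)
    for z, y, x in post_points:
        ball = (zz - z) ** 2 + (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        if segmentation is not None:
            # Keep only voxels in the same segment as the annotation.
            ball &= segmentation == segmentation[z, y, x]
        target[ball] = 1
    return target
```

Without a segmentation, the balls are purely geometric and the network must learn boundaries from EM context alone.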

PyTorch reimplementation

The PyTorch code lives under:

  • synapse_detection/synful/pytorch

It is marked as under active development:

  • currently being tested on both isotropic FIBSEM and anisotropic TEM datasets,
  • complete usage instructions will be released once reproducibility in code and performance is verified.

Operational Guidance

When to use Synful

Use Synful when:

  • you want joint synaptic partner detection (not just site detection),
  • you have or can produce CREMI-style .hdf volumes and annotations,
  • you are working with anisotropic EM similar to CREMI/FAFB and want a method tested at whole-brain scale.

For simpler or more resource-constrained scenarios, Catena also offers SimpSyn as a lighter-weight model.

Data preparation

Convert your data into CREMI HDF5 format:

  • follow the example scripts to understand required datasets and attributes.

Warning

Data conversion to CREMI format is a must.

Make sure your annotations match Synful's expectations:

  • coordinates for post-synaptic and pre-synaptic sites,
  • optional neuron segmentation.

Using pretrained models (TensorFlow)

  • Start from tensorflow/pretrained/train and follow the README there.
  • Begin with anisotropic datasets whose voxel size and contrast are close to the original training data (CREMI/FAFB).

For isotropic data:

  • treat pretrained models as a sanity check, not a final solution;
  • inspect predictions visually to judge whether retraining is necessary.

Training from scratch (TensorFlow)

Follow the instructions in tensorflow/train_from_scratch/readme.md.

Ensure:

  • the train/validation split reflects your target distribution,
  • you monitor performance on held-out regions or neurons, not just on training ROIs.
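One way to monitor held-out performance is to match predicted partner pairs against ground truth within a distance tolerance. The sketch below is a toy illustration of that idea; the function name and the greedy matching are assumptions, and Synful's published evaluation protocol is more careful than this.

```python
import numpy as np

def partner_fscore(pred_pairs, gt_pairs, tol=100.0):
    """Toy evaluation sketch: greedily match predicted (pre, post)
    point pairs to ground-truth pairs when both endpoints fall within
    `tol` (e.g. nm), then report precision, recall, and F1."""
    unmatched_gt = list(gt_pairs)
    tp = 0
    for pre, post in pred_pairs:
        for i, (gpre, gpost) in enumerate(unmatched_gt):
            if (np.linalg.norm(np.subtract(pre, gpre)) <= tol and
                    np.linalg.norm(np.subtract(post, gpost)) <= tol):
                tp += 1
                unmatched_gt.pop(i)  # each GT pair matches at most once
                break
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gt_pairs) if gt_pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Tracking such a score on held-out ROIs, rather than training loss alone, gives a more honest picture of generalisation.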

PyTorch (when available)

Once usage instructions are finalised, the PyTorch version will likely become the preferred implementation for new projects, due to:

  • easier debugging,
  • better integration with modern CUDA/PyTorch tooling.

Future Work / Open Questions

Stabilise and document the PyTorch implementation

The main near-term goal is to provide a feature-complete PyTorch port of Synful with:

  • reproducible training scripts,
  • clear example configs for isotropic FIBSEM and anisotropic TEM,
  • and benchmarks on several datasets.

Resolution-robust training

Pretrained models are currently tuned for specific resolutions. Exploring:

  • multi-resolution training,
  • or explicit encoding of voxel size,

could improve robustness to new datasets.

Comparison and integration with simpler detectors

Synful's joint model is powerful but heavier than simpler approaches like SimpSyn. Systematic comparisons across datasets (see preprint) and use-cases will help clarify:

  • when the extra complexity is warranted,
  • and how to best combine outputs (e.g. use SimpSyn for site detection, Synful for partner refinement).

Tighter integration with proofreading tools

Buhmann et al. released CIRCUITMAP, a CATMAID add-on for interactive circuit reconstruction with Synful predictions.

In Catena, there is room to:

  • streamline pushing Synful predictions into CATMAID / CAVE,
  • and close the loop between automated detection and human proofreading.