Neurotransmitter Classification¶
A short overview of what this module does, with links to usage.
- Install & Usage: See the module's README and scripts in the GitHub repository for the most up-to-date instructions.
- Design Choices: See Design Choices for the "why" behind the "what".
What are neurotransmitters?¶
Neurotransmitters are chemical messengers released at synapses that transmit signals from one cell to another (for example, from neuron to neuron). They can be broadly excitatory or inhibitory, depending on how they affect the post-synaptic cell.
Once we have a structural connectivity graph from synapse detection, neurotransmitters effectively give that graph its functional sign:
- an edge can be excitatory or inhibitory,
- and we know in which direction information flows along that edge.
In other words, neurotransmitters turn a bare structural graph into a signed, directed circuit.
Curating Neurotransmitter Data¶
Neurotransmitters are often grouped into two broad categories:
- Excitatory
- Inhibitory
Within these groups are specific transmitter types such as acetylcholine, GABA, glutamate, serotonin, dopamine, octopamine, and many others.
A common working assumption in circuit modelling is that neurons predominantly express one main neurotransmitter type, following Dale's Principle (also referred to as Dale's Law). Under this view, a neuron is treated as primarily excitatory or inhibitory, which lets us assign a consistent sign to all of its outgoing synapses.
Within Catena, neurotransmitter classification is done on top of a reconstructed connectome:
- neurons and synapses are first identified,
- synapses are associated with their pre- and post-synaptic partners,
- neurotransmitter identity is then inferred at pre-synaptic sites so that each edge in the graph can be labelled as excitatory or inhibitory.
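To make the last step concrete, here is a minimal sketch (not Catena's actual code) of how per-neuron transmitter predictions can be propagated to edge signs under Dale's Principle. The transmitter-to-sign mapping below is an illustrative assumption; the correct mapping depends on the organism and the receptors involved.

```python
# Hypothetical sketch: turn per-neuron transmitter predictions into a
# signed, directed edge list under Dale's Principle. The sign mapping
# is illustrative only -- it depends on the organism and receptor types.
TRANSMITTER_SIGN = {
    "acetylcholine": +1,  # commonly excitatory in the fly
    "glutamate": -1,      # commonly inhibitory in the fly (GluCl receptors)
    "gaba": -1,           # inhibitory
}

def sign_edges(edges, neuron_transmitter):
    """edges: iterable of (pre_id, post_id) pairs.
    neuron_transmitter: dict mapping pre_id -> transmitter name.
    Returns (pre_id, post_id, sign) triples; sign is 0 when unknown."""
    signed = []
    for pre, post in edges:
        nt = neuron_transmitter.get(pre)
        signed.append((pre, post, TRANSMITTER_SIGN.get(nt, 0)))
    return signed

edges = [(1, 2), (1, 3), (4, 2)]
transmitters = {1: "acetylcholine", 4: "gaba"}
print(sign_edges(edges, transmitters))
# [(1, 2, 1), (1, 3, 1), (4, 2, -1)]
```

Every outgoing edge of a neuron inherits the same sign, which is exactly the simplification Dale's Principle buys us.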
For ground truth, we leverage public datasets, in particular curated data from the adult fly. These provide neurons whose neurotransmitter identity has been established manually or experimentally.
Our curation strategies, label sources, and preprocessing steps are described in detail here:
Neurotransmitter Classification with Synister within Catena¶
This is an early reimplementation of Nil Eckstein's Synister project. It has been restructured, based on updates from Synister's dev branch by Diane Adjavon (HHMI Janelia), to work with local datasets carefully curated from publicly available adult fly brain datasets. These local datasets are much smaller; the largest contains around 1000 examples for each of the major neurotransmitters: acetylcholine, serotonin, dopamine, glutamate, GABA, octopamine, and tyramine.
Important
This re-implementation deviates slightly from the original in that it includes multiple architectures besides the VGG.
It continues to use Gunpowder and Daisy for data loading and task scheduling.
Directory Structure¶
The project is organized as follows:
```
.
├── config/
│   ├── config.py              # Main configuration for training
│   └── config_predict.py      # Configuration for prediction
├── data_utils/
│   └── pre_process/
│       └── split_data.py      # Script to create train/test splits
├── engine/
│   ├── predict/
│   │   └── predict_3d.py      # Core prediction logic
│   ├── train/
│   │   └── train_3d.py        # Core training logic
│   └── post/
│       └── evaluate_3d.py     # Result evaluation
├── add_ons/
│   └── gp/
│       └── mongo_source.py    # Custom Gunpowder node for MongoDB
├── models/
│   ├── resnet3d.py            # 3D ResNet model definition
│   └── vgg3d.py               # 3D VGG model definition
├── predicter.py               # Launcher for prediction runs
├── trainer.py                 # Launcher for training runs
└── readme.md
```
Getting Started¶
Prerequisites¶
- Conda (for managing the Python environment)
- Access to a raw data volume (Zarr or N5 format) and HDF5 files containing synapse locations in the CREMI format.
- Optionally, a MongoDB instance for sourcing pre-synaptic synapse locations at large scale.
Installation¶
1. Clone the repository:

   ```
   git clone <catena-repository-url>
   cd catena
   git checkout dev
   cd neurotransmitter_classification
   ```

2. Create and activate the Conda environment. Choose an environment file from http://github.com/Mohinta2892/catena/tree/dev/neurotransmitter_classification/conda_env/ that suits your setup (local workstation or RHEL Slurm cluster):

   ```
   conda env create -n synister -f conda_env/[chosen-environment-file]
   conda activate synister
   ```

3. TODO: Install dependencies via pip. (Note: You will need to create a `requirements.txt` file based on your project's specific dependencies, including `gunpowder`, `torch`, `yacs`, `pandas`, `sklearn`, `tqdm`, `h5py`, and `pymongo`.)

   ```
   pip install -r requirements.txt
   ```
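As a starting point, a hypothetical `requirements.txt` covering the dependencies named above might look like the following (pin versions to match your CUDA/PyTorch setup; note that `sklearn` is installed via the `scikit-learn` package):

```text
gunpowder
torch
yacs
pandas
scikit-learn
tqdm
h5py
pymongo
```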
Step 1: Data Preparation (Optional)¶
If you wish to create a reproducible train/test split from your HDF5 files, you can use the provided script. This is recommended for comparing models fairly.
1. Run the split script. The script will scan a directory of HDF5 files, shuffle them, and create a `.pkl` file containing lists of files for training and validation.

   ```
   python scripts/create_split.py \
       --data_dir /path/to/your/data_root \
       --split_ratio 0.9 \
       --output_file ./data_splits/my_split.pkl
   ```

   - `--data_dir`: should point to a directory containing subfolders for each neurotransmitter class (e.g., `/path/to/data_root/gaba/`, `/path/to/data_root/acetylcholine/`, etc.).
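Conceptually, the split script does something like the following sketch. This is a simplification, not the actual `create_split.py`; the function name, fixed seed, and pickle layout are assumptions for illustration.

```python
# Illustrative sketch of a reproducible train/validation split over HDF5
# files. Not the actual create_split.py -- names and the pickle layout
# are assumptions.
import pickle
import random
from pathlib import Path

def create_split(data_dir, split_ratio=0.9, output_file="split.pkl", seed=42):
    # Collect all HDF5 files under per-class subfolders,
    # e.g. data_root/gaba/*.h5, data_root/acetylcholine/*.h5, ...
    files = sorted(str(p) for p in Path(data_dir).rglob("*.h5"))
    random.Random(seed).shuffle(files)  # fixed seed -> reproducible split
    n_train = int(len(files) * split_ratio)
    split = {"train": files[:n_train], "validation": files[n_train:]}
    with open(output_file, "wb") as f:
        pickle.dump(split, f)
    return split
```

Sorting before shuffling with a fixed seed is what makes the split reproducible across runs and machines.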
Step 2: Training¶
Training is launched via the trainer.py script, which uses config/config.py for its settings.
1. Configure training: open `config/config.py` and adjust the settings. Key parameters include:

   - `DATA.USE_SPLIT_FILE`: set to `True` to use the `.pkl` file from Step 1, or `False` to use all data found in `DATA.DATA_DIR_PATH`.
   - `DATA.SPLIT_FILE`: path to your `.pkl` split file.
   - `DATA.HOME`, `DATA.DATA_DIR_PATH`, `DATA.BRAIN_VOL`: paths to locate your data.
   - `TRAIN.MODEL_TYPE`: choose between `RESNET` and `VGG`.
   - `LOGGING`: paths for logs, snapshots, and checkpoints.

2. Launch training:

   ```
   python trainer.py -c config/config.py
   ```
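Since the project uses yacs for configuration, a `config.py` along these lines is plausible. This is a sketch only: the node and key names follow the parameters described above, but the default values are illustrative assumptions, not the repository's actual defaults.

```python
# Illustrative yacs-style config fragment; default values are assumptions.
from yacs.config import CfgNode as CN

_C = CN()

_C.DATA = CN()
_C.DATA.USE_SPLIT_FILE = True        # use the .pkl split from Step 1
_C.DATA.SPLIT_FILE = "./data_splits/my_split.pkl"
_C.DATA.HOME = "/path/to/home"
_C.DATA.DATA_DIR_PATH = "/path/to/data_root"
_C.DATA.BRAIN_VOL = "/path/to/raw_data.zarr"

_C.TRAIN = CN()
_C.TRAIN.MODEL_TYPE = "RESNET"       # or "VGG"

_C.LOGGING = CN()
_C.LOGGING.LOG_DIR = "./logs"
_C.LOGGING.SNAPSHOT_DIR = "./snapshots"
_C.LOGGING.CHECKPOINT_DIR = "./checkpoints"

def get_cfg_defaults():
    """Return a clone so callers can modify it without touching the defaults."""
    return _C.clone()
```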
Step 3: Prediction [Partially Tested - Further testing in progress]¶
Prediction is launched via predicter.py and configured using config/config_predict.py. It is highly flexible and can source synapse locations from multiple backends.
1. Configure prediction: open `config_predict.py` and set the parameters for your prediction run.

   - Set the checkpoint:

     ```
     _C.PREDICT.CHECKPOINT = "/path/to/your/checkpoints/model_checkpoint_100000.pt"
     ```

   - Set the raw data container:

     ```
     _C.RAW_DATA.CONTAINER = "/path/to/raw_data.zarr"
     _C.RAW_DATA.DATASET = "volumes/raw/s0"
     ```

   - Choose a data source method by setting `_C.DATA_SOURCE.METHOD` to one of the following:

     - `'pkl'`: to predict on the validation set from a split file.

       ```
       _C.DATA_SOURCE.PKL_FILE_PATH = "./data_splits/my_split.pkl"
       ```

     - `'directory'`: to predict on all HDF5 files in a specific directory (for unlabeled data).

       ```
       _C.DATA_SOURCE.DIRECTORY_PATH = "/path/to/unlabeled_synapses/"
       ```

     - `'mongo'`: to predict on synapse locations stored in a MongoDB database (for very large-scale inference).

       ```
       _C.DATA_SOURCE.MONGO.DB_NAME = "synapse_database"
       _C.DATA_SOURCE.MONGO.DB_HOST = "mongodb://user:pass@host:27017/"
       _C.DATA_SOURCE.MONGO.COLLECTION = "synapses_to_predict"
       ```

2. Launch prediction: run the `predicter.py` script, pointing it to your configuration file.

   ```
   python predicter.py config/config_predict.py
   ```

   The predictions will be saved as a CSV file in the directory specified by `_C.PREDICT.OUTPUT_DIR`.
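The three source methods can be pictured with a small dispatch sketch. This is illustrative only: the real logic lives in `predicter.py` and `engine/predict/predict_3d.py`, `gather_inputs` is a hypothetical name, a plain dict stands in for the yacs config node, and the Mongo branch is stubbed because it needs a live database.

```python
# Illustrative dispatch over the three data-source methods. Not the
# actual Catena code -- function name and pickle layout are assumptions.
import pickle
from pathlib import Path

def gather_inputs(method, cfg):
    """Return the list of inputs to predict on for a given source method."""
    if method == "pkl":
        # Use the validation half of a split file from Step 1.
        with open(cfg["PKL_FILE_PATH"], "rb") as f:
            return pickle.load(f)["validation"]
    elif method == "directory":
        # Predict on every HDF5 file in a directory of unlabeled data.
        return sorted(str(p) for p in Path(cfg["DIRECTORY_PATH"]).glob("*.h5"))
    elif method == "mongo":
        # Very large-scale inference: stream synapse locations from MongoDB,
        # e.g. via pymongo.MongoClient(cfg["DB_HOST"]) -- stubbed here.
        raise NotImplementedError("requires a live MongoDB instance")
    raise ValueError(f"unknown data source method: {method}")
```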
Step 4: Evaluation¶
If your prediction run included ground truth labels (e.g., when using a .pkl split file), you can evaluate the model's performance.
1. Run the evaluation script, pointing it to the CSV file generated during the prediction step:

   ```
   python scripts/evaluate.py predictions/predictions.csv \
       --output_file predictions/evaluation_report.txt
   ```

   This will print an evaluation report including overall accuracy, per-class metrics (precision, recall, F1-score), and a confusion matrix to the console, and save it to the specified output file.
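At its core, the report boils down to comparing predicted and ground-truth labels row by row in that CSV. The following stdlib-only sketch shows the idea; the column names `ground_truth` and `predicted` are assumptions, so check them against the CSV your run actually produces.

```python
# Minimal sketch of the evaluation step: accuracy and a confusion matrix
# from a predictions CSV. Column names ("ground_truth", "predicted") are
# assumptions -- the real script is engine/post/evaluate_3d.py.
import csv
from collections import Counter

def evaluate(csv_path):
    confusion = Counter()  # (true_label, predicted_label) -> count
    correct = total = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            true, pred = row["ground_truth"], row["predicted"]
            confusion[(true, pred)] += 1
            correct += true == pred
            total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion
```

Per-class precision, recall, and F1 can all be read off the same confusion counter, which is why the report is cheap to produce once predictions exist.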