Neurotransmitter Classification¶
A short overview of what this module does, with links to usage.
- Install & Usage: See the module's README and scripts in the GitHub repository for the most up-to-date instructions.
- Design Choices: See Design Choices for the "why" behind the "what".
What are neurotransmitters?¶
Neurotransmitters are chemical messengers released at synapses that transmit signals from one cell to another (for example, from neuron to neuron). They can be broadly excitatory or inhibitory, depending on how they affect the post-synaptic cell.
Once we have a structural connectivity graph from synapse detection, neurotransmitters effectively give that graph its functional sign:
- an edge can be excitatory or inhibitory,
- and we know in which direction information flows along that edge.
In other words, neurotransmitters turn a bare structural graph into a signed, directed circuit.
Curating Neurotransmitter Data¶
Neurotransmitters are often grouped into two broad categories:
- Excitatory
- Inhibitory
Within these groups are specific transmitter types such as acetylcholine, GABA, glutamate, serotonin, dopamine, octopamine, and many others.
A common working assumption in circuit modelling is that neurons predominantly express one main neurotransmitter type, following Dale's Principle (also referred to as Dale's Law). Under this view, a neuron is treated as primarily excitatory or inhibitory, which lets us assign a consistent sign to all of its outgoing synapses.
Within Catena, neurotransmitter classification is done on top of a reconstructed connectome:
- neurons and synapses are first identified,
- synapses are associated with their pre- and post-synaptic partners,
- neurotransmitter identity is then inferred at pre-synaptic sites so that each edge in the graph can be labelled as excitatory or inhibitory.
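To make the last step concrete, here is a minimal sketch (not Catena's actual code) of how per-neuron transmitter predictions can be propagated to edge signs under Dale's Principle. The transmitter-to-sign mapping below is an illustrative assumption; the correct mapping depends on the organism and the receptors involved.

```python
# Hypothetical sketch: turn per-neuron transmitter predictions into a
# signed, directed edge list under Dale's Principle. The sign mapping
# is illustrative only -- it depends on the organism and receptor types.
TRANSMITTER_SIGN = {
    "acetylcholine": +1,  # commonly excitatory in the fly
    "glutamate": -1,      # commonly inhibitory in the fly (GluCl receptors)
    "gaba": -1,           # inhibitory
}

def sign_edges(edges, neuron_transmitter):
    """edges: iterable of (pre_id, post_id) pairs.
    neuron_transmitter: dict mapping pre_id -> transmitter name.
    Returns (pre_id, post_id, sign) triples; sign is 0 when unknown."""
    signed = []
    for pre, post in edges:
        nt = neuron_transmitter.get(pre)
        signed.append((pre, post, TRANSMITTER_SIGN.get(nt, 0)))
    return signed

edges = [(1, 2), (1, 3), (4, 2)]
transmitters = {1: "acetylcholine", 4: "gaba"}
print(sign_edges(edges, transmitters))
# [(1, 2, 1), (1, 3, 1), (4, 2, -1)]
```

Every outgoing edge of a neuron inherits the same sign, which is exactly the simplification Dale's Principle buys us.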
For ground truth, we leverage public datasets, in particular curated data from the adult fly. These provide neurons whose neurotransmitter identity has been established manually or experimentally.
Our curation strategies, label sources, and preprocessing steps are described in detail here:
Neurotransmitter Classification with Synister within Catena¶
This is an early reimplementation of Nil Eckstein's Synister project. It has been restructured, based on updates from Synister's dev branch by Diane Adjavon (HHMI Janelia), to work with local datasets carefully curated from publicly available adult fly brain datasets. These local datasets are much smaller; the largest contains around 1000 examples for each of the major neurotransmitters: acetylcholine, serotonin, dopamine, glutamate, GABA, octopamine, and tyramine.
Important
This re-implementation deviates slightly from the original in that it includes multiple architectures besides the VGG.
It continues to use Gunpowder and Daisy for data loading and task scheduling.
Directory Structure¶
The project is organized as follows:
```
.
├── config/
│   ├── config.py              # Main configuration for training
│   └── config_predict.py      # Configuration for prediction
├── data_utils/
│   └── pre_process/
│       └── split_data.py      # Script to create train/test splits
├── engine/
│   ├── predict/
│   │   └── predict_3d.py      # Core prediction logic
│   ├── train/
│   │   └── train_3d.py        # Core training logic
│   └── post/
│       └── evaluate_3d.py     # Result evaluation
├── add_ons/
│   └── gp/
│       └── mongo_source.py    # Custom Gunpowder node for MongoDB
├── models/
│   ├── resnet3d.py            # 3D ResNet model definition
│   └── vgg3d.py               # 3D VGG model definition
├── predicter.py               # Launcher for prediction runs
├── trainer.py                 # Launcher for training runs
└── readme.md
```
Getting Started¶
Prerequisites¶
- Conda (for managing the Python environment)
- Access to a raw data volume (Zarr or N5 format) and HDF5 files containing synapse locations in the CREMI format.
- Optionally, a MongoDB instance for sourcing pre-synaptic synapse locations at large scale.
Installation¶
1. Clone the repository:

   ```
   git clone <catena-repository-url>
   cd catena
   git checkout dev
   cd neurotransmitter_classification
   ```

2. Create and activate the Conda environment. Choose an environment file from http://github.com/Mohinta2892/catena/tree/dev/neurotransmitter_classification/conda_env/ that suits your setup (local workstation or RHEL Slurm cluster):

   ```
   conda env create -n synister -f conda_env/[chosen-environment-file]
   conda activate synister
   ```

3. TODO: Install dependencies via pip. (Note: You will need to create a `requirements.txt` file based on your project's specific dependencies, including `gunpowder`, `torch`, `yacs`, `pandas`, `sklearn`, `tqdm`, `h5py`, and `pymongo`.)

   ```
   pip install -r requirements.txt
   ```
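As a starting point, a hypothetical `requirements.txt` covering the dependencies named above might look like the following (pin versions to match your CUDA/PyTorch setup; note that `sklearn` is installed via the `scikit-learn` package):

```text
gunpowder
torch
yacs
pandas
scikit-learn
tqdm
h5py
pymongo
```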
Step 1: Data Preparation (Optional)¶
If you wish to create a reproducible train/test split from your HDF5 files, you can use the provided script. This is recommended for comparing models fairly.
1. Run the split script. The script will scan a directory of HDF5 files, shuffle them, and create a `.pkl` file containing lists of files for training and validation.

   ```
   python scripts/create_split.py \
       --data_dir /path/to/your/data_root \
       --split_ratio 0.9 \
       --output_file ./data_splits/my_split.pkl
   ```

   - `--data_dir`: should point to a directory containing subfolders for each neurotransmitter class (e.g., `/path/to/data_root/gaba/`, `/path/to/data_root/acetylcholine/`, etc.).
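Conceptually, the split script does something like the following sketch. This is a simplification, not the actual `create_split.py`; the function name, fixed seed, and pickle layout are assumptions for illustration.

```python
# Illustrative sketch of a reproducible train/validation split over HDF5
# files. Not the actual create_split.py -- names and the pickle layout
# are assumptions.
import pickle
import random
from pathlib import Path

def create_split(data_dir, split_ratio=0.9, output_file="split.pkl", seed=42):
    # Collect all HDF5 files under per-class subfolders,
    # e.g. data_root/gaba/*.h5, data_root/acetylcholine/*.h5, ...
    files = sorted(str(p) for p in Path(data_dir).rglob("*.h5"))
    random.Random(seed).shuffle(files)  # fixed seed -> reproducible split
    n_train = int(len(files) * split_ratio)
    split = {"train": files[:n_train], "validation": files[n_train:]}
    with open(output_file, "wb") as f:
        pickle.dump(split, f)
    return split
```

Sorting before shuffling with a fixed seed is what makes the split reproducible across runs and machines.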
Step 2: Training¶
Training is launched via the trainer.py script, which uses config/config.py for its settings.
1. Configure training: open `config/config.py` and adjust the settings. Key parameters include:

   - `DATA.USE_SPLIT_FILE`: set to `True` to use the `.pkl` file from Step 1, or `False` to use all data found in `DATA.DATA_DIR_PATH`.
   - `DATA.SPLIT_FILE`: path to your `.pkl` split file.
   - `DATA.HOME`, `DATA.DATA_DIR_PATH`, `DATA.BRAIN_VOL`: paths to locate your data.
   - `TRAIN.MODEL_TYPE`: choose between `RESNET` and `VGG`.
   - `LOGGING`: paths for logs, snapshots, and checkpoints.

2. Launch training:

   ```
   python trainer.py -c config/config.py
   ```
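Since the project uses yacs for configuration, a `config.py` along these lines is plausible. This is a sketch only: the node and key names follow the parameters described above, but the default values are illustrative assumptions, not the repository's actual defaults.

```python
# Illustrative yacs-style config fragment; default values are assumptions.
from yacs.config import CfgNode as CN

_C = CN()

_C.DATA = CN()
_C.DATA.USE_SPLIT_FILE = True        # use the .pkl split from Step 1
_C.DATA.SPLIT_FILE = "./data_splits/my_split.pkl"
_C.DATA.HOME = "/path/to/home"
_C.DATA.DATA_DIR_PATH = "/path/to/data_root"
_C.DATA.BRAIN_VOL = "/path/to/raw_data.zarr"

_C.TRAIN = CN()
_C.TRAIN.MODEL_TYPE = "RESNET"       # or "VGG"

_C.LOGGING = CN()
_C.LOGGING.LOG_DIR = "./logs"
_C.LOGGING.SNAPSHOT_DIR = "./snapshots"
_C.LOGGING.CHECKPOINT_DIR = "./checkpoints"

def get_cfg_defaults():
    """Return a clone so callers can modify it without touching the defaults."""
    return _C.clone()
```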
Step 3: Prediction [Partially Tested - Further testing in progress]¶
Prediction is launched via predicter.py and configured using config/config_predict.py. It is highly flexible and can source synapse locations from multiple backends.
1. Configure prediction: open `config_predict.py` and set the parameters for your prediction run.

   - Set the checkpoint:

     ```
     _C.PREDICT.CHECKPOINT = "/path/to/your/checkpoints/model_checkpoint_100000.pt"
     ```

   - Set the raw data container:

     ```
     _C.RAW_DATA.CONTAINER = "/path/to/raw_data.zarr"
     _C.RAW_DATA.DATASET = "volumes/raw/s0"
     ```

   - Choose a data source method by setting `_C.DATA_SOURCE.METHOD` to one of the following:

     - `'pkl'`: to predict on the validation set from a split file.

       ```
       _C.DATA_SOURCE.PKL_FILE_PATH = "./data_splits/my_split.pkl"
       ```

     - `'directory'`: to predict on all HDF5 files in a specific directory (for unlabeled data).

       ```
       _C.DATA_SOURCE.DIRECTORY_PATH = "/path/to/unlabeled_synapses/"
       ```

     - `'mongo'`: to predict on synapse locations stored in a MongoDB database (for very large-scale inference).

       ```
       _C.DATA_SOURCE.MONGO.DB_NAME = "synapse_database"
       _C.DATA_SOURCE.MONGO.DB_HOST = "mongodb://user:pass@host:27017/"
       _C.DATA_SOURCE.MONGO.COLLECTION = "synapses_to_predict"
       ```

2. Launch prediction: run the `predicter.py` script, pointing it to your configuration file.

   ```
   python predicter.py config/config_predict.py
   ```

   The predictions will be saved as a CSV file in the directory specified by `_C.PREDICT.OUTPUT_DIR`.
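The three source methods can be pictured with a small dispatch sketch. This is illustrative only: the real logic lives in `predicter.py` and `engine/predict/predict_3d.py`, `gather_inputs` is a hypothetical name, a plain dict stands in for the yacs config node, and the Mongo branch is stubbed because it needs a live database.

```python
# Illustrative dispatch over the three data-source methods. Not the
# actual Catena code -- function name and pickle layout are assumptions.
import pickle
from pathlib import Path

def gather_inputs(method, cfg):
    """Return the list of inputs to predict on for a given source method."""
    if method == "pkl":
        # Use the validation half of a split file from Step 1.
        with open(cfg["PKL_FILE_PATH"], "rb") as f:
            return pickle.load(f)["validation"]
    elif method == "directory":
        # Predict on every HDF5 file in a directory of unlabeled data.
        return sorted(str(p) for p in Path(cfg["DIRECTORY_PATH"]).glob("*.h5"))
    elif method == "mongo":
        # Very large-scale inference: stream synapse locations from MongoDB,
        # e.g. via pymongo.MongoClient(cfg["DB_HOST"]) -- stubbed here.
        raise NotImplementedError("requires a live MongoDB instance")
    raise ValueError(f"unknown data source method: {method}")
```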
Step 4: Evaluation¶
If your prediction run included ground truth labels (e.g., when using a .pkl split file), you can evaluate the model's performance.
1. Run the evaluation script, pointing it to the CSV file generated during the prediction step:

   ```
   python scripts/evaluate.py predictions/predictions.csv \
       --output_file predictions/evaluation_report.txt
   ```

   This will print an evaluation report including overall accuracy, per-class metrics (precision, recall, F1-score), and a confusion matrix to the console, and save it to the specified output file.
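At its core, the report boils down to comparing predicted and ground-truth labels row by row in that CSV. The following stdlib-only sketch shows the idea; the column names `ground_truth` and `predicted` are assumptions, so check them against the CSV your run actually produces.

```python
# Minimal sketch of the evaluation step: accuracy and a confusion matrix
# from a predictions CSV. Column names ("ground_truth", "predicted") are
# assumptions -- the real script is engine/post/evaluate_3d.py.
import csv
from collections import Counter

def evaluate(csv_path):
    confusion = Counter()  # (true_label, predicted_label) -> count
    correct = total = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            true, pred = row["ground_truth"], row["predicted"]
            confusion[(true, pred)] += 1
            correct += true == pred
            total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion
```

Per-class precision, recall, and F1 can all be read off the same confusion counter, which is why the report is cheap to produce once predictions exist.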