
Goal

The purpose of this work was to develop a cell detection program for future use. Specifically, we trained the deep learning model YOLOv5 on an existing dataset and deployed it within a graphical user interface. The end product is a custom-built tool to facilitate the collection of similar data for future experiments. This work will be presented as a poster at the Society for Neuroscience conference in the fall of 2022.

We include the abstract of the poster below for context, but the remainder of this post expands on the implementation of the tool. We discuss how we developed the main component, the object detector, and then explain the logic behind the graphical user interface.


Abstract

Subgroups of hindbrain neuronal populations are well documented to respond robustly to glycemic challenge. These populations, however, lie scattered within a vast expanse of neural tissue that requires time, labor, and expertise to analyze quantitatively. To reduce this labor but maintain expert quality control, we developed an algorithm that employs the deep learning model, YOLOv5, to quickly and accurately annotate cells in epifluorescence photomicrographs. These cells include those that displayed rapid activation in association with an intravenous 2-deoxy-glucose challenge (250 mg/kg) relative to a saline-treated control group. The cells were identified using dual and/or triple-label immunohistochemistry for dopamine beta hydroxylase, choline acetyltransferase, and the cellular activation marker, phosphorylated-ERK1/2. The model was trained in-house on a subset of manual cell markings from both treatment groups in the locus ceruleus, nucleus of the solitary tract, and dorsal motor nucleus of the vagus to detect cellular positions. We reviewed the preliminary cell markings and manually adjusted the model predictions to remove false positives and register cells that may have gone undetected by the program. The precision and recall of the model were evaluated so that it could reliably serve as a first-pass cell locator for the reviewer of the data. This method will allow us to streamline our future data collection and analysis to map hindbrain chemoarchitecture and activated neuronal populations to an open-access rat brain atlas.

Raw Data Overview

Exporting Images and Cell Markings From AI Files

File Structure

  1. Data

    1. Level 51

      1. AI Files

      2. Images

        1. ChAT

        2. DBH

        3. pERK

      3. Labels

        1. ChAT

        2. DBH

        3. pERK

    2. Level 67

      1. Same as above

    3. Level 69

      1. Same as above

Exporting

  1. Images

    1. Isolate the image for an individual peptide and choose "Export > Export As..."

    2. In the "Export" dialog box choose the appropriate level and peptide image folder. Select "PNG" format and "Use Artboards", then click "Export". Leave the AI file name as the PNG file name.

    3. In the "PNG Options" dialog box select resolution as "Screen (72 ppi)", Anti-aliasing as "None" and Background Color as "Black", then click "OK".

  2. Labels

    1. Isolate the cell layer for the corresponding image and choose "Export > Export As..."

    2. In the "Export" dialog box choose the appropriate level and peptide label folder. Select "SVG" format and "Use Artboards", then click "Export". Leave the AI file name as the SVG file name.

    3. In the "SVG Options" dialog box select Styling as "Inline Style", Font as "Convert To Outlines", Images as "Preserve", Object IDs as "Layer Names", Decimal as "4", and select "Minifiy" and "Responsive", then click "OK".

  3. Confirm image height and width match the height and width specified in the view box of the SVG file. (In macOS, right-click on image file and select "Get Info", and then open the SVG file with a text editor and search for "viewBox".) The screenshot below illustrates the confirmation of image and graphic dimensions.
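
For batch checking, this confirmation can also be done programmatically. Below is a minimal sketch, assuming Pillow is installed and the SVG contains a standard viewBox attribute; the file paths are illustrative and follow the folder structure above.

  import re
  from PIL import Image

  def dimensions_match(png_path, svg_path):
      """Compare a PNG's pixel dimensions with the width/height in its SVG viewBox."""
      img_w, img_h = Image.open(png_path).size
      svg_text = open(svg_path).read()
      # viewBox is "min-x min-y width height"
      view_box = re.search(r'viewBox="([^"]+)"', svg_text).group(1)
      _, _, svg_w, svg_h = (float(v) for v in view_box.split())
      return (img_w, img_h) == (round(svg_w), round(svg_h))

  # Illustrative paths
  print(dimensions_match("Data/Level 51/Images/pERK/example.png",
                         "Data/Level 51/Labels/pERK/example.svg"))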


Summary

  • Total number of levels: 3

  • Total number of Adobe Illustrator Files: 56

  • Total number of Image-SVG pairs: 143

    • pERK: 56

    • DBH: 54

    • ChAT: 33

  • Total Number of Labeled Cells: 6235

    • pERK: 2007

    • DBH: 2278

    • ChAT: 1950

All image files have a resolution of ~0.52 µm/pixel, with an average height and width of approximately 3,000 pixels. Only single-channel images were used for this work.

All SVG files contain the x and y coordinates of each cell marked by a human.

SVG Labels to YOLOv5 Labels

YOLOv5 requires a specific label format for objects. Namely, for every object, YOLO requires the class, the x and y center coordinates, and the width and height, in that order. The labels take the form of a plain text file, where each line represents one object and a single text file contains all objects in an image. An example is shown below (borrowed from here). Note that x, y, width, and height are normalized to the width and height of the image.
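
For instance, a label file for an image containing three cells could look like the following, where each line gives the class, x center, y center, width, and height (the values here are made up):

  0 0.412 0.305 0.011 0.011
  0 0.587 0.772 0.011 0.011
  0 0.903 0.114 0.011 0.011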

Since we only have one class, "cell", we can enter a zero for the class of every object in our dataset. We also have the x and y coordinates, which we can extract from the SVG files. However, we do not have the width or height, since our data was labeled with points rather than bounding boxes. Here we present two methods to derive the width and height for all 6235 cells.

Fixed Bounding Square

Our first approach involves applying a fixed bounding box to all the cells in the data. For this, we used existing measurements of cell lengths and averaged the longest axis of all cells to determine the square width and height. This process gave us a bounding square of 17.58 µm, or 17.58 µm × (99 pixels / 51 µm) ≈ 34.13 pixels, which we applied to all 6235 cells.
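
As a sketch of this conversion, and assuming the Illustrator export writes each cell marker as an SVG circle element (the element type, paths, and function name here are our own illustration, not a fixed part of the pipeline), the fixed square can be written out as normalized YOLO labels like this:

  import xml.etree.ElementTree as ET

  BOX_PX = 34.13  # fixed bounding square: 17.58 µm at 99 pixels / 51 µm

  def svg_points_to_yolo(svg_path, img_w, img_h):
      """Convert point-style cell markers in an SVG into YOLO 'class x y w h' lines."""
      root = ET.parse(svg_path).getroot()
      lines = []
      for circle in root.iter("{http://www.w3.org/2000/svg}circle"):
          cx, cy = float(circle.get("cx")), float(circle.get("cy"))
          # Normalize the center and the fixed box size by the image dimensions
          lines.append(f"0 {cx / img_w:.6f} {cy / img_h:.6f} "
                       f"{BOX_PX / img_w:.6f} {BOX_PX / img_h:.6f}")
      return lines

  # Illustrative usage for a ~3000x3000-pixel image
  print("\n".join(svg_points_to_yolo("Data/Level 51/Labels/pERK/example.svg", 3000, 3000)))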

Learned Bounding Boxes

Our second approach involves training an auxiliary YOLOv5 model whose only task is to output a single bounding box given a crop with a cell in the center. Our hypothesis was that YOLO would quickly learn to produce reliable boxes, since the task is to draw a bounding box for images in which a cell always sits exactly in the center, rather than to detect cells in larger fields where cells may or may not exist.

To do this, we took the same images used to derive the cell measurements for the previous approach and randomly cropped ten cells. We used each cell's x and y coordinates to center the crop, with a height and width of 64 pixels, which is approximately double the average cell length and envelops all cells within it. This resulted in a total of 694 crops with cells in them. The images below show an example of these crops.
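
A minimal sketch of this cropping step, assuming OpenCV is available and the cell coordinates have already been parsed from the SVGs (paths and names are illustrative):

  import random
  import cv2

  CROP = 64  # pixels; roughly double the average cell length

  def crop_around_cells(image_path, cell_xy, n=10, out_prefix="crop"):
      """Save n random 64x64 crops, each centered on a labeled cell."""
      img = cv2.imread(image_path)
      h, w = img.shape[:2]
      for i, (x, y) in enumerate(random.sample(cell_xy, min(n, len(cell_xy)))):
          # Clamp the crop so it stays fully inside the image
          x0 = int(min(max(x - CROP // 2, 0), w - CROP))
          y0 = int(min(max(y - CROP // 2, 0), h - CROP))
          cv2.imwrite(f"{out_prefix}_{i}.png", img[y0:y0 + CROP, x0:x0 + CROP])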

Then, we quickly labeled all 694 crops using makesense.ai, a tool designed specifically for object detection problems. A screenshot of the platform is shown below.

Lastly, we separated the 694 crops into a training and validation set with a 2:1 ratio to train the auxiliary YOLOv5 model (462 for training, 232 for validating). As a side note, we learned that using the crops at their original resolution (64x64 pixels) resulted in predictions that were essentially useless, while resizing the crops to four times their original size (256x256) produced near-perfect results. We tried other sizes, but quadrupling worked best. We used this model to produce bounding boxes for all 6235 cells.

In the result section, we compare the performance of using a fixed bounding square and the learned bounding boxes.

Training Data

Now that we had labels in the proper format (either fixed or learned), we proceeded to prepare the data for the main task. We randomly separated our main dataset into a training and validation set, using the same 2:1 ratio. This separated our 143 images into 95 image-SVG pairs for training and 48 image-SVG pairs for validating. For all images, we sampled crops of 256x256 pixels. Then, to ensure a balanced dataset, we enforced certain conditions while sampling positive and negative samples.

We considered crops that contained annotated cells as positive samples. These were obtained by iterating through the images in a grid-like fashion and sampling all non-overlapping 256x256 crops that contained labeled cells. For negative samples, we could not simply sample crops without annotated cells, because our dataset is not fully labeled; we could risk sampling a "negative" crop that in fact contains cells that were never labeled. To overcome this, we drew polygons around all cells (labeled or unlabeled) on all 143 images. During sampling, we checked whether a crop intersected any of these polygons; if it did not, we kept it as a negative sample, otherwise we skipped it. Lastly, for each image, we sampled as many negative samples as there were positive samples, ensuring a balanced dataset. The figure below illustrates the sampling of negative and positive samples, in red and green respectively, including the cell annotations in green and the negative sampling regions overlaid with gray lines.
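
The gist of this sampling is sketched below, assuming Shapely for the geometric checks; the cell coordinates and exclusion polygons are taken as already loaded, and the names are illustrative:

  from shapely.geometry import Point, box

  def sample_crops(img_w, img_h, cells, exclusion_polys, size=256):
      """Return balanced lists of positive and negative crop origins (top-left corners)."""
      positives, negatives = [], []
      for y0 in range(0, img_h - size + 1, size):          # non-overlapping grid
          for x0 in range(0, img_w - size + 1, size):
              crop = box(x0, y0, x0 + size, y0 + size)
              if any(crop.contains(Point(x, y)) for x, y in cells):
                  positives.append((x0, y0))               # contains labeled cells
              elif not any(crop.intersects(p) for p in exclusion_polys):
                  negatives.append((x0, y0))               # safely cell-free
      return positives, negatives[:len(positives)]         # balance the two classes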

This process resulted in a total of 1691 training samples and 1437 validation samples, each set containing 50% positive and 50% negative samples. Finally, all crops were converted into grayscale images to remove the pseudocolor and allow the network to focus only on pixel intensity, regardless of color.

YOLOv5 Model

We used the YOLOv5 implementation developed by Ultralytics. We tested 200 configurations, each with different image augmentations (and different strengths), model configurations, batch sizes, and optimizers. We ran these experiments on four NVIDIA GeForce RTX 2080 Ti graphics cards for a total duration of 20 hours. Of all of these, the best instance used Ultralytics' default "low-augmentation" parameters, the YOLOv5s (small) model configuration, a batch size of 32 images, and the Adam optimizer with an initial learning rate of 0.001. All other training parameters remained at their defaults.
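
For reference, a training run with these settings would look roughly like the call below to the train.py script in the Ultralytics YOLOv5 repository. The dataset YAML name is ours, the hyperparameter path assumes the low-augmentation preset shipped with recent versions of the repository (with lr0 set to 0.001 in a local copy), and older versions expose the optimizer choice through a different flag.

  import subprocess

  # Hypothetical invocation from a clone of https://github.com/ultralytics/yolov5
  subprocess.run([
      "python", "train.py",
      "--img", "256",                               # crop size used for training
      "--batch", "32",
      "--data", "cells.yaml",                       # our dataset definition (illustrative name)
      "--weights", "yolov5s.pt",                    # small model configuration
      "--hyp", "data/hyps/hyp.scratch-low.yaml",    # low-augmentation preset, lr0 edited to 0.001
      "--optimizer", "Adam",
  ], check=True)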

Results

Image Augmentation

Here we show the effects of the image augmentations performed during training. These are the augmentations that resulted in the best-performing model. Each square represents an input image of size 256x256. We also show the difference in bounding boxes: on the left, what the labels (red boxes) look like with the fixed bounding squares, and on the right, the learned bounding boxes.

Training Examples for fixed bounding squares

Training Examples for learned bounding boxes

Here are the specific values for the augmentation parameters:

  • hsv_h: 0.015

  • hsv_s: 0.7

  • hsv_v: 0.4

  • degrees: 0.0

  • translate: 0.1

  • scale: 0.5

  • shear: 0.0

  • perspective: 0.0

  • flipud: 0.0

  • fliplr: 0.5

  • mosaic: 1.0

  • mixup: 0.0

  • copy_paste: 0.0

Training

Here we present the final results of training with fixed bounding squares and with learned bounding boxes. It is evident that the additional step of generating bounding boxes that follow cell anatomy more closely improves results on the main task of detecting cells.

Recommended Confidence Level

During inference time, the model provides a level of confidence for every bounding box prediction. This value tells you how confident the model is that a cell actually exists inside the bounding box. This can be a useful parameter when using the model in the deployment phase since it allows you to filter predictions based on a confidence level. Typically, the F1 Score is used to find the optimal confidence level since it measures the trade-off between false positives and false negatives. A higher F1 Score is better, and we can plot the F1 score versus the confidence level to reveal which confidence level yields the highest F1 score.

In our case, the highest F1 score is achieved when we filter out any model predictions that have a confidence level of less than 0.327. Now we can set this value as the recommended confidence level for future use.
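
As an illustration of how such a threshold can be found, the sketch below sweeps confidence thresholds over a set of predictions and returns the one that maximizes F1. The matching of predictions to labeled cells is reduced to a precomputed true-positive flag per prediction; in practice, YOLOv5's validation script produces this F1-versus-confidence curve directly.

  import numpy as np

  def best_confidence(confidences, is_true_positive, n_labeled_cells):
      """Sweep confidence thresholds and return the (threshold, F1) pair that maximizes F1.

      confidences      : array of confidence scores, one per prediction
      is_true_positive : boolean array marking predictions that matched a labeled cell
      n_labeled_cells  : total number of human-labeled cells (denominator for recall)
      """
      best_thr, best_f1 = 0.0, 0.0
      for thr in np.linspace(0.0, 1.0, 101):
          keep = confidences >= thr
          tp = np.sum(is_true_positive & keep)
          fp = np.sum(~is_true_positive & keep)
          precision = tp / (tp + fp) if tp + fp else 0.0
          recall = tp / n_labeled_cells if n_labeled_cells else 0.0
          f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
          if f1 > best_f1:
              best_thr, best_f1 = thr, f1
      return best_thr, best_f1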

Sample Predictions

Here we show some sample predictions side-by-side with the human label for comparison.

Model Predictions with their confidence level for 16 images

Human labels for 16 images as a comparison

Deployment

Now that we have a trained network, we can use it to extract preliminary cell counts in future experiments. Here we elaborate on the graphical user interface and its features. We also explain an additional step in the pipeline that facilitates colocalization analysis.

Interface

For ease of use, we released the model in a web-app graphical user interface. The web app can be downloaded from GitHub and installed on a personal computer; the code can be found here. It uses the browser only for the interface: it does not require an internet connection, and all files remain local.

As shown in the screenshot of the web app, the program allows users to change the desired confidence level as well as enable or disable colocalization analysis. By default, colocalization analysis is performed when multiple images are provided; this assumes that the inputs belong to the same tissue section and have the same dimensions. The feature can be disabled if the user needs to run object detection on multiple tissue sections without colocalization.

Colocalization

If colocalization analysis is enabled, the program assumes the images are aligned and share the same dimensions. The program will run the object detection algorithm on each image, and then use the detections to run a colocalization analysis for all possible combinations. The program will then output an SVG for each image and each combination.

As an example, to run the colocalization of peptides pERK, ChAT, and DBH, the user would input an image for each peptide. The program will run the object detection algorithm on each image and then use the detections to run colocalization analysis for pERK-ChAT, pERK-DBH, ChAT-DBH, and pERK-ChAT-DBH. The program will then output a total of seven SVG files: one per input image and one per combination.
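
Below is a sketch of how these channel combinations can be enumerated with Python's itertools; the channel names are the example peptides, and the actual matching of detections across channels (the colocalization test itself) is not shown.

  from itertools import combinations

  def colocalization_combinations(channels):
      """Return every combination of two or more channels."""
      combos = []
      for r in range(2, len(channels) + 1):
          combos.extend(combinations(channels, r))
      return combos

  print(colocalization_combinations(["pERK", "ChAT", "DBH"]))
  # [('pERK', 'ChAT'), ('pERK', 'DBH'), ('ChAT', 'DBH'), ('pERK', 'ChAT', 'DBH')]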