Shape Prior to Post-Process Model Predictions

Goal

Here we present a method to isolate regions in MA-Net predictions using the bounding boxes computed by YOLOv5. We provide a self-supervised solution to post-process model predictions without prior knowledge.

Method

Now that we have a YOLOv5 model trained to detect the fornix's bounding box, we can impose its predicted boxes on the MA-Net predictions of the fornix for levels 21 and 25.

Results

Here we show the YOLOv5 bounding boxes detected for both test images. We also compare the intersection-over-union metric before and after post-processing is applied.
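For reference, the intersection-over-union at a 0.5 threshold could be computed along the following lines. This is a minimal sketch, not the code used for the reported numbers: the function name `iou_at_threshold`, the `>=` comparison, and the convention for two empty masks are assumptions made for illustration.

```python
import numpy as np

def iou_at_threshold(prediction, ground_truth, threshold=0.5):
    """IoU between a thresholded probability map and a binary label mask."""
    # Assumption: pixels at or above the threshold count as foreground.
    pred_mask = np.asarray(prediction) >= threshold
    true_mask = np.asarray(ground_truth).astype(bool)
    intersection = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else float(intersection / union)

# Toy 3x3 example (made-up values, not data from the experiments).
prediction = np.array([[0.9, 0.8, 0.1],
                       [0.7, 0.2, 0.6],
                       [0.1, 0.1, 0.1]])
ground_truth = np.array([[1, 1, 0],
                         [1, 1, 0],
                         [0, 0, 0]])
score = iou_at_threshold(prediction, ground_truth)
print(score)
```

In the toy example, three pixels are shared and five are covered by either mask, so the score is 3/5.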

Best Matching Regions

We are looking for a region of (988 µm)^2 for level 21 and (374 µm)^2 for level 25. We convert these to pixel measurements using the scale of the images in our dataset, 1 µm/px, which gives a region of (988 px)^2 for level 21 and (374 px)^2 for level 25. The figure below illustrates the best matching region for each test image, corresponding to levels 21 and 25.

Level 21

IoU w Threshold @ 0.5 before post-processing = 0.37

Level 25

IoU w Threshold @ 0.5 before post-processing = 0.49
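One way the search for a best matching square region could be sketched is shown below. This is an illustration under stated assumptions, not the implementation behind the figures: it assumes the best region is the square window with the highest occupancy (fraction of predicted foreground pixels), and the names `um_to_px` and `best_matching_region` are hypothetical. An integral image keeps the search cheap for large windows.

```python
import numpy as np

def um_to_px(length_um, um_per_px=1.0):
    """Convert a length in micrometres to pixels (dataset scale: 1 um/px)."""
    return int(round(length_um / um_per_px))

def best_matching_region(pred_mask, side_px):
    """Return (top, left, occupancy) of the square window with most foreground.

    Assumption: 'best matching' means highest occupancy of predicted pixels.
    """
    mask = np.asarray(pred_mask, dtype=np.int64)
    height, width = mask.shape
    if side_px < 1 or side_px > height or side_px > width:
        raise ValueError("window side must fit inside the image")
    # Integral image with a zero row and column prepended.
    integral = np.zeros((height + 1, width + 1), dtype=np.int64)
    integral[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    s = side_px
    # window_sums[i, j] is the foreground count in mask[i:i+s, j:j+s].
    window_sums = (integral[s:, s:] - integral[:-s, s:]
                   - integral[s:, :-s] + integral[:-s, :-s])
    top, left = np.unravel_index(np.argmax(window_sums), window_sums.shape)
    occupancy = window_sums[top, left] / (s * s)
    return int(top), int(left), float(occupancy)

# Toy example: a 3x3 blob plus one stray pixel in an 8x8 mask.
toy_mask = np.zeros((8, 8), dtype=bool)
toy_mask[2:5, 3:6] = True
toy_mask[7, 0] = True
top, left, occupancy = best_matching_region(toy_mask, side_px=3)
print(top, left, occupancy)
print(um_to_px(988), um_to_px(374))
```

A scoring rule that compares the observed occupancy against an expected value from the atlas would fit the same structure: only the `argmax` step would change.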

Masking & Segmentation

With the best matching regions found, we can mask out anything outside each region.
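The masking step can be sketched as follows, assuming the region is given by its top-left corner and side length in pixels. The function name `mask_outside_region` and this region format are assumptions for illustration; a YOLOv5 box given as corner coordinates would need converting first.

```python
import numpy as np

def mask_outside_region(prediction, top, left, side_px):
    """Zero every pixel of the prediction that lies outside the square region."""
    prediction = np.asarray(prediction)
    masked = np.zeros_like(prediction)
    rows = slice(top, top + side_px)
    cols = slice(left, left + side_px)
    # Copy only the pixels inside the region; everything else stays zero.
    masked[rows, cols] = prediction[rows, cols]
    return masked

# Toy example: keep a 2x2 window of a uniform 5x5 prediction map.
toy_prediction = np.full((5, 5), 0.8)
masked = mask_outside_region(toy_prediction, top=1, left=2, side_px=2)
print(masked)
```

The input array is left unmodified, so the original prediction remains available for comparison before and after post-processing.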

Level 21

Level 25

Segmentation

At this point we can threshold the post-processed model prediction to get the final segmentation mask.

Level 21

IoU w Threshold @ 0.5 after post-processing = 0.54

Level 25

IoU w Threshold @ 0.5 after post-processing = 0.75

Does the detected region agree with the ground truth label?

As an additional observation, the best matching region found from the model predictions aligns well with the ground truth labels.

Level 21

Level 25

Discussion

This method improves the intersection-over-union metric for both test images: from 0.37 to 0.54 for level 21 and from 0.49 to 0.75 for level 25. The combination of reducing false positives by using larger input sizes in MA-Net and extracting the expected predictions by applying region-based filtering has yielded the best results we have obtained so far. Nonetheless, we should discuss the assumptions behind these methods. A discussion of MA-Net itself, such as model size, is provided in the MA-Net experiment section, so here we focus on the assumptions we made about MA-Net outputs when applying region filtering, as well as assumptions about the filtering itself.

The first assumption is that, for any new experimental image, the user knows the corresponding level in the atlas, since the level determines the appropriate region size and occupancy. Because of factors such as the plane of sectioning of the brain tissue, finding the corresponding level may not be trivial. However, region filtering is a modular step, so a pipeline could be developed that uses the idea of region filtering without explicitly specifying the level or even the brain region.

Moreover, we are assuming the predictions of MA-Net are good enough that measures like region size and occupancy can detect the correct region. The region-filtering algorithm could easily be fooled by predictions that make no sense in terms of the data. For region filtering to work, the model should therefore produce predictions that are as close as possible to the final brain delineation. Here, "close to final" means having recall and precision as high as possible, so that the occupancy measure serves as a good metric for matching regions. Since we have not attempted this with other brain regions, it is possible that MA-Net, as currently configured, does not produce "close to final" delineations for them.

Lastly, our choice of a square region, as opposed to a region with proportions similar to the brain region's bounding box, allows a region to be detected in more orientations than the one established in the brain atlas. Further, using the longer axis of the brain region to define the side of the square prevents overcropping of the brain region if it is rotated. A possible issue is that, by preventing overcropping, we may be admitting more false positives. However, because the model predictions are more precise near the brain region, false positives are not a major issue. Further area-based filtering can be done to remove debris or to extract the largest connected component inside the best matching region.
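The largest-connected-component filtering mentioned above could look like the sketch below. This step was not part of the reported results. The function name `largest_connected_component` is hypothetical, and the sketch relies on `scipy.ndimage.label` with its default connectivity (4-connected in 2D), which is an assumption rather than a choice made in this project.

```python
import numpy as np
from scipy import ndimage

def largest_connected_component(mask):
    """Keep only the largest connected foreground component of a binary mask."""
    mask = np.asarray(mask).astype(bool)
    # Label 0 is background; components are numbered from 1.
    labels, num_components = ndimage.label(mask)
    if num_components == 0:
        return np.zeros_like(mask)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore the background when picking the largest label
    return labels == sizes.argmax()

# Toy example: a 3x3 blob plus a single-pixel piece of debris.
toy_mask = np.zeros((6, 6), dtype=bool)
toy_mask[2:5, 1:4] = True
toy_mask[0, 5] = True
cleaned = largest_connected_component(toy_mask)
print(cleaned.sum())
```

Removing components below a minimum area instead of keeping only the largest one would use the same `sizes` array with a threshold.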