Detection

Deep learning approach - SSD

Single Shot MultiBox Detector (SSD) is a single-stage object detector capable of real-time performance. It is built on a convolutional neural network (CNN).

Region-based methods first generate region proposals and then classify the object in each proposal. SSD instead detects multiple objects in a single forward pass ("single shot") of the network. The higher speed does not come at the cost of accuracy: on PASCAL VOC2007 test, SSD (512×512 input, VGG-16 base) reaches 76.9% mAP at 22 FPS, outperforming Faster R-CNN (73.2% mAP at 7 FPS) and YOLOv1 (63.4% mAP at 45 FPS).
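In a single forward pass, SSD predicts a class score and box offsets for every default box, so several neighbouring boxes usually fire on the same object. These duplicates are removed with non-maximum suppression (NMS). The sketch below is a minimal greedy NMS for one class in plain Python. The box format (xmin, ymin, xmax, ymax) and both threshold values are assumptions for illustration, not the settings used in this project.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5, score_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest from best to worst score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps boxes 0 and 2. The second box overlaps the first with IoU ≈ 0.68 and is suppressed.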

Here, MobileNet V1 is used as the base network (feature extractor) of SSD. Additional convolutional layers that progressively decrease in size are appended to the end of the base network, so that detections can be predicted at multiple scales.
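Because the extra layers shrink step by step, each feature map handles objects of a different size. The original SSD paper assigns the k-th of m feature maps a default (anchor) box scale s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), with s_min = 0.2 and s_max = 0.9. A box with aspect ratio a then has width s_k * sqrt(a) and height s_k / sqrt(a), both relative to the image size. The Python sketch below computes these values. The choice of m = 6 and the aspect ratios are assumptions for illustration. The TensorFlow Object Detection API sets its own anchor values in the pipeline config.

```python
import math
from typing import List, Sequence, Tuple


def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Scale s_k for each of m feature maps, spaced linearly from s_min to s_max."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(scale: float,
                       aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)
                       ) -> List[Tuple[float, float]]:
    """(width, height) of the default boxes at one scale, relative to image size."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


# Six feature maps: the earliest (largest) map gets the smallest boxes.
for k, s in enumerate(default_box_scales(6), start=1):
    print(k, round(s, 2), [(round(w, 3), round(h, 3)) for w, h in default_box_shapes(s)])
```

The linear spacing means the large, early feature maps carry small boxes and the small, late maps carry boxes covering most of the image.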

Training details

TensorFlow Object Detection API + NVIDIA CUDA Toolkit

Model: ssd_mobilenet_v1_coco
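To fine-tune ssd_mobilenet_v1_coco on a new object, its COCO classes have to be replaced with the project's own. The TensorFlow Object Detection API reads class names from a label map in protobuf text format. Ids start at 1 (0 is reserved for background), and the model's num_classes in the pipeline config must match. The sketch below writes such a label map from Python. The single class name `pepper` and the file name `label_map.pbtxt` are hypothetical, since the source does not list its label set.

```python
from typing import List


def make_label_map(class_names: List[str]) -> str:
    """Render class names as a label map in protobuf text format."""
    entries = []
    for idx, name in enumerate(class_names, start=1):  # id 0 = background
        if "'" in name:
            raise ValueError("class names must not contain single quotes: %r" % name)
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "".join(entries)


# Hypothetical single-class label set; write it where the pipeline config expects it.
with open("label_map.pbtxt", "w") as f:
    f.write(make_label_map(["pepper"]))
```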

Transfer learning: the model is pre-trained on the COCO dataset.

The assumption is that the lower layers (the ones closer to the input) have learned general features (lines, edges, ...) that are not specific to the COCO dataset.
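The source does not say whether any layers were kept fixed during fine-tuning. One common way to act on this assumption is to freeze the lower layers and update only the upper layers and detection heads. In practice this is often done by matching variable names against regular expressions. The sketch below shows that selection step in plain Python. The variable names and the freeze pattern are hypothetical and only loosely modelled on a MobileNet V1 feature extractor.

```python
import re
from typing import Iterable, List


def trainable_variables(names: Iterable[str], freeze_patterns: Iterable[str]) -> List[str]:
    """Return the variable names that do NOT match any freeze pattern."""
    compiled = [re.compile(p) for p in freeze_patterns]
    return [n for n in names if not any(c.search(n) for c in compiled)]


# Hypothetical variable names (not read from a real checkpoint).
names = [
    "FeatureExtractor/MobilenetV1/Conv2d_0/weights",
    "FeatureExtractor/MobilenetV1/Conv2d_1_depthwise/depthwise_weights",
    "FeatureExtractor/MobilenetV1/Conv2d_13_pointwise/weights",
    "BoxPredictor_0/ClassPredictor/weights",
]
# Freeze layers Conv2d_0 .. Conv2d_5, assumed to hold generic edge/line filters.
frozen = [r"MobilenetV1/Conv2d_[0-5][_/]"]
print(trainable_variables(names, frozen))
```

Only the deep Conv2d_13 layer and the class predictor remain trainable here. How many layers to freeze is a trade-off between keeping COCO features and adapting to a small dataset.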

Dataset:

synthetic dataset procedurally generated in Blender (540 images)
manually labeled dataset of real images (43 images + augmentation)
mainly trained on red bell peppers
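The source does not say which augmentations were applied to the 43 real images. A horizontal flip is one common choice, since a mirrored bell pepper is still a plausible bell pepper. The only non-obvious part is moving the bounding boxes together with the pixels. The NumPy sketch below assumes normalized boxes in [ymin, xmin, ymax, xmax] order, the convention used by the TensorFlow Object Detection API. The function name is hypothetical.

```python
import numpy as np


def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray):
    """Mirror an HxWxC image left-right and move its boxes accordingly.

    boxes: (N, 4) array of normalized [ymin, xmin, ymax, xmax] in [0, 1].
    """
    flipped = image[:, ::-1, :].copy()
    ymin, xmin, ymax, xmax = boxes.T
    # Mirroring maps x to 1 - x, so the old xmax becomes the new xmin.
    new_boxes = np.stack([ymin, 1.0 - xmax, ymax, 1.0 - xmin], axis=1)
    return flipped, new_boxes
```

A box hugging the left edge, such as [0.1, 0.0, 0.5, 0.25], becomes [0.1, 0.75, 0.5, 1.0] and hugs the right edge after the flip.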

[Figure: example outputs, good detection and bad detection]