Detection
Deep learning approach - SSD
Single Shot Detector (SSD) is a single-stage object detector that runs in real time. It is built on a convolutional neural network (CNN).
Unlike region-based methods, which first generate region proposals and then classify each proposal, SSD detects multiple objects in a single forward pass. The higher speed does not come at the cost of accuracy: on the PASCAL VOC2007 test set, SSD reaches 76.9% mAP at 22 FPS, which is more accurate than both Faster R-CNN (73.2% mAP at 7 FPS) and YOLOv1 (63.4% mAP at 45 FPS), and about three times faster than Faster R-CNN.
MobileNet V1 is used as the base network of SSD. Additional convolutional layers, whose feature maps progressively decrease in size, are appended to the base network so that objects can be detected at multiple scales.
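To make the multi-scale idea concrete, here is a minimal sketch of how SSD assigns a default-box scale to each feature map, using the linear rule from the SSD paper. The values below (6 layers, scales from 0.2 to 0.95, feature maps of 19, 10, 5, 3, 2 and 1 cells for a 300x300 input) are assumptions based on the stock ssd_mobilenet_v1_coco configuration, not settings confirmed for this project. The function names are hypothetical.

```python
def default_box_scales(num_layers=6, min_scale=0.2, max_scale=0.95):
    """Linearly spaced default-box scales, one per feature map.

    Scales are fractions of the input image size; the defaults are assumed
    values, not settings taken from this project.
    """
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * k for k in range(num_layers)]


def box_shape(scale, aspect_ratio):
    """Width and height (relative to the image) of one default box."""
    root = aspect_ratio ** 0.5
    return scale * root, scale / root


# Assumed feature-map sizes for a 300x300 input, largest map first.
FEATURE_MAP_SIZES = [19, 10, 5, 3, 2, 1]

scales = default_box_scales()
for size, scale in zip(FEATURE_MAP_SIZES, scales):
    width, height = box_shape(scale, aspect_ratio=2.0)
    print(f"{size:>2}x{size:<2} map: scale={scale:.2f}, "
          f"2:1 box = {width:.2f} x {height:.2f} of the image")
```

The large early feature maps get small boxes and so handle small objects, while the small late feature maps get boxes that cover most of the image.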
Training details
TensorFlow Object Detection API + NVIDIA CUDA Toolkit
Model: ssd_mobilenet_v1_coco
Transfer learning: the model is pre-trained on the COCO dataset
The assumption is that the lower layers (the ones closer to the input) have learned general features (lines, edges, etc.) that are not specific to the COCO dataset.
Dataset:
synthetic dataset procedurally generated in Blender (540 images)
manually labeled dataset of real images (43 images + augmentation)
mainly trained on red bell peppers
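The transfer-learning assumption listed above can be illustrated with a small, framework-free sketch: variables whose name and shape match the pre-trained checkpoint are reused, while class-specific layers start from a fresh initialization because the number of classes differs. This is not the TensorFlow Object Detection API's restore code. The variable names, the shapes, the function name and the single "pepper" class are all illustrative assumptions.

```python
def split_for_fine_tuning(checkpoint_shapes, model_shapes):
    """Split model variables into (restored, fresh) lists.

    A variable is restored when the checkpoint holds a variable with the
    same name and shape; otherwise it starts from a fresh initialization.
    """
    restored, fresh = [], []
    for name, shape in model_shapes.items():
        if checkpoint_shapes.get(name) == shape:
            restored.append(name)
        else:
            fresh.append(name)
    return restored, fresh


ANCHORS = 3        # assumed number of default boxes per cell on the first map
COCO_CLASSES = 90  # COCO object classes; one background class is added below
NEW_CLASSES = 1    # assumption: a single "pepper" class


def head_shapes(num_classes):
    """Illustrative (assumed) variable names and shapes for one model."""
    return {
        "FeatureExtractor/Conv2d_0/weights": (3, 3, 3, 32),
        "BoxPredictor_0/BoxEncodingPredictor/weights": (1, 1, 512, ANCHORS * 4),
        "BoxPredictor_0/ClassPredictor/weights":
            (1, 1, 512, ANCHORS * (num_classes + 1)),
    }


restored, fresh = split_for_fine_tuning(head_shapes(COCO_CLASSES),
                                        head_shapes(NEW_CLASSES))
print("restored from COCO checkpoint:", restored)
print("trained from scratch:", fresh)
```

In this sketch the feature extractor and the box-regression weights carry over from COCO, and only the class predictor has to be learned from the synthetic and real pepper images.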