This paper (SSD: Single Shot MultiBox Detector) presents an object detection method that uses a single deep neural network. The model takes a single shot to detect multiple objects within an image. It discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for each object category in each default box. Predictions from multiple feature maps with different resolutions are combined, which lets the network naturally handle objects of various sizes. SSD eliminates proposal generation and the subsequent pixel or feature resampling stages, encapsulating all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on several datasets confirm that SSD achieves accuracy competitive with methods that use an object-proposal step. Compared to other single-stage methods, SSD is faster, significantly more accurate, and provides a unified framework for both training and inference.
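The "default boxes over different aspect ratios and scales per feature map location" can be made concrete with a short sketch. The sizes and aspect ratios below are the SSD300 configuration from the paper (six feature maps, scales spaced regularly between 0.2 and 0.9); note that released implementations treat the first map's scale specially, so this is an illustrative approximation, not the exact production grid:

```python
from itertools import product
from math import sqrt

# SSD300 configuration from the paper: six square feature maps and the
# aspect ratios evaluated at each location of each map.
FMAP_SIZES = [38, 19, 10, 5, 3, 1]
ASPECT_RATIOS = [
    [1.0, 2.0, 0.5],
    [1.0, 2.0, 3.0, 0.5, 1.0 / 3.0],
    [1.0, 2.0, 3.0, 0.5, 1.0 / 3.0],
    [1.0, 2.0, 3.0, 0.5, 1.0 / 3.0],
    [1.0, 2.0, 0.5],
    [1.0, 2.0, 0.5],
]

def default_boxes(fmap_sizes=FMAP_SIZES, aspect_ratios=ASPECT_RATIOS,
                  s_min=0.2, s_max=0.9):
    """Return default boxes as (cx, cy, w, h) tuples, all in [0, 1]."""
    m = len(fmap_sizes)
    # Scales are spaced regularly between s_min and s_max; the extra 1.0
    # entry supports the sqrt(s_k * s_{k+1}) box on the last map.
    scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)] + [1.0]
    boxes = []
    for k, f in enumerate(fmap_sizes):
        for i, j in product(range(f), repeat=2):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f  # center of each cell
            for ar in aspect_ratios[k]:
                boxes.append((cx, cy, scales[k] * sqrt(ar),
                              scales[k] / sqrt(ar)))
            # One extra aspect-ratio-1 box at scale sqrt(s_k * s_{k+1})
            s = sqrt(scales[k] * scales[k + 1])
            boxes.append((cx, cy, s, s))
    return boxes
```

With this configuration the grid produces 8732 default boxes per image, the count the paper reports for SSD300.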
Fig. 1 SSD real-time implementation.
The SSD is a purely convolutional neural network (CNN) organized into three parts: a base network (a standard image-classification backbone such as VGG-16) that produces lower-level feature maps, auxiliary convolutions stacked on top of the base that produce progressively smaller, higher-level feature maps, and prediction convolutions that locate and classify objects in those feature maps.
The paper demonstrates two variants of the model, SSD300 and SSD512, where the suffixes denote the input image size (300×300 and 512×512 pixels, respectively). Although the two networks differ slightly in construction, they are in principle the same; SSD512 is simply a larger network and yields marginally better performance.
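Either variant combines its per-feature-map predictions into one flat set of detections. A shape-level sketch for SSD300 (the feature-map sizes and per-location box counts are the paper's values; the random arrays are stand-ins for the real outputs of the prediction convolutions):

```python
import numpy as np

# (feature-map side length, default boxes per location) for SSD300
FMAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def assemble_predictions(num_classes=21, seed=0):
    """Concatenate per-map head outputs into one localization tensor of
    shape (N, 4) and one confidence tensor of shape (N, num_classes)."""
    rng = np.random.default_rng(seed)
    locs, confs = [], []
    for size, boxes_per_loc in FMAPS:
        n = size * size * boxes_per_loc
        # Stand-ins for the flattened output of each prediction convolution
        locs.append(rng.standard_normal((n, 4)))
        confs.append(rng.standard_normal((n, num_classes)))
    return np.concatenate(locs), np.concatenate(confs)
```

Summing over the six maps gives 8732 rows, so downstream steps (confidence thresholding, non-maximum suppression) operate on a single flat array regardless of which feature map a box came from.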
The SSD model with a 300×300 input image (SSD300) is more accurate than Fast R-CNN.
Fig. 4 PASCAL VOC2007 test detection results.
Fig. 5 PASCAL VOC2012 test detection results.
Fig. 6 COCO test-dev2015 detection results.
Fig. 7 Results on multiple datasets when the image-expansion data augmentation trick is used.
The following steps were followed to build, train, and test the SSD model on different datasets.
2. Test results on the VOC2012 test dataset: 75.1 mAP, versus the 75.8 mAP reported in the paper.
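Scoring a model in mAP starts from the Jaccard overlap (intersection over union) between predicted and ground-truth boxes, which decides whether a detection counts as a true positive. A minimal sketch, assuming boxes in corner format (xmin, ymin, xmax, ymax):

```python
def iou(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the intersection rectangle (clamped at zero)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In the PASCAL VOC protocol a detection is a true positive when its IoU with an unmatched ground-truth box of the same class exceeds 0.5; the same overlap is used at training time to match default boxes to ground truth.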
Fig. 9 mAP comparison of different methods.