Luo Z, Mishra A, Achkar A, Eichel J, Li S-Z, Jodoin P-M, “Non-Local Deep Features for Salient Object Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). [pdf]
Saliency detection aims to highlight the most relevant objects in an image. Conventional methods struggle whenever salient objects appear on top of a cluttered background, while deep neural networks suffer from excess complexity and slow evaluation speeds. In this paper, we propose a simplified convolutional neural network that combines local and global information through a multi-resolution 4×5 grid structure. Instead of enforcing spatial coherence with a CRF or super-pixels, as is usually the case, we implement a loss function inspired by the Mumford-Shah functional which penalizes errors on the boundary. We trained our model on the MSRA-B dataset and tested it on six saliency benchmark datasets. Results show that our method is on par with the state of the art while reducing computation time by a factor of 18 to 100, enabling near real-time, high-performance saliency detection.
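As a rough illustration of the boundary-penalizing idea described above, the sketch below combines a pixel-wise cross-entropy term with an IoU-style overlap term computed on boundary maps. It is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the boundaries are assumed to be extracted with fixed Sobel filters, the overlap is a soft Dice/IoU ratio, and the weighting `lam` is a hypothetical balancing factor.

```python
import torch
import torch.nn.functional as F

def soft_boundary(mask):
    """Approximate object boundaries as the gradient magnitude of a
    (soft) saliency map, computed with fixed Sobel filters (assumption)."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=mask.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(mask, sobel_x, padding=1)
    gy = F.conv2d(mask, sobel_y, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def boundary_aware_loss(pred, target, lam=1.0):
    """Cross-entropy on the saliency map plus an IoU-style penalty on the
    boundaries of the predicted and ground-truth maps.

    pred, target: (N, 1, H, W) tensors in [0, 1]; `lam` is a hypothetical
    weight, not a value taken from the paper.
    """
    bce = F.binary_cross_entropy(pred, target)
    bp = soft_boundary(pred)    # boundary map of the prediction
    bg = soft_boundary(target)  # boundary map of the ground truth
    inter = (bp * bg).sum(dim=(1, 2, 3))
    union = bp.sum(dim=(1, 2, 3)) + bg.sum(dim=(1, 2, 3))
    iou_loss = 1.0 - (2.0 * inter + 1e-8) / (union + 1e-8)  # soft 1 - 2|A∩B|/(|A|+|B|)
    return bce + lam * iou_loss.mean()
```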
Figure 1. Architecture of our 4×5 grid-CNN network for salient object detection.
Figure 2. Saliency maps produced by the GS, MR, wCtr*, LEGS, BSCA, MDF, MC and DCL methods compared to our NLDF method. The NLDF maps provide clear salient regions and exhibit good uniformity compared to the saliency maps from the other deep learning methods (LEGS, MC, MDF and DCL). Our method is also more robust to background clutter than the non-deep-learning methods (GS, MR, wCtr* and BSCA).
Table 2. Quantitative performance of our model on six benchmark datasets compared with the GS, MR, wCtr*, LEGS, BSCA, MDF, MC and DCL models. LEGS, MDF, MC and DCL are deep learning methods; the others are not. The F and MAE metrics are defined in the text.
Figure 3. Precision-recall curves for our model compared to GS, MR, wCtr*, LEGS, BSCA, MDF, MC and DCL, evaluated on the MSRA-B, HKU-IS, DUT-OMRON, PASCAL-S, ECSSD and SOD benchmark datasets. Our NLDF model delivers state-of-the-art performance on all six datasets.