Unsupervised learning of foreground object detection

Abstract

Unsupervised learning poses one of the most difficult challenges in computer vision today. The task has immense practical value, with many applications in artificial intelligence and emerging technologies, as large quantities of unlabeled videos can be collected at relatively low cost. In this paper, we address the unsupervised learning problem in the context of detecting the main foreground objects in single images. We train a student deep network to predict the output of a teacher pathway that performs unsupervised object discovery in videos or large image collections. Our approach differs from published methods on unsupervised object discovery: we move the unsupervised learning phase to training time, and at test time we apply only standard feed-forward processing along the student pathway. This strategy leaves more room for generalization during training while remaining fast at test time. Our unsupervised learning algorithm can run over several generations of student-teacher training: a group of student networks trained in the first generation collectively forms the teacher for the next generation. In experiments, our method achieves top results on three current datasets for object discovery in video, unsupervised image segmentation and saliency detection. At test time the proposed system is one to two orders of magnitude faster than published unsupervised methods.
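The generational student-teacher scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ensemble_teacher`, `train_generations` and `train_student` are hypothetical, soft masks are plain nested lists, and combining the students by pixel-wise averaging is an assumption made only for this sketch (this page does not specify the combination rule).

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]
Mask = List[List[float]]
Predictor = Callable[[Image], Mask]


def ensemble_teacher(students: Sequence[Predictor]) -> Predictor:
    """Build a teacher whose output is the pixel-wise mean of the students' soft masks.

    Averaging is an assumption for this sketch, not the paper's combination rule.
    """
    def teacher(image: Image) -> Mask:
        masks = [student(image) for student in students]
        height, width = len(masks[0]), len(masks[0][0])
        return [
            [sum(m[y][x] for m in masks) / len(masks) for x in range(width)]
            for y in range(height)
        ]
    return teacher


def train_generations(
    initial_teacher: Predictor,
    images: Sequence[Image],
    train_student: Callable[[Sequence[Image], Sequence[Mask], int], Predictor],
    n_students: int = 2,
    n_generations: int = 2,
) -> Predictor:
    """Run several generations of student-teacher training.

    `train_student(images, targets, seed)` is a hypothetical callback that fits
    one student to the teacher's soft masks and returns it as a predictor.
    """
    teacher = initial_teacher
    for _ in range(n_generations):
        # The current teacher labels the unlabeled images with soft masks.
        targets = [teacher(image) for image in images]
        # Each student learns to reproduce the teacher's output.
        students = [train_student(images, targets, seed) for seed in range(n_students)]
        # The students collectively become the teacher of the next generation.
        teacher = ensemble_teacher(students)
    return teacher
```

At test time only a student (or the final ensemble) is evaluated in a single feed-forward pass, which is where the speed advantage mentioned above comes from.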

Papers

Our preliminary work was published at ICCV 2017: I. Croitoru, S.V. Bogolin, M. Leordeanu, Unsupervised learning from video to detect foreground objects in single images, ICCV 2017. [paper]

I. Croitoru, S.V. Bogolin, M. Leordeanu, Unsupervised Learning of Foreground Object Segmentation, International Journal of Computer Vision (IJCV) 2019. [pdf]

Iteration 1 - Qualitative results of LowRes-Net

LowRes-Net is the initial architecture that appeared at ICCV 2017. Below you can find qualitative results on the Object Discovery in Internet Images dataset (Rubinstein et al., "Unsupervised Joint Object Discovery and Segmentation in Internet Images", CVPR 2013). We present our results on the 100-image subsets for the airplane, car and horse classes. Each photo shows the input image with our resulting segmentation to its right.

Iteration 1 - Videos obtained with LowRes-Net

LowRes-Net is the initial architecture that appeared at ICCV 2017. Below you can find qualitative results on the YouTube Objects dataset (Prest et al., "Learning Object Class Detectors from Weakly Annotated Video", CVPR 2012). Each video shows the input frame with our fitted bounding box, the output of the teacher method (VideoPCA, from Stretcu et al., "Multiple frames matching for object discovery in video", BMVC 2015) and our soft mask.
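One simple way to obtain a bounding box from a soft mask is to threshold the mask and take the tightest box around the remaining pixels. The sketch below illustrates that idea only; this page does not describe how the boxes in the videos were fitted, so the function name `fit_bounding_box`, the threshold of 0.5 and the thresholding approach itself are assumptions.

```python
from typing import Optional, Sequence, Tuple


def fit_bounding_box(
    soft_mask: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) around pixels >= threshold, or None."""
    if not soft_mask or not soft_mask[0]:
        return None
    width = len(soft_mask[0])
    # Rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(soft_mask) if any(v >= threshold for v in row)]
    cols = [x for x in range(width) if any(row[x] >= threshold for row in soft_mask)]
    if not rows:
        return None  # no pixel passed the threshold
    return (min(cols), min(rows), max(cols), max(rows))
```

For example, a mask whose only confident pixels lie in rows 1 to 2 and columns 1 to 2 yields the box `(1, 1, 2, 2)`, with coordinates inclusive.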

Iteration 1 - Video obtained with Multi-Net

Iteration 2 - Qualitative results

Below you can find some qualitative results of our method and also a comparison between our models.

Code

You can find the code here.

Team

Acknowledgements

This work was supported by UEFISCDI, under projects PN-III-P4-ID-ERC-2016-0007, PN-III-P2-2.1-PED-2016-1842 and PN-III-P1-1.2-PCCDI-2017-0734.