Deep Interactive Object Selection

Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang

University of Illinois at Urbana-Champaign

Adobe Research



Interactive object selection is an important research problem with many applications. Previous algorithms require substantial user interaction to estimate the foreground and background distributions. In this paper, we present a novel deep-learning-based algorithm that has a much better understanding of objectness and can therefore reduce user interaction to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of the image to compose (image, user interactions) pairs. We generate many such pairs by combining several random sampling strategies that model user click patterns, and use them to fine-tune deep Fully Convolutional Networks (FCNs). Finally, the output probability maps of our FCN-8s model are integrated with graph cut optimization to refine the boundary segments. Our model is trained on the PASCAL segmentation dataset and evaluated on other datasets with different object classes. Experimental results on both seen and unseen objects demonstrate that our algorithm generalizes well and is superior to existing interactive object selection approaches.


We are the first to solve interactive segmentation in the framework of deep learning. Given an input image and user interactions, our algorithm first transforms positive and negative clicks (denoted as green dots and red crosses respectively) into two separate channels, which are then concatenated (denoted as ⊕) with the image’s RGB channels to compose an input pair to the FCN models. The corresponding output is the ground truth mask of the selected object.
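The click-to-channel transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the (row, col) click format, and the truncation of distances at 255 are assumptions of this sketch.

```python
import numpy as np

def click_distance_map(shape, clicks, cap=255.0):
    """Per-pixel Euclidean distance to the nearest click, truncated at `cap`.

    `clicks` is a list of (row, col) tuples. The cap value is an assumption
    of this sketch; it keeps the map in the same range as an 8-bit image.
    """
    height, width = shape
    if len(clicks) == 0:
        # No clicks of this type: treat every pixel as maximally far away.
        return np.full((height, width), cap, dtype=np.float32)
    rows, cols = np.mgrid[0:height, 0:width]
    pts = np.asarray(clicks, dtype=np.float32)  # shape (n, 2)
    # Distance from every pixel to every click, shape (n, H, W).
    dist = np.sqrt((rows[None] - pts[:, 0, None, None]) ** 2 +
                   (cols[None] - pts[:, 1, None, None]) ** 2)
    return np.minimum(dist.min(axis=0), cap).astype(np.float32)

def build_network_input(image, positive_clicks, negative_clicks):
    """Concatenate RGB with the two distance maps into an (H, W, 5) array."""
    height, width = image.shape[:2]
    pos = click_distance_map((height, width), positive_clicks)
    neg = click_distance_map((height, width), negative_clicks)
    return np.dstack([image.astype(np.float32), pos, neg])
```

For an H×W RGB image, `build_network_input` returns an H×W×5 array: channels 0 to 2 hold the image, channel 3 the positive-click distances, and channel 4 the negative-click distances.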

(a) Input image and user interactions. (b) Output from our FCN model. (c) Refinement by Graph cut.

We also use graph cut to refine the probability output of our FCN models. The optimization objective is composed of a unary term R(L) and a pairwise term B(L). We use the probability output of the FCN directly as the unary term, and use low-level image features in the pairwise term to capture the properties of object boundaries.
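To make the two terms concrete, the sketch below evaluates such an energy for a given labeling. The text above only states that the FCN probability feeds the unary term and that low-level features drive the pairwise term; the negative log-likelihood cost, the contrast-sensitive weight on 4-connected neighbours, and the parameters `lam` and `sigma` are common choices assumed here, not the authors' exact formulation. The minimization itself (a min-cut/max-flow solve) is not shown.

```python
import numpy as np

def unary_cost(prob, labels, eps=1e-6):
    """R(L): negative log-likelihood of each pixel's label under the FCN output.

    `prob` holds the foreground probability per pixel; `labels` is 1 for
    foreground and 0 for background.
    """
    q = np.clip(prob, eps, 1.0 - eps)
    return np.where(labels == 1, -np.log(q), -np.log(1.0 - q)).sum()

def pairwise_cost(gray, labels, sigma=10.0):
    """B(L): penalty for label changes between 4-connected neighbours.

    The penalty is small across strong intensity edges, so cuts are
    encouraged to follow image boundaries (an assumed, commonly used form).
    """
    total = 0.0
    for axis in (0, 1):
        diff = np.diff(gray.astype(np.float64), axis=axis)
        cut = np.diff(labels, axis=axis) != 0
        total += (np.exp(-diff ** 2 / (2.0 * sigma ** 2)) * cut).sum()
    return total

def energy(prob, gray, labels, lam=1.0):
    """Weighted sum of both terms; `lam` is a hypothetical trade-off weight."""
    return lam * unary_cost(prob, labels) + pairwise_cost(gray, labels)
```

A labeling that agrees with the probability map and changes label only across strong intensity edges receives a lower energy than one that contradicts either cue.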


We compare our method with state-of-the-art interactive methods on several benchmarks. The following figure plots mean Intersection over Union (IoU) accuracy vs. the number of clicks. Our method requires the least user effort and achieves better results on all the datasets.
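For reference, the metric on the vertical axis of such a plot can be computed as follows. This is a minimal sketch; the function names and the nested-list layout of the predictions are assumptions made for illustration.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty: count this as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou_per_click(preds_by_click, gts):
    """Mean IoU over a dataset after each click.

    preds_by_click[k][i] is the predicted mask for image i after k + 1 clicks;
    gts[i] is the ground-truth mask for image i.
    """
    return [float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))
            for preds in preds_by_click]
```

Plotting the returned list against click counts 1, 2, 3, ... gives one curve of the kind compared across methods.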

The following figure shows some visual results. Please find more results in our paper.

Video Demos



Deep Interactive Object Selection. [paper] [supplementary materials] [Figure 4 data]

Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang

2016 IEEE Conference on Computer Vision and Pattern Recognition.



title={Deep interactive object selection},

author={Xu, Ning and Price, Brian and Cohen, Scott and Yang, Jimei and Huang, Thomas S},

booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},