Semantic Co-segmentation in Videos

Yi-Hsuan Tsai Guangyu Zhong Ming-Hsuan Yang

University of California, Merced Dalian University of Technology

Overview of the proposed algorithm. Given a collection of videos without category labels, we aim to segment semantic objects. First, a set of tracklets is generated for each video, and each tracklet is associated with a predicted category, illustrated in different colors (e.g., blue represents the dog and red represents the cow). Then, for each object category, a graph is constructed that connects tracklets from all videos as nodes. We formulate tracklet selection as a submodular optimization problem to co-select tracklets that belong to true objects (depicted as glowing nodes), and produce the final semantic segmentation results.
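As a rough illustration of the graph construction step described above, the following sketch groups tracklets from all videos by predicted category and connects every pair within a category by a weighted edge. The `Tracklet` record, its field names, and the use of cosine similarity on a placeholder feature vector are assumptions made for this example only; they are not taken from the paper or its released code.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Tracklet:
    """Hypothetical tracklet record; field names are illustrative only."""
    video_id: str
    tracklet_id: int
    category: str    # predicted category, e.g. "dog" or "cow"
    feature: tuple   # placeholder appearance descriptor


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def build_category_graphs(tracklets):
    """Build one graph per category, with tracklets from all videos as nodes.

    Edges are keyed by the pair of node indices within the category and
    weighted by the (assumed) feature similarity.
    """
    nodes = defaultdict(list)
    for t in tracklets:
        nodes[t.category].append(t)

    graphs = {}
    for category, members in nodes.items():
        edges = {
            (i, j): cosine_similarity(members[i].feature, members[j].feature)
            for i, j in combinations(range(len(members)), 2)
        }
        graphs[category] = {"nodes": members, "edges": edges}
    return graphs
```

Keying edges by node index rather than by `tracklet_id` avoids collisions when two videos reuse the same tracklet number.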


Abstract

Discovering and segmenting objects in videos is a challenging task due to large variations in object appearance, deformed shapes and cluttered backgrounds. In this paper, we propose to segment objects and understand their visual semantics from a collection of videos that link to each other, which we refer to as semantic co-segmentation. Without any prior knowledge of the videos, we first extract semantic objects and utilize a tracking-based approach to generate multiple object-like tracklets across each video. Each tracklet maintains temporally connected segments and is associated with a predicted category. To exploit rich information from other videos, we collect tracklets that are assigned to the same category from all videos, and co-select tracklets that belong to true objects by optimizing a submodular function. This function accounts for object properties such as appearance, shape and motion, and hence facilitates the co-segmentation process. Experiments on three video object segmentation datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.
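To make the co-selection step more concrete, here is a minimal sketch of greedy maximization of a monotone submodular objective over the tracklets of one category. The objective used here, a facility-location term over pairwise similarities plus a weighted per-tracklet quality score, along with the `budget` and `weight` parameters, is an assumption chosen for illustration and is not claimed to be the exact function used in the paper.

```python
def facility_location(selected, sim):
    """Sum over all tracklets of their best similarity to a selected tracklet."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))


def objective(selected, sim, unary, weight):
    """Assumed objective: facility location plus weighted unary quality scores."""
    return facility_location(selected, sim) + weight * sum(unary[j] for j in selected)


def greedy_select(sim, unary, budget, weight=1.0):
    """Greedily add the tracklet with the largest positive marginal gain.

    sim:    square matrix of non-negative pairwise similarities
    unary:  non-negative per-tracklet scores (e.g. how object-like each one is)
    budget: maximum number of tracklets to select (hypothetical parameter)
    """
    selected = []
    current = 0.0
    remaining = set(range(len(sim)))
    while remaining and len(selected) < budget:
        best, best_gain = None, 0.0
        for j in sorted(remaining):
            gain = objective(selected + [j], sim, unary, weight) - current
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no candidate improves the objective
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected
```

With non-negative similarities and scores, this objective is monotone and submodular, so the greedy procedure carries the standard (1 - 1/e) approximation guarantee under a cardinality budget. Raising `weight` shifts the selection from tracklets that represent the group well toward tracklets with high individual quality.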


Downloads

"Semantic Co-segmentation in Videos", Yi-Hsuan Tsai, Guangyu Zhong, Ming-Hsuan Yang, European Conference on Computer Vision (ECCV), 2016

[Paper] [Supplementary] [Video] [GitHub]


BibTeX

@inproceedings{Tsai_ECCV_2016,
author = {Y.-H. Tsai and G. Zhong and M.-H. Yang},
booktitle = {European Conference on Computer Vision (ECCV)},
title = {Semantic Co-segmentation in Videos},
year = {2016}}