Video Semantic Salient Instance

SESIV Dataset

The SEmantic Salient Instance Video (SESIV) dataset consists of 84 videos taken from the DAVIS-2017 dataset, containing 185 semantic salient instances across 29 categories. The training set consists of 58 videos and the testing set of 26 videos. Each video frame has separate ground-truth labels for different segmentation tasks (e.g., region label, instance label, and semantic label). The SESIV annotations are built on top of the DAVIS annotations, which provide pixel-wise instance labels: we select the salient instances and assign each one a category from MS-COCO.
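As a rough illustration of how the three label types relate to one another, the sketch below derives them from a single pixel-wise instance mask. It is not the annotation tool used to build SESIV. The function name `build_sesiv_labels`, the assumption that the region label is a binary salient/non-salient mask, the use of 0 as background, and the category IDs in the toy example are all hypothetical placeholders.

```python
import numpy as np

def build_sesiv_labels(instance_mask, salient_ids, id_to_category):
    """Derive region, instance, and semantic labels from one instance mask.

    instance_mask:  2-D integer array, 0 = background (assumed convention).
    salient_ids:    IDs of the instances judged salient (hypothetical input).
    id_to_category: maps each salient instance ID to an integer category ID
                    (placeholder values, not real MS-COCO IDs).
    """
    # Pixels belonging to any salient instance.
    salient = np.isin(instance_mask, list(salient_ids))

    # Region label: binary mask of all salient pixels (assumed definition).
    region_label = salient.astype(np.uint8)

    # Instance label: keep original IDs for salient instances, zero elsewhere.
    instance_label = np.where(salient, instance_mask, 0)

    # Semantic label: replace each salient instance ID with its category ID.
    semantic_label = np.zeros_like(instance_mask)
    for inst_id in salient_ids:
        semantic_label[instance_mask == inst_id] = id_to_category[inst_id]

    return region_label, instance_label, semantic_label

# Toy 3x3 frame with three instances; only instances 1 and 2 are salient.
mask = np.array([[0, 1, 1],
                 [2, 2, 0],
                 [3, 0, 0]])
region, instance, semantic = build_sesiv_labels(mask, {1, 2}, {1: 15, 2: 7})
```

In this toy frame, instance 3 is dropped from all three outputs because it is not in the salient set, which mirrors the selection step described above.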


Region Label

Instance Label

Semantic Label


Semantic Instance - Salient Object (SISO)


If you use the SESIV dataset, please cite the following publication:


  • Trung-Nghia Le, Akihiro Sugimoto, "Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation", Winter Conference on Applications of Computer Vision (WACV), US, 2019. [PDF] [Poster]