The SEmantic Salient Instance Video (SESIV) dataset consists of 84 videos taken from the DAVIS-2017 dataset, containing 185 semantic salient instances across 29 categories. The training set has 58 videos and the testing set has 26 videos. Each video frame comes with separate ground-truth labels for different segmentation tasks: a region label, an instance label, and a semantic label. The SESIV annotations are built on top of the DAVIS annotations, which provide pixel-wise instance labels, by selecting the salient instances and assigning each one a category from MS-COCO.
Figure: Video, Region Label, Instance Label, Semantic Label.
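The relationship between the three label types can be illustrated with a minimal sketch. It assumes a DAVIS-style instance map in which each pixel holds an integer instance id and 0 means background. The function name `derive_labels`, the toy ids, and the category numbers are hypothetical and used only for illustration. This is not the authors' annotation tool, and it does not reflect the dataset's actual file layout.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 2D grid of integer labels, 0 = background (assumption)


def derive_labels(
    instance_map: Mask, salient_to_category: Dict[int, int]
) -> Tuple[Mask, Mask, Mask]:
    """Derive region, instance, and semantic labels from one instance map.

    instance_map: per-pixel instance ids, as in a DAVIS-style annotation.
    salient_to_category: maps each instance id kept as salient to a
        category id (hypothetical ids, not real MS-COCO ids).
    """
    # Renumber the kept instances consecutively from 1 (illustrative choice).
    new_ids = {
        old: new for new, old in enumerate(sorted(salient_to_category), start=1)
    }

    region: Mask = []
    instance: Mask = []
    semantic: Mask = []
    for row in instance_map:
        # Region label: 1 wherever any salient instance is present.
        region.append([1 if px in new_ids else 0 for px in row])
        # Instance label: one id per salient instance, others become background.
        instance.append([new_ids.get(px, 0) for px in row])
        # Semantic label: the category id of the salient instance at each pixel.
        semantic.append([salient_to_category.get(px, 0) for px in row])
    return region, instance, semantic


# Toy 3x3 frame with three instances; instances 1 and 3 are treated as salient.
toy_map = [
    [0, 1, 1],
    [2, 2, 0],
    [3, 0, 0],
]
region, instance, semantic = derive_labels(toy_map, {1: 5, 3: 7})
```

In this toy frame, instance 2 is not selected as salient, so it falls into the background in all three outputs.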
If you use the SESIV dataset, please cite the following publications: