Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation

Yuan-Ting Hu*, Jia-Bin Huang**, Alex Schwing*

University of Illinois Urbana-Champaign*, Virginia Tech**


Unsupervised video segmentation plays an important role in a wide variety of applications, from object identification to compression. However, fast motion, motion blur, and occlusions still pose significant challenges. To address them, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and edge cues. Our approach yields significantly better initial foreground-background estimates and diffuses them robustly and accurately across time. We evaluate the proposed algorithm on the challenging DAVIS, SegTrack v2, and FBMS-59 datasets. Despite using only a standard edge detector trained on 200 images, our method achieves state-of-the-art results, outperforming deep learning based methods in the unsupervised setting. On the DAVIS dataset, our results are even competitive with deep learning based methods in the semi-supervised setting.



Paper [Link]

Supplementary [Download]

Poster [Download]

Precomputed segmentation [DAVIS]


  author = {Hu, Yuan-Ting and Huang, Jia-Bin and Schwing, Alexander G.},
  title = {{Unsupervised Video Object Segmentation Using Motion Saliency-Guided Spatio-Temporal Propagation}},
  booktitle = {European Conference on Computer Vision},
  year = {2018}


    • Evaluation on DAVIS dataset [1]
    • Evaluation on SegTrack v2 dataset [2]
    • Evaluation on FBMS-59 dataset [3]


[1] Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: Proc. CVPR (2016)

[2] Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: Proc. ICCV (2013)

[3] Ochs, P., Malik, J., Brox, T.: Segmentation of moving objects by long term video analysis. PAMI (2014)