PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivarance in Clustering


We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and objectcentric images yet real-world data are dominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, solely relying on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to lowlevel visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of segmenting both things and stuff categories without any hyperparameter tuning or task-specific pre-processing. Our method largely outperforms existing baselines on COCO and Cityscapes with +17.5 Acc. and +4.5 mIoU. We show that PiCIE gives a better initialization for standard supervised training.





Qualitative Result



author = {Cho, Jang Hyun and Mall, Utkarsh and Bala, Kavita and Hariharan, Bharath},

title = {PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering},

booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},

month = {June},

year = {2021},

pages = {16794-16804}



This work was supported by DARPA Learning with Less Labels program (HR001118S0044), CHS-1617861, CHS-1513967, CHS-1900783, and CHS-1930755.


Please feel free to contact Jang Hyun Cho for any questions or discussions!