LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity


We introduce LEAD, an approach to discover landmarks from an unannotated collection of category-specific images. Existing works in self-supervised landmark detection learn dense (pixel-level) feature representations from an image, which are then used to learn landmarks in a semi-supervised manner. While self-supervised learning of image features has advanced for instance-level tasks like classification, these methods do not ensure dense equivariant representations. Equivariance is of interest for dense prediction tasks like landmark estimation. LEAD enhances the learning of dense equivariant representations in a self-supervised fashion. We follow a two-stage training approach: first, we train a network using the BYOL objective, which operates at the instance level. The correspondences obtained through this network are then used to train a lightweight network that produces a dense and compact representation of the image. We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations, while also improving generalization across scale variations.
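The two training signals named above can be illustrated with a small, dependency-free sketch. It is not the code from this repository. `byol_loss` follows the standard BYOL regression loss (2 minus twice the cosine similarity between the online prediction and the target projection). `similarity_distribution` and `alignment_loss` are hypothetical helpers that show one plausible reading of "aligning distributions of feature similarity": a softmax over cosine similarities is computed for the stage-one features and for the compact features, and the two are compared with a cross-entropy. The temperature value, the cross-entropy choice, and all function names are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def byol_loss(online_prediction, target_projection):
    """Standard BYOL loss for one pair of views: 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def similarity_distribution(query, keys, temperature=0.1):
    """Softmax over cosine similarities between one pixel feature and all keys.

    Assumption: the temperature of 0.1 is illustrative, not taken from the paper.
    """
    q = l2_normalize(query)
    logits = [
        sum(a * b for a, b in zip(q, l2_normalize(k))) / temperature for k in keys
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_loss(teacher_dist, student_dist, eps=1e-12):
    """Cross-entropy H(teacher, student); for a fixed teacher it is lowest
    when the student distribution equals the teacher distribution."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_dist, student_dist))


# Toy data (made up): 3 pixel locations, 4-D stage-one features, 2-D compact features.
teacher_keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
student_keys = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]

teacher = similarity_distribution(teacher_keys[0], teacher_keys)
student = similarity_distribution(student_keys[0], student_keys)
loss = alignment_loss(teacher, student)
```

In this toy setup the query is simply the first pixel feature of the same feature map; in practice the query and keys would typically come from two differently augmented views of an image, so that matching the distributions encourages equivariance.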


If you find our work helpful in your research, please cite it:


author = {Karmali, Tejan and Atrishi, Abhinav and Harsha, Sai Sree and Agrawal, Susmit and Jampani, Varun and Babu, R. Venkatesh},

title = {LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity},

booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},

month = {January},

year = {2022},

pages = {623-632}



This project is licensed under the MIT License.


If you have any queries, please get in touch via email: tejank10@gmail.com