One-Shot Transfer of Affordance Regions? AffCorrs!
Denis Hadjivelichkov, Sicelukwanda Zwane, Marc Peter Deisenroth, Lourdes Agapito, Dimitrios Kanoulas
Abstract
Given a single reference image of an object with annotated affordance regions, can we segment semantically corresponding parts within a target scene?
We investigate this question with AffCorrs, an unsupervised model that combines the useful properties of pre-trained DINO-ViT image descriptors and cyclic correspondences. AffCorrs finds corresponding affordances for both intra- and inter-class one-shot semantic part segmentation. This task is more difficult than its supervised alternatives, but enables future work such as learning affordances via imitation and assisted teleoperation.
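As a rough illustration of the first ingredient, the sketch below extracts dense patch descriptors from a pre-trained DINO-ViT loaded via torch.hub. Note that AffCorrs follows Amir et al. in using deeper "key" facet features; the simpler last-block token features here are an assumption for brevity, not the paper's exact configuration.

```python
import torch

# Load a pre-trained DINO ViT-S/8 backbone via torch.hub (no fine-tuning).
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
model.eval()

# A normalized RGB image tensor; H and W must be divisible by the patch size (8).
img = torch.randn(1, 3, 224, 224)  # placeholder input

with torch.no_grad():
    # Patch tokens from the final block; index [:, 1:] drops the [CLS] token.
    tokens = model.get_intermediate_layers(img, n=1)[0][:, 1:, :]

# One descriptor per 8x8 patch, arranged on a spatial grid.
h, w = img.shape[2] // 8, img.shape[3] // 8
descriptors = tokens.reshape(1, h, w, -1)  # (1, 28, 28, 384) for this input
```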
Model
The overall system diagram: The support descriptors belonging to the query area are grouped into query centroids. The salient target image descriptors are grouped into target centroids. Cyclic correspondence is enforced by matching the query centroids to the target centroids (forward matching) and the target centroids back to the support image descriptors (backward matching). A score is computed for each centroid in the target image and mapped to the descriptors belonging to that centroid's cluster. Finally, each pixel is labeled as foreground or background by a CRF, which compares these scores against a baseline energy level.
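The sketch below illustrates the cyclic-correspondence idea with a simple nearest-centroid rule standing in for the paper's exact matching and scoring; the function name, the cosine-similarity choice, and the hard argmax are assumptions, and the CRF stage is omitted.

```python
import torch
import torch.nn.functional as F

def cyclic_scores(query_centroids, target_centroids, support_desc, query_mask):
    """Score target centroids by cyclic correspondence (simplified sketch).

    query_centroids:  (Kq, D) centroids of support descriptors in the query area
    target_centroids: (Kt, D) centroids of salient target descriptors
    support_desc:     (N, D)  all support-image descriptors (flattened grid)
    query_mask:       (N,)    bool, True where a support pixel is in the query area
    """
    # Cosine similarity is a common choice for ViT descriptors (assumption).
    q = F.normalize(query_centroids, dim=-1)
    t = F.normalize(target_centroids, dim=-1)
    s = F.normalize(support_desc, dim=-1)

    # Forward matching: each query centroid picks its best target centroid.
    fwd_sim = q @ t.T                  # (Kq, Kt)
    fwd_match = fwd_sim.argmax(dim=1)  # (Kq,)

    # Backward matching: each target centroid picks its best support descriptor.
    bwd_sim = t @ s.T                  # (Kt, N)
    bwd_match = bwd_sim.argmax(dim=1)  # (Kt,)

    # A target centroid is cyclically consistent if some query centroid maps to
    # it and it maps back into the annotated query region.
    scores = torch.zeros(t.shape[0])
    for k in fwd_match.unique():
        if query_mask[bwd_match[k]]:
            scores[k] = bwd_sim[k].max()  # backward similarity as the score
    return scores
```

In AffCorrs, these centroid scores are then broadcast to the descriptors in each cluster and compared against a baseline energy level by a CRF to produce the final per-pixel segmentation.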
Results
Our evaluation on a subset of UMD images shows that the method applies to both inter- and intra-class pairs and outperforms both a state-of-the-art one-shot instance segmentation method and the co-part segmentation method on which AffCorrs builds.
Benefiting from the pre-trained backbone, AffCorrs can transfer affordance regions to novel, previously unseen objects from a single annotated example, without any training or fine-tuning.
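Putting the pieces together, one-shot transfer at inference time might look like the sketch below, reusing the hypothetical helpers from the earlier snippets; the cluster counts are arbitrary, and the saliency filtering and CRF stages are omitted. This is an illustration under stated assumptions, not the released implementation.

```python
import torch
from sklearn.cluster import KMeans

# Descriptor grids from the extraction sketch above, computed once for the
# support image and once for the target image; no weights are updated.
sup = support_descriptors.reshape(-1, 384).numpy()
tgt = target_descriptors.reshape(-1, 384).numpy()
mask = query_mask.reshape(-1).numpy()  # True inside the annotated region

# Group descriptors into centroids (cluster counts are arbitrary choices here).
q_cent = KMeans(n_clusters=5).fit(sup[mask]).cluster_centers_
t_cent = KMeans(n_clusters=20).fit(tgt).cluster_centers_

# Score target centroids by cyclic consistency (see the earlier sketch); the
# scores would then be mapped back to each centroid's pixels and refined.
scores = cyclic_scores(torch.from_numpy(q_cent).float(),
                       torch.from_numpy(t_cent).float(),
                       torch.from_numpy(sup).float(),
                       torch.from_numpy(mask))
```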
Citation
@inproceedings{hadjivelichkov2023affcorrs,
  title = {{One-Shot Transfer of Affordance Regions? AffCorrs!}},
  author = {Hadjivelichkov, Denis and Zwane, Sicelukwanda and Deisenroth, Marc and Agapito, Lourdes and Kanoulas, Dimitrios},
  booktitle = {{Proceedings of The 6th Conference on Robot Learning (CoRL)}},
  pages = {550--560},
  year = {2023},
  editor = {Liu, Karen and Kulic, Dana and Ichnowski, Jeff},
  volume = {205},
  series = {Proceedings of Machine Learning Research},
  month = {14--18 Dec}
}
Authors
RPL Group, UCL
Related Work
Shir Amir, Yossi Gandelsman, Shai Bagon, and Tali Dekel. Deep ViT Features as Dense Visual Descriptors. 2021.