Fast and Accurate Unknown Object Instance Segmentation through Error-Informed Refinement

Authors

Seunghyeok Back, Sangbeom Lee, Kangmin Kim, Joosoon Lee, Sungho Shin, Jemo Maeng, and Kyoobin Lee

If you have any questions, please feel free to contact Seunghyeok Back: shback@gm.gist.ac.kr

Abstract

Accurate perception of unknown objects is essential for autonomous robots, particularly when manipulating novel items in unstructured environments. However, existing unknown object instance segmentation (UOIS) methods often have over-segmentation and under-segmentation problems, resulting in inaccurate instance boundaries and failures in subsequent robotic tasks such as grasping and placement. To address this challenge, this article introduces INSTA-BEER, a fast and accurate model-agnostic refinement method that enhances the UOIS performance. The model adopts an error-informed refinement approach, which first predicts pixel-wise errors in the initial segmentation and then refines the segmentation guided by these error estimates. We introduce the quad-metric boundary error, which quantifies pixel-wise true positives, true negatives, false positives, and false negatives at the boundaries of object instances, effectively capturing both fine-grained and instance-level segmentation errors. Additionally, the Error Guidance Fusion (EGF) module explicitly integrates error information into the refinement process, further improving segmentation quality. In comprehensive evaluations conducted on three widely used benchmark datasets, INSTA-BEER outperformed state-of-the-art models in both accuracy and inference time. Moreover, a real-world robotic experiment demonstrated the practical applicability of our method in improving the performance of target object grasping tasks in cluttered environments.

The code will be released soon. Stay tuned. [github]

Introduction: What is INSTA-BEER?

INSTAnce Boundary Error Estimation and Refinement (INSTA-BEER) is a novel error-informed refinement method that addresses both fine-grained and instance-level errors in unknown object instance segmentation (UOIS) models. It aims to improve the robustness of unknown object segmentation and grasping. INSTA-BEER predicts the quad-metric boundary error, a pixel-wise representation of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) errors in the instance boundaries of the initial segmentation. This error estimation provides specific refinement guidance, identifying regions to be modified. The Error Guidance Fusion (EGF) module integrates these error estimates into the refinement process, leading to improved segmentation accuracy. 

This figure showcases the effectiveness of INSTA-BEER in refining unknown object instance segmentation across various domains. It illustrates how INSTA-BEER predicts quad-metric boundary errors and performs error-informed segmentation refinement. The initial segmentation may contain over- and under-segmentation errors, which can lead to subsequent grasp failures. INSTA-BEER resolves these errors and improves upon various initial segmentation methods on three widely used benchmarks, achieving state-of-the-art performance with an inference time of approximately 0.1 seconds on an RTX 3090 GPU and an Intel Xeon Gold 6248R CPU. This speed makes INSTA-BEER suitable for real-time applications in robotic perception and manipulation.

INSTA-BEER Architecture

This figure provides an overview of the INSTA-BEER architecture, highlighting its key components: the initial segmentation feature extractor, the error estimator, and the error-informed refiner with the Error Guidance Fusion (EGF) module. The model takes an RGB-D image and an initial segmentation as input and first predicts the quad-metric boundary error of the initial segmentation. The EGF module then uses the estimated errors to guide a targeted refinement, producing the final refined segmentation.

Quad-metric Boundary Error

One of our key contributions is the quad-metric boundary error. This representation classifies each pixel at the instance boundaries as TP, TN, FP, or FN, providing a comprehensive view of both fine-grained and instance-level segmentation errors. By focusing on instance boundaries, it enables INSTA-BEER to address over- and under-segmentation effectively. It also provides explicit guidance for refinement, indicating whether each pixel is accurately segmented (TP, TN), inaccurately segmented (FP), or a missing object pixel (FN). By predicting this error, our model can produce an accurate refined segmentation that resolves over- and under-segmentation.
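
To make the four channels concrete, the following is a minimal NumPy sketch of how such a boundary error map could be derived from an initial segmentation and a ground-truth segmentation. It is an illustration under stated assumptions, not the released implementation: the function names `instance_boundary` and `quad_metric_boundary_error` are hypothetical, boundaries are taken as pixels whose label differs from a 4-neighbour, and no boundary dilation or other preprocessing is applied.

```python
import numpy as np


def instance_boundary(labels: np.ndarray) -> np.ndarray:
    """Mark pixels whose instance label differs from a 4-neighbour (assumed definition)."""
    boundary = np.zeros(labels.shape, dtype=bool)
    # Compare each pixel with its vertical neighbour and mark both sides of a change.
    diff_v = labels[1:, :] != labels[:-1, :]
    boundary[1:, :] |= diff_v
    boundary[:-1, :] |= diff_v
    # Same for horizontal neighbours.
    diff_h = labels[:, 1:] != labels[:, :-1]
    boundary[:, 1:] |= diff_h
    boundary[:, :-1] |= diff_h
    return boundary


def quad_metric_boundary_error(pred_labels: np.ndarray, gt_labels: np.ndarray) -> np.ndarray:
    """Return a (4, H, W) boolean map with channels ordered TP, TN, FP, FN."""
    pred_b = instance_boundary(pred_labels)
    gt_b = instance_boundary(gt_labels)
    tp = pred_b & gt_b      # boundary predicted and present in the ground truth
    tn = ~pred_b & ~gt_b    # no boundary in either map
    fp = pred_b & ~gt_b     # boundary predicted where none exists
    fn = ~pred_b & gt_b     # true boundary missed by the prediction
    return np.stack([tp, tn, fp, fn], axis=0)


# Toy under-segmentation case: two adjacent objects predicted as one instance.
gt = np.ones((4, 6), dtype=int)
gt[:, 3:] = 2
pred = np.ones((4, 6), dtype=int)

error = quad_metric_boundary_error(pred, gt)
tp, tn, fp, fn = error
```

In this toy case the missed boundary between the two objects appears entirely in the FN channel, which is the kind of signal that would tell a refiner where an instance needs to be split. Every pixel falls into exactly one of the four channels.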

Correcting Over- and Under-Segmentation 

INSTA-BEER corrects over- and under-segmentation errors in initial segmentation results, as demonstrated on the OSD, OCID, and WISDOM datasets. The figure below compares the initial UCN segmentation results with the refined segmentations produced by INSTA-BEER on these datasets.

The figure below presents a qualitative comparison between INSTA-BEER and other refinement models using the same initial segmentation (UCN). Although BPR and CascadePSP enhance boundary details, they cannot add or delete instances. RICE successfully splits and merges incorrectly segmented instances but sometimes fails to merge instances correctly because it relies on a sampling strategy that may not cover all instance graphs. In contrast, INSTA-BEER consistently produces accurate refinements.

[Video] Robotic Application: Target Object Grasping

We demonstrate a practical robotic application of INSTA-BEER by improving the success rate of target object grasping. This video shows a robot grasping target objects from a cluttered bin containing unknown objects, given only ten template images per object. We compare the performance with and without INSTA-BEER refinement, based on initial segmentations from UOAIS-Net and UCN. INSTA-BEER improves UCN's segmentation success rate from 65% to 82% and its grasp success rate from 59% to 75%. Similarly, UOAIS-Net's segmentation success rate rises from 80% to 86%, and its grasp success rate increases from 70% to 76%. These results show that INSTA-BEER enhances robotic grasping in cluttered bins by correcting over- and under-segmentation.
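
For a quick side-by-side reading of these numbers, the short snippet below computes the absolute gain in percentage points for each initial segmentation model. It uses only the success rates reported above; the dictionary layout and variable names are illustrative.

```python
# Success rates (%) reported above, as (without refinement, with INSTA-BEER).
results = {
    ("UCN", "segmentation"): (65, 82),
    ("UCN", "grasp"): (59, 75),
    ("UOAIS-Net", "segmentation"): (80, 86),
    ("UOAIS-Net", "grasp"): (70, 76),
}

# Absolute improvement in percentage points for each model and metric.
gains = {key: after - before for key, (before, after) in results.items()}

for (model, metric), gain in gains.items():
    print(f"{model:10s} {metric:13s} +{gain} points")
```

The gains are larger for UCN (17 and 16 points) than for UOAIS-Net (6 points each), which started from higher success rates.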