Commonsense Scene Graph-based Target Localization for Object Search

(Submitted to IROS 2024)

Wenqi Ge1, Chao Tang1, Hong Zhang1,*

[1] Shenzhen Key Laboratory of Robotics and Computer Vision, SUSTech, Shenzhen, China.

[*] corresponding author: Hong Zhang.

Abstract: Household robots are often tasked with searching for frequently moved objects as a sub-task, which makes object search challenging. For household robots with access to pre-built maps that identify stationary items, accurately localizing the general area of a movable target object can greatly improve search efficiency. In similar contexts, previous methods either depended on statistical correlations or used graph neural networks (GNNs) to learn from images, aiming to roughly locate the target for search guidance. However, these methods are constrained by environmental specificity and incomplete spatial information, respectively. To address these limitations, we propose CSG-OS, a novel commonsense scene graph-based object search framework. It is enhanced with commonsense knowledge from a large language model (LLM), which aligns target localization with human-like reasoning to better guide object search. We train CSG-based target localization (CSG-TL), a core component of CSG-OS, on the ScanNet dataset and evaluate its zero-shot performance on both ScanNet and the AI2-THOR simulator. In addition, the CSG-OS framework achieves state-of-the-art performance on the AI2-THOR simulator. We also deploy CSG-OS on a Jackal robot to validate its efficacy.

Overview:

Pipeline:

CSG-based object search (CSG-OS) pipeline. First, the user queries the target object, which is encoded with LLM-derived commonsense knowledge to form the target node Vt. Second, the CSG is constructed from a pre-built map of stationary items and incorporates Vt for target localization through CSG-TL (detailed in Sec. IV-A). Third, nodes correlated with the target are clustered based on their positions and their predicted correlation likelihoods, establishing a set of candidate search points. Finally, the robot navigates to the first candidate point to search for the target. If the target is found, the task ends successfully. Otherwise, the robot updates the CSG with the newly detected objects and repeats the search steps until it finds the target or exceeds the maximum number of steps.
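The clustering and search loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy radius-based clustering, the score threshold, and the names `Node`, `candidate_points`, `object_search`, `score_fn` (a stand-in for CSG-TL) and `observe_fn` (a stand-in for navigation plus detection) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Node:
    name: str     # object label from the pre-built map
    x: float      # map position (assumed 2D, in metres)
    y: float
    score: float  # predicted correlation with the target, assumed in [0, 1]


def candidate_points(nodes, min_score=0.5, radius=1.0):
    """Greedy clustering (an assumed stand-in for the paper's clustering).

    Keeps nodes with score >= min_score, groups each one with the first
    cluster whose seed lies within `radius`, and returns score-weighted
    centroids (x, y, total_score) sorted by total score, highest first.
    """
    kept = sorted((n for n in nodes if n.score >= min_score),
                  key=lambda n: n.score, reverse=True)
    clusters = []
    for n in kept:
        for c in clusters:
            seed = c[0]  # highest-scoring member of the cluster
            if hypot(n.x - seed.x, n.y - seed.y) <= radius:
                c.append(n)
                break
        else:
            clusters.append([n])
    points = []
    for c in clusters:
        total = sum(n.score for n in c)
        cx = sum(n.x * n.score for n in c) / total
        cy = sum(n.y * n.score for n in c) / total
        points.append((cx, cy, total))
    points.sort(key=lambda p: p[2], reverse=True)
    return points


def object_search(target, objects, score_fn, observe_fn,
                  max_steps=5, radius=1.0):
    """Search loop: localize, visit the best unvisited point, update, repeat.

    objects    : iterable of (name, x, y) from the map
    score_fn   : hypothetical stand-in for CSG-TL, (target, name) -> float
    observe_fn : hypothetical stand-in for navigation and detection,
                 (x, y) -> (found, list of newly detected (name, x, y))
    Returns the step at which the target was found, or None.
    """
    objects = list(objects)
    visited = []
    for step in range(1, max_steps + 1):
        nodes = [Node(n, x, y, score_fn(target, n)) for n, x, y in objects]
        fresh = [(px, py)
                 for px, py, _ in candidate_points(nodes, radius=radius)
                 if all(hypot(px - vx, py - vy) > radius
                        for vx, vy in visited)]
        if not fresh:
            return None  # no unvisited candidate points remain
        goal = fresh[0]
        visited.append(goal)
        found, new_objects = observe_fn(goal)
        if found:
            return step
        objects.extend(new_objects)  # update the graph with new detections
    return None
```

In this sketch the scores are recomputed at every step, so newly detected objects can change the ranking of candidate points, which mirrors the CSG update step in the caption. The weighted centroid is one possible choice for a search point; a real system would also need to check that the point is reachable.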

Process:

Mapping

Building CSG

Target Localization

CSG updating

Authors:

Citation:

@misc{ge2024commonsense,
      title={Commonsense Scene Graph-based Target Localization for Object Search},
      author={Wenqi Ge and Chao Tang and Hong Zhang},
      year={2024},
      eprint={2404.00343},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}