Chang Chen, Liang Lu*, Lei Yang*, Yinqiang Zhang, Yizhou Chen, Ruixing Jia, Jia Pan*
The University of Hong Kong, Centre for Transformative Garment Production
*Corresponding authors
Current exploration methods struggle to search for shops or restaurants in unknown open-world environments due to the lack of prior knowledge. Humans can leverage venue maps, which offer valuable scene priors for exploration planning, by correlating signage in the scene with landmark names on the map. However, the arbitrary shapes and styles of signage text, along with multi-view inconsistencies, make accurate recognition challenging for robots. Additionally, discrepancies between real-world environments and venue maps hinder the integration of text-level information into planners. This paper introduces a novel signage-aware exploration system to address these challenges, enabling robots to utilize venue maps effectively. We propose a signage understanding method that accurately detects and recognizes signage text using a diffusion-based text instance retrieval method combined with a 2D-to-3D semantic fusion strategy. Furthermore, we design a venue map-guided exploration-exploitation planner that balances exploration of unknown regions, using directional heuristics derived from venue maps, with exploitation, in which the robot moves closer and adjusts its orientation for better recognition. Experiments in large-scale shopping malls demonstrate our method's superior signage recognition performance and search efficiency, surpassing state-of-the-art text spotting methods and traditional exploration approaches.
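As a rough, self-contained illustration of the text-level correlation described above (associating a recognized signage string with landmark names on the venue map), the Python sketch below uses simple fuzzy string matching; the function name, threshold, and matching scheme are illustrative assumptions and not the method from the paper.

# Hypothetical sketch: matching noisy signage text to venue-map landmark names.
# Names and the 0.7 threshold are illustrative assumptions, not the paper's implementation.
from difflib import SequenceMatcher

def match_signage_to_landmarks(recognized_text, landmark_names, threshold=0.7):
    """Return the best-matching landmark name, or None if no match is confident enough."""
    best_name, best_score = None, 0.0
    query = recognized_text.strip().lower()
    for name in landmark_names:
        score = SequenceMatcher(None, query, name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Example: a partially recognized sign still matches the corresponding map entry.
print(match_signage_to_landmarks("BRIKETENA", ["Briketenia", "Cafe Aroma"]))  # -> "Briketenia"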
We propose to leverage the textual information in a venue map to facilitate shop searching in unknown open-world environments. The robot localizes itself in the environment by recognizing the text on a shop sign and matching it to the venue map. It then plans a direction toward the next landmark, 'Briketenia'.
Our method first constructs a topological graph from a given venue map. Then, given an RGB-D image, the proposed signage understanding method recognizes the text on signage and correlates it with the text set of the venue map. Once the robot is localized on the venue map, the next landmark goal is inferred to guide frontier selection. Throughout the process, our system balances exploration and exploitation to improve signage recognition accuracy and coverage efficiency.
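To make the guided frontier selection concrete, here is a minimal, self-contained Python sketch that scores frontiers by blending an exploration term with a directional heuristic toward the landmark inferred from the venue map; the scoring function, weights, and variable names are illustrative assumptions rather than the paper's actual planner.

# Hypothetical frontier scoring: blend normalized unknown area (exploration)
# with alignment toward the inferred next landmark (venue-map heuristic).
import math

def score_frontier(frontier_xy, unknown_area, robot_xy, landmark_dir, w_explore=0.5):
    """Higher is better. unknown_area is normalized to [0, 1]; landmark_dir is a unit vector."""
    dx, dy = frontier_xy[0] - robot_xy[0], frontier_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy) or 1e-6
    # Alignment in [0, 1]: 1 when the frontier lies along the landmark direction.
    alignment = 0.5 * (1.0 + (dx * landmark_dir[0] + dy * landmark_dir[1]) / dist)
    return w_explore * unknown_area + (1.0 - w_explore) * alignment

# Toy example: (frontier position, normalized unknown area behind it).
frontiers = [((4.0, 1.0), 0.3), ((-3.0, 2.0), 0.9)]
best = max(frontiers, key=lambda f: score_frontier(f[0], f[1], (0.0, 0.0), (1.0, 0.0)))
print(best[0])  # -> (4.0, 1.0): the frontier toward the landmark wins under this toy weighting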
Below are some examples of signage recognition during exploration. Red boxes highlight the texts of interest.
The authors would like to thank Zhongxuan Li, Yipeng Pan, and Yupu Lu for supporting the real-world experiments. The authors would also like to thank Yuecheng Liu for his useful comments on this work.
@ARTICLE{10878474,
author={Chen, Chang and Lu, Liang and Yang, Lei and Zhang, Yinqiang and Chen, Yizhou and Jia, Ruixing and Pan, Jia},
journal={IEEE Robotics and Automation Letters},
title={Signage-Aware Exploration in Open World Using Venue Maps},
year={2025},
volume={10},
number={4},
pages={3414-3421},
keywords={Text recognition;Robots;Planning;Semantics;Feature extraction;Navigation;Three-dimensional displays;Shape;Location awareness;Image recognition;Autonomous agents;semantic scene understanding;mapping;planning under uncertainty},
doi={10.1109/LRA.2025.3540390}
}