DRAGON: A Dialogue-Based Robot for
Assistive Navigation with Visual Language Grounding
Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachery Mizrachi,
Justin Lin, D. Livingston McPherson, Wendy A. Rogers, and Katherine Driggs-Campbell
University of Illinois Urbana-Champaign
In IEEE Robotics and Automation Letters (RA-L)
Full demonstration video from our user study
(Landmark recognition + Assistive navigation to a sofa and a door)
Abstract
Persons with visual impairments (PwVI) have difficulty understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding commands from the user, DRAGON is able to guide the user to desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective use of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment and provide the user with semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
System Overview
Robot platform: Turtlebot 2i with additional sensors and a handle (right figure below).
Communication modules: Speech-to-text and text-to-speech via a headset.
Visual language grounding modules:
Landmark recognition: a finetuned CLIP model (Radford et al. 2021) that matches language commands to image goals on a map (left figure below).
Environment description: an object detector (Zhou et al. 2022).
Visual question answering (VQA): a finetuned VQA model (Kim et al. 2021).
SLAM and Navigation: ROS navigation stack.
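The landmark-recognition idea can be illustrated with a minimal sketch: CLIP embeds the user's spoken command and each landmark image into a shared space, and the landmark whose image embedding is most similar (by cosine similarity) to the text embedding becomes the navigation goal. The embeddings below are placeholders; in the actual system they would come from the finetuned CLIP text and image encoders.

```python
import numpy as np

def ground_command(text_emb, image_embs, landmark_names):
    """Return the landmark whose image embedding has the highest
    cosine similarity with the command's text embedding."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ text_emb  # cosine similarity per landmark
    return landmark_names[int(np.argmax(scores))]

# Placeholder embeddings standing in for CLIP encoder outputs.
names = ["sofa", "door", "workstation"]
image_embs = np.array([[1.0, 0.1, 0.0],
                       [0.0, 1.0, 0.2],
                       [0.1, 0.0, 1.0]])
text_emb = np.array([0.9, 0.2, 0.1])  # e.g. "take me to the couch"
print(ground_command(text_emb, image_embs, names))  # -> sofa
```

The chosen landmark's pose on the semantic map can then be sent to the ROS navigation stack as a goal.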
The environment map with semantic landmarks
The TurtleBot platform
More videos
A playlist with all videos can be found at https://www.youtube.com/playlist?list=PLL4IPhbfiY3YkITpyLjeroak_wBn151pn.
Destination: TV
Destination: Hallway
(Failure case: the system cannot interpret directions such as "left" and "right")
Destination: Workstation with a desk and a chair
Destination: Workstation with a desk and a chair
Destination: Dining chair
(Failure case: the system counts two overlapping chairs as one)
Destination: Door
Destination: Sofa
(Failure case: the system encounters an out-of-distribution command, so the user's intent is misinterpreted.)
Citation
@article{liu2024DRAGON,
title={{DRAGON}: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding},
author={Liu, Shuijing and Hasan, Aamir and Hong, Kaiwen and Wang, Runxuan and Chang, Peixin and Mizrachi, Zachery and Lin, Justin and McPherson, D. Livingston and Rogers, Wendy A. and Driggs-Campbell, Katherine},
journal={IEEE Robotics and Automation Letters},
year={2024},
volume={9},
number={4},
pages={3712-3719}
}