Yohei Hayamizu*, David DeFazio*, Hrudayangam Mehta*, Zainab Altaweel, Jacqueline Choe,
Chao Lin, Jake Juettner, Furui Xiao, Jeremy Blackburn, and Shiqi Zhang
The State University of New York at Binghamton
*: Equal Contribution
Assistive robotics is an important subarea of robotics that focuses on the well-being of people with disabilities. A robotic guide dog is a quadruped robot that assists visually impaired people with obstacle avoidance and navigation. Enabling language capabilities on robotic guide dogs goes beyond naively adding an existing dialog system onto a mobile robot. The novel challenges include grounding language to the dynamically changing environment and improving spatial awareness for the human handler. To address these challenges, we develop a novel dialog system for robotic guide dogs that uses large language models to verbalize both navigational plans and scenes. The goal is to enable verbal communication for collaborative decision-making within the handler-robot team. In experiments, we performed a human study to evaluate different verbalization strategies, and a simulation study to evaluate the system's efficiency and accuracy in navigation tasks.
Overview of our system. Human-robot dialog is conducted to generate a formal task that resolves the human's service request. The LLM determines the relevant navigable locations that can fulfill the request. Then, a task planner generates multiple action sequences (i.e., plans), one for each goal location candidate suggested by the LLM. A summary of each plan is verbalized to the user, consisting of the number of door-opening actions and the total navigation cost. Finally, after a formal task is identified via human-robot dialog, the robot takes actions corresponding to the selected plan to guide the human to the desired destination.
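To make this pipeline concrete, below is a minimal Python sketch of the dialog-to-plan flow. The topological map, door costs, and the `query_llm_for_goals` stub are illustrative assumptions, not the system's actual implementation; a real deployment would replace the stub with an LLM call and the hand-coded map with the robot's environment model.

```python
# Minimal sketch of the request -> goal candidates -> plans -> verbalization pipeline.
# All names and values here are illustrative assumptions, not the authors' code.
import heapq

# Hypothetical topological map: edges carry (navigation cost, requires door opening).
EDGES = {
    ("lobby", "hallway"): (5.0, False),
    ("hallway", "kitchen"): (8.0, True),
    ("hallway", "cafe"): (12.0, False),
    ("hallway", "vending_area"): (6.0, True),
}

def neighbors(node):
    """Yield (neighbor, cost, door) for every edge touching `node`."""
    for (a, b), (cost, door) in EDGES.items():
        if a == node:
            yield b, cost, door
        elif b == node:
            yield a, cost, door

def plan(start, goal):
    """Dijkstra over the topological map; returns (path, total cost, doors to open)."""
    frontier = [(0.0, 0, start, [start])]
    visited = set()
    while frontier:
        cost, doors, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, cost, doors
        if node in visited:
            continue
        visited.add(node)
        for nxt, step_cost, door in neighbors(node):
            if nxt not in visited:
                heapq.heappush(
                    frontier, (cost + step_cost, doors + int(door), nxt, path + [nxt])
                )
    return None, float("inf"), 0

def query_llm_for_goals(request):
    """Placeholder for the LLM that maps an open-vocabulary request to goal candidates."""
    if "coffee" in request.lower():
        return ["kitchen", "cafe", "vending_area"]
    return []

def verbalize_options(request, start="lobby"):
    """Summarize each candidate plan (doors to open, navigation cost) for the handler."""
    lines = []
    for goal in query_llm_for_goals(request):
        path, cost, doors = plan(start, goal)
        lines.append(
            f"Option: {goal} -- {doors} door(s) to open, cost {cost:.0f}; "
            f"route {' -> '.join(path)}"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    print(verbalize_options("I'd like to get a coffee."))
```

After the handler picks one of the verbalized options, the corresponding plan's action sequence would be dispatched to the robot's navigation stack.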
Our approach provides the best trade-off between high accuracy and efficient conversation.
Our approach remains highly accurate, while the keyword-based method fails.
Accuracy and average dialog length (number of words) for each approach. Horizontal bars through each point indicate the standard deviation.
Multi-Turn (ours) produced much higher accuracy than the Single-Turn baseline, while significantly improving the dialog efficiency compared with the Keyword-Based approach.
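For reference, the two reported metrics can be computed per approach as sketched below; the trial records and field names are made-up examples, not the study's actual data.

```python
# Illustrative computation of per-approach accuracy and average dialog length.
# The trial records below are hypothetical, not the study's data.
from statistics import mean, stdev

trials = [
    {"approach": "Multi-Turn", "correct": True, "dialog_words": 42},
    {"approach": "Multi-Turn", "correct": True, "dialog_words": 55},
    {"approach": "Single-Turn", "correct": False, "dialog_words": 18},
    {"approach": "Keyword-Based", "correct": True, "dialog_words": 120},
]

for name in ("Multi-Turn", "Single-Turn", "Keyword-Based"):
    rows = [t for t in trials if t["approach"] == name]
    accuracy = mean(int(t["correct"]) for t in rows)   # fraction of correctly resolved tasks
    words = [t["dialog_words"] for t in rows]
    spread = stdev(words) if len(words) > 1 else 0.0   # std. dev. of dialog length
    print(f"{name}: accuracy={accuracy:.2f}, avg words={mean(words):.1f} +/- {spread:.1f}")
```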
Conclusion
There are hundreds of millions of visually impaired people, and only a very small portion of them use guide dogs. We develop a robotic guide dog system that leverages human-robot dialog to extract formal tasks from ambiguous, open-vocabulary service requests. These tasks are then accomplished on the robot using a task planner. In simulation, we evaluate the accuracy and average dialog length of our system, along with its robustness to perturbed inputs and the usefulness of injecting plan information into the dialog. Hardware demonstrations illustrate our system's potential for real-world navigation assistance for visually impaired people. In the future, we plan to enhance the verbalization component to improve handlers' contextual awareness, and to extend our system to complex tasks that require interacting with the environment (e.g., opening doors and pressing buttons) and long-horizon planning.
Acknowledgment
The authors thank Thomas Panek, Bill Ma, and Ken Fernald for their guidance in this research, and the participants for their valuable feedback. A portion of this work has taken place at the Autonomous Intelligent Robotics (AIR) Group, SUNY Binghamton. AIR research is supported in part by the NSF (IIS-2428998, NRI-1925044), Ford Motor Company, DEEP Robotics, OPPO, Guiding Eyes for the Blind, and SUNY RF.