Visually Grounded Task and Motion Planning for Mobile Manipulation

Xiaohan Zhang, Yifeng Zhu, Yan Ding, Yuke Zhu, Peter Stone, Shiqi Zhang

Abstract

Task and motion planning (TAMP) algorithms aim to help robots achieve task-level goals while maintaining motion-level feasibility. This paper focuses on TAMP domains in which robot behaviors take extended periods of time (e.g., long-distance navigation). We develop a visual grounding approach that helps robots probabilistically evaluate action feasibility, and introduce a TAMP algorithm, called GROP, that optimizes both feasibility and efficiency. We collected a dataset of 96,000 simulated trials of a robot conducting mobile manipulation tasks, and used it to learn to ground symbolic spatial relationships for action feasibility evaluation. Compared with competitive TAMP baselines, GROP exhibited a higher task-completion rate while maintaining lower or comparable action costs. In addition to these extensive experiments in simulation, GROP is fully implemented and tested on a real robot system.

Overview

An overview of this work, including an FCN-based feasibility evaluation approach and GROP, our grounded TAMP algorithm. A task corresponds to one "unloading goal" on the table, together with a configuration of obstacles (chairs in our case). Given a task, every pixel is considered a navigation goal: the robot attempts to navigate there and unload an object from there. This navigation-manipulation process is referred to as a trial. The robot performs multiple trials for each navigation goal, which yields a feasibility value for that particular location. The feasibility values together form one heatmap for each task. In our dataset, each instance is a top-down view image, whose label is the corresponding heatmap. The "Dataset" box shows a few "combined heatmaps", where heatmaps are overlaid onto the corresponding images. Training with the dataset generates an FCN that is used for two purposes: 1) evaluating the feasibility of task-level actions, and 2) selecting motion-level navigation goals. Finally, GROP incorporates both efficiency (measured by action costs) and feasibility to compute task-motion plans for a mobile manipulator.
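To make the labeling step concrete, here is a minimal sketch of how per-pixel trial outcomes could be turned into a feasibility heatmap, and how a navigation goal could then be read off that heatmap. It assumes that the feasibility value of a pixel is simply the fraction of successful trials at that pixel, and that the navigation goal is the highest-valued pixel; neither detail is stated on this page. The function names `feasibility_heatmap` and `best_navigation_goal` are hypothetical and do not come from the GROP implementation.

```python
def feasibility_heatmap(successes, trials):
    """Turn per-pixel trial counts into a feasibility heatmap.

    successes[r][c] is the number of successful trials at pixel (r, c),
    and trials[r][c] is the number of trials attempted there.
    Assumption: feasibility is the empirical success rate; pixels with
    no trials are assigned 0.0.
    """
    heatmap = []
    for s_row, t_row in zip(successes, trials):
        heatmap.append([s / t if t > 0 else 0.0 for s, t in zip(s_row, t_row)])
    return heatmap


def best_navigation_goal(heatmap):
    """Return the (row, col) of the pixel with the highest feasibility.

    Assumption: the navigation goal is the argmax of the heatmap; ties
    are broken in favor of the first pixel in row-major order.
    """
    pixels = ((r, c) for r, row in enumerate(heatmap) for c in range(len(row)))
    return max(pixels, key=lambda rc: heatmap[rc[0]][rc[1]])


# Toy 2x2 "image": five trials per pixel, except one pixel that was never tried.
successes = [[0, 3], [5, 1]]
trials = [[5, 5], [5, 0]]
heatmap = feasibility_heatmap(successes, trials)
goal = best_navigation_goal(heatmap)
```

In the pipeline described above, a trained FCN predicts such a heatmap directly from a top-down image, so the counting step would be needed only when building the training labels.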

Experiments

Overall performance of GROP and four baseline methods in efficiency (x-axis) and task completion rate (y-axis). Tasks are grouped by difficulty. The ellipses represent the means and 2D standard variances of each approach. GROP produced the highest task completion rate while maintaining shorter or comparable execution time. This observation holds across tasks of different difficulty.

Simulation and Real Robot Demonstrations

GROP_demo_2.mp4

GROP_demo_3.mp4

More Sim Experiments

grop_gifs.mp4
