STAP: Sequencing Task-Agnostic Policies

Abstract

Advances in robotic skill acquisition have made it possible to build general-purpose libraries of learned skills for downstream manipulation tasks. However, naively executing these skills one after the other is unlikely to succeed without accounting for dependencies between actions prevalent in long-horizon plans. We present Sequencing Task-Agnostic Policies (STAP), a scalable framework for training manipulation skills and coordinating their geometric dependencies at planning time to solve long-horizon tasks never seen by any skill during training. Given that Q-functions encode a measure of skill feasibility, we formulate an optimization problem to maximize the joint success of all skills sequenced in a plan, which we estimate by the product of their Q-values. Our experiments indicate that this objective function approximates ground truth plan feasibility and, when used as a planning objective, reduces myopic behavior and thereby promotes long-horizon task success. We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner. We evaluate our approach in simulation and on a real robot.

STAP-ICRA-2023.mp4

Training with STAP

STAP outlines a scalable pipeline for training robotic skills. Planning with STAP requires learning three components per skill: a policy that proposes actions, a Q-function that scores the feasibility of a (state, action) pair, and a dynamics model that predicts the state resulting from executing the skill.

Why are STAP skills task-agnostic? Because they aren't trained to solve any particular long-horizon task. They are trained independently and combined at planning time to solve unseen sequential tasks. So, when a novel task calls for a new skill, the skill can simply be added to the library without the need to retrain existing skills.
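
For concreteness, a minimal PyTorch sketch of such a per-skill bundle is given below. The class name Skill, the attribute names (policy, q_function, dynamics), and all network sizes and dimensions are illustrative assumptions for this page, not the STAP codebase API.

import torch
import torch.nn as nn

class Skill(nn.Module):
    """Illustrative container for the three learned components of one skill."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # Policy: proposes an action for the current state.
        self.policy = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )
        # Q-function: scores (state, action) pairs, read as a success probability.
        self.q_function = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
        # Dynamics model: predicts the state reached after executing the skill.
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def q_value(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.q_function(torch.cat([state, action], dim=-1)).squeeze(-1)

    def predict_next_state(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.dynamics(torch.cat([state, action], dim=-1))

# A task-agnostic skill library: new skills are added without retraining existing ones.
skill_library = {
    "pick": Skill(state_dim=32, action_dim=4),
    "place": Skill(state_dim=32, action_dim=3),
    "pull": Skill(state_dim=32, action_dim=5),
    "push": Skill(state_dim=32, action_dim=5),
}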

Planning with STAP

Core idea. Given any sequence of skills, STAP performs sampling-based optimization to obtain actions that maximize the probability of task success, estimated by the product of downstream Q-values.
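
As a rough illustration of this optimization, the sketch below (continuing the illustrative Skill / skill_library sketch from the training section) scores sampled action sequences by the product of Q-values, Q_1(s_1, a_1) · Q_2(s_2, a_2) · ... · Q_N(s_N, a_N), where each predicted state s_{k+1} comes from the corresponding skill's dynamics model. It is a simple random-shooting stand-in for the sampling-based optimizer (the demo videos below reference a CEM variant); function names and the sampling scheme are assumptions, not the paper's exact procedure.

import torch  # continues the illustrative Skill / skill_library sketch above

def plan(skill_sequence, initial_state, num_samples=1024, noise_scale=0.1):
    """Score sampled action sequences by the product of Q-values; return the best one."""
    state = initial_state.unsqueeze(0).repeat(num_samples, 1)  # (num_samples, state_dim)
    actions, log_score = [], torch.zeros(num_samples)
    for skill in skill_sequence:
        # Sample candidate actions around the skill policy's proposal.
        proposal = skill.policy(state)
        candidates = proposal + noise_scale * torch.randn_like(proposal)
        # Accumulate log Q_k(s_k, a_k): maximizing the sum of logs maximizes the product.
        log_score = log_score + torch.log(skill.q_value(state, candidates).clamp(min=1e-6))
        actions.append(candidates)
        # Predict the next state with the skill's dynamics model.
        state = skill.predict_next_state(state, candidates)
    best = torch.argmax(log_score)
    return [a[best] for a in actions]

# Example: optimize actions for pick(hook, table) followed by pull(milk, hook).
best_actions = plan([skill_library["pick"], skill_library["pull"]], torch.zeros(32))

Because the Q-values are multiplied, a single infeasible downstream skill drives the score of the whole plan toward zero, which is what discourages myopic choices such as grasping the middle of the hook's handle.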

Planning Domains

STAP is evaluated on three domains of varying geometric complexity and planning horizon.

STAP: In the Real World

hook_reach_greedy_2x.mp4

Hook Reach: Greedy

Greedily executing pick(hook, table) yields the maximum-likelihood grasp position at the middle of the handle, causing the subsequent pull(milk, hook) to fail.

hook_reach_policy_cem_2x.mp4

Hook Reach: STAP

Planning with STAP conditions pick(hook, table) on the dependent downstream pull(milk, hook) action, resulting in a desirable grasp near the end of the handle.

constrained_packing_greedy_2x.mp4

Constrained Packing: Greedy

Greedily executing pick(obj, table) and place(obj, rack) ignores the other objects that must also fit on the rack, causing the place policy to fail.

constrained_packing_policy_cem_2x.mp4

Constrained Packing: STAP

Planning with STAP gives foresight to the upstream place(obj, rack) actions, so earlier placements preserve as much free area on the rack as possible for later ones.

rearrangement_push_greedy_2x.mp4

Rearrangement Push: Greedy

Greedily executing place(yogurt, table) results in a risk-averse placement far from other objects, from which push(yogurt, rack) cannot be executed.

rearrangement_push_policy_cem_2x.mp4

Rearrangement Push: STAP

Planning with STAP successfully constrains the place(yogurt, table) action space to the region in front of the rack where push(yogurt, rack) is feasible.

Citation

If you found this work interesting, please consider citing:

@inproceedings{AgiaMigimatsuEtAl2023,
  title     = {STAP: Sequencing Task-Agnostic Policies},
  author    = {Agia, Christopher and Migimatsu, Toki and Wu, Jiajun and Bohg, Jeannette},
  booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2023},
  pages     = {7951--7958},
  doi       = {10.1109/ICRA48891.2023.10160220}
}

Acknowledgements