STAP: Sequencing Task-Agnostic Policies
Christopher Agia*, Toki Migimatsu*, Jiajun Wu, Jeannette Bohg
Stanford University
Abstract
Advances in robotic skill acquisition have made it possible to build general-purpose libraries of learned skills for downstream manipulation tasks. However, naively executing these skills one after the other is unlikely to succeed without accounting for dependencies between actions prevalent in long-horizon plans. We present Sequencing Task-Agnostic Policies (STAP), a scalable framework for training manipulation skills and coordinating their geometric dependencies at planning time to solve long-horizon tasks never seen by any skill during training. Given that Q-functions encode a measure of skill feasibility, we formulate an optimization problem to maximize the joint success of all skills sequenced in a plan, which we estimate by the product of their Q-values. Our experiments indicate that this objective function approximates ground truth plan feasibility and, when used as a planning objective, reduces myopic behavior and thereby promotes long-horizon task success. We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner. We evaluate our approach in simulation and on a real robot.
Quick Links
Paper: Presented at ICRA 2023
arXiv: https://arxiv.org/abs/2210.12250
Code: https://github.com/agiachris/STAP
Training with STAP
STAP outlines a scalable pipeline for training robotic skills. Planning with STAP requires learning three components per skill:
Actors & Critics via off-the-shelf RL algorithms (e.g. SAC, TD3) for controlling manipulation primitives and estimating skill feasibility,
Dynamics Models trained on transition experience collected by the policies,
Uncertainty Quantification over the Q-networks to detect out-of-distribution states and actions at planning time.
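The three learned components listed above can be thought of as a small per-skill interface. The sketch below is a minimal, hypothetical rendering of that interface (the names `Skill` and `skill_feasibility` are illustrative, not from the STAP codebase); it shows how a Q-value, clipped to [0, 1], is read as an estimate of skill feasibility.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

# Hypothetical container for the three learned components of one STAP skill.
@dataclass
class Skill:
    # Actor: maps a state to a primitive action (trained with e.g. SAC or TD3).
    policy: Callable[[np.ndarray], np.ndarray]
    # Critic: Q(s, a) estimates the probability that the primitive succeeds.
    q_function: Callable[[np.ndarray, np.ndarray], float]
    # Dynamics model: predicts the next state after executing the primitive.
    dynamics: Callable[[np.ndarray, np.ndarray], np.ndarray]

def skill_feasibility(skill: Skill, state: np.ndarray, action: np.ndarray) -> float:
    """Clip the Q-value to [0, 1] so it can be read as a success probability."""
    return float(np.clip(skill.q_function(state, action), 0.0, 1.0))
```

Because each skill carries its own critic and dynamics model, a new skill added to the library brings everything planning needs, without touching the other skills.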
Why are STAP skills task-agnostic? Because they aren't trained to solve any particular long-horizon task. They are trained independently and combined at planning time to solve unseen sequential tasks. So, when a novel task calls for a new skill, the skill can simply be added to the library without the need to retrain existing skills.
Planning with STAP
Core idea. Given any sequence of skills, STAP performs sampling-based optimization to obtain actions that maximize the probability of task success, estimated by the product of downstream Q-values.
Search in high-dimensional planning spaces is accelerated by using the policies as learned sampling distributions to initialize planning.
Uncertainty quantification provides robustness to distribution shift, allowing task-agnostic skills to be safely combined at planning time.
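The core idea above can be sketched as a simple random-shooting optimizer (STAP also supports other sampling-based optimizers such as CEM; this sketch and the skill interface it assumes, with `policy`, `q_function`, `dynamics`, and `action_dim` attributes, are illustrative rather than the actual implementation):

```python
import numpy as np

def plan(skills, state, num_samples=1024, noise_scale=0.1, rng=None):
    """Random-shooting sketch of the STAP objective: sample one action per
    skill, roll the dynamics forward, and keep the action sequence with the
    highest product of Q-values, i.e. the highest estimated probability that
    every skill in the plan succeeds."""
    rng = np.random.default_rng(rng)
    best_actions, best_score = None, -np.inf
    for _ in range(num_samples):
        s, score, actions = state, 1.0, []
        for skill in skills:
            # The policy serves as a learned sampler; Gaussian perturbation
            # around its output is a stand-in for the actual sampling scheme.
            a = skill.policy(s) + noise_scale * rng.standard_normal(skill.action_dim)
            # Clip Q to [0, 1] so the product reads as a joint success probability.
            score *= np.clip(skill.q_function(s, a), 0.0, 1.0)
            s = skill.dynamics(s, a)
            actions.append(a)
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions, best_score
```

Note that each action is scored against the Q-values of all downstream skills through the dynamics rollout, which is what lets an early action (e.g. a grasp) be chosen for the sake of a later one (e.g. a pull).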
Planning Domains
STAP is evaluated on 3 domains with varying degrees of geometric complexity and planning horizons.
Hook Reach: The robot must use the hook to pull an out-of-reach object inside its workspace
Constrained Packing: Objects must be transported from the table to the rack without collision
Rearrangement Push: Obstacles must be moved to enable pushing the target object under the rack
STAP: In the Real World
Hook Reach: Greedy
Greedily executing pick(hook, table) induces the maximum-likelihood grasp position (the middle of the handle), causing the next pull(milk, hook) to fail.
Hook Reach: STAP
Planning with STAP conditions pick(hook, table) on the downstream pull(milk, hook) action, resulting in a desirable grasp near the end of the handle.
Constrained Packing: Greedy
Greedily executing pick(obj, table) and place(obj, rack) fails to consider other objects that must also share a spot on the rack, causing the place policy to fail.
Constrained Packing: STAP
Planning with STAP gives foresight to upstream place(obj, rack) actions so that earlier placements maximize the area on the rack for later ones.
Rearrangement Push: Greedy
Greedily executing place(yogurt, table) results in a risk-averse placement position far from other objects, where push(yogurt, rack) cannot be executed.
Rearrangement Push: STAP
Planning with STAP successfully constrains the place(yogurt, table) action space to the region in front of the rack where push(yogurt, rack) is feasible.
Citation
If you found this work interesting, please consider citing:
@inproceedings{AgiaMigimatsuEtAl2023,
title = {STAP: Sequencing Task-Agnostic Policies},
author = {Agia, Christopher and Migimatsu, Toki and Wu, Jiajun and Bohg, Jeannette},
booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
year = {2023},
pages = {7951--7958},
doi = {10.1109/ICRA48891.2023.10160220}
}
Acknowledgements