STAP: Sequencing Task-Agnostic Policies

Abstract

Advances in robotic skill acquisition have made it possible to build general-purpose libraries of learned skills for downstream manipulation tasks. However, naively executing these skills one after the other is unlikely to succeed without accounting for dependencies between actions prevalent in long-horizon plans. We present Sequencing Task-Agnostic Policies (STAP), a scalable framework for training manipulation skills and coordinating their geometric dependencies at planning time to solve long-horizon tasks never seen by any skill during training. Given that Q-functions encode a measure of skill feasibility, we formulate an optimization problem to maximize the joint success of all skills sequenced in a plan, which we estimate by the product of their Q-values. Our experiments indicate that this objective function approximates ground truth plan feasibility and, when used as a planning objective, reduces myopic behavior and thereby promotes long-horizon task success. We further demonstrate how STAP can be used for task and motion planning by estimating the geometric feasibility of skill sequences provided by a task planner. We evaluate our approach in simulation and on a real robot.

STAP-ICRA-2023.mp4

Training with STAP

STAP outlines a scalable pipeline for training robotic skills. Planning with STAP requires learning three components per skill: a policy that proposes actions, a Q-function that scores the feasibility of a (state, action) pair, and a dynamics model that predicts the state resulting from executing the skill.

Why are STAP skills task-agnostic? Because they aren't trained to solve any particular long-horizon task. They are trained independently and combined at planning time to solve unseen sequential tasks. So, when a novel task calls for a new skill, the skill can simply be added to the library without the need to retrain existing skills.
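
For concreteness, a minimal PyTorch sketch of such a per-skill bundle is given below. The class name Skill, the attribute names (policy, q_function, dynamics), and all network sizes and dimensions are illustrative assumptions for this page, not the STAP codebase API.

import torch
import torch.nn as nn

class Skill(nn.Module):
    """Illustrative container for the three learned components of one skill."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # Policy: proposes an action for the current state.
        self.policy = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )
        # Q-function: scores (state, action) pairs, read as a success probability.
        self.q_function = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
        # Dynamics model: predicts the state reached after executing the skill.
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def q_value(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.q_function(torch.cat([state, action], dim=-1)).squeeze(-1)

    def predict_next_state(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.dynamics(torch.cat([state, action], dim=-1))

# A task-agnostic skill library: new skills are added without retraining existing ones.
skill_library = {
    "pick": Skill(state_dim=32, action_dim=4),
    "place": Skill(state_dim=32, action_dim=3),
    "pull": Skill(state_dim=32, action_dim=5),
    "push": Skill(state_dim=32, action_dim=5),
}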

Planning with STAP

Core idea. Given any sequence of skills, STAP performs sampling-based optimization to obtain actions that maximize the probability of task success, estimated by the product of downstream Q-values.
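
As a rough illustration of this optimization, the sketch below (continuing the illustrative Skill / skill_library sketch from the training section) scores sampled action sequences by the product of Q-values, Q_1(s_1, a_1) · Q_2(s_2, a_2) · ... · Q_N(s_N, a_N), where each predicted state s_{k+1} comes from the corresponding skill's dynamics model. It is a simple random-shooting stand-in for the sampling-based optimizer (the demo videos below reference a CEM variant); function names and the sampling scheme are assumptions, not the paper's exact procedure.

import torch  # continues the illustrative Skill / skill_library sketch above

def plan(skill_sequence, initial_state, num_samples=1024, noise_scale=0.1):
    """Score sampled action sequences by the product of Q-values; return the best one."""
    state = initial_state.unsqueeze(0).repeat(num_samples, 1)  # (num_samples, state_dim)
    actions, log_score = [], torch.zeros(num_samples)
    for skill in skill_sequence:
        # Sample candidate actions around the skill policy's proposal.
        proposal = skill.policy(state)
        candidates = proposal + noise_scale * torch.randn_like(proposal)
        # Accumulate log Q_k(s_k, a_k): maximizing the sum of logs maximizes the product.
        log_score = log_score + torch.log(skill.q_value(state, candidates).clamp(min=1e-6))
        actions.append(candidates)
        # Predict the next state with the skill's dynamics model.
        state = skill.predict_next_state(state, candidates)
    best = torch.argmax(log_score)
    return [a[best] for a in actions]

# Example: optimize actions for pick(hook, table) followed by pull(milk, hook).
best_actions = plan([skill_library["pick"], skill_library["pull"]], torch.zeros(32))

Because the Q-values are multiplied, a single infeasible downstream skill drives the score of the whole plan toward zero, which is what discourages myopic choices such as grasping the middle of the hook's handle.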

Planning Domains

STAP is evaluated on three domains of varying geometric complexity and planning horizon.

STAP: In the Real World

hook_reach_greedy_2x.mp4

Hook Reach: Greedy

Greedily executing pick(hook, table) yields the maximum-likelihood grasp position at the middle of the handle, causing the subsequent pull(milk, hook) to fail.

hook_reach_policy_cem_2x.mp4

Hook Reach: STAP

Planning with STAP conditions pick(hook, table) on the dependent downstream pull(milk, hook) action, resulting in a desirable grasp near the end of the handle.

constrained_packing_greedy_2x.mp4

Constrained Packing: Greedy

Greedily executing pick(obj, table) and place(obj, rack) ignores the other objects that must also fit on the rack, causing the place policy to fail.

constrained_packing_policy_cem_2x.mp4

Constrained Packing: STAP

Planning with STAP gives foresight to the upstream place(obj, rack) actions, so earlier placements preserve as much free area on the rack as possible for later ones.

rearrangement_push_greedy_2x.mp4

Rearrangement Push: Greedy

Greedily executing place(yogurt, table) results in a risk-averse placement far from other objects, from which push(yogurt, rack) cannot be executed.

rearrangement_push_policy_cem_2x.mp4

Rearrangement Push: STAP

Planning with STAP successfully constrains the place(yogurt, table) action space to the region in front of the rack where push(yogurt, rack) is feasible.

Citation

If you found this work interesting, please consider citing:

@inproceedings{AgiaMigimatsuEtAl2023,
  title     = {STAP: Sequencing Task-Agnostic Policies},
  author    = {Agia, Christopher and Migimatsu, Toki and Wu, Jiajun and Bohg, Jeannette},
  booktitle = {2023 IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2023},
  pages     = {7951--7958},
  doi       = {10.1109/ICRA48891.2023.10160220}
}

Acknowledgements