Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

Teng Xue, Amirreza Razmjoo, Suhan Shetty, and Sylvain Calinon

Robotics: Science and Systems (RSS), 2024

Abstract

Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we use tensor train factorization to construct the value function space, and rely on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a superior approximation of cumulative rewards compared to state-of-the-art reinforcement learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach in coping with contact uncertainty and external disturbances in the real world.

Motivation

Given the initial and target configurations, we want to find the shortest path on the manifolds.

Method

Logic-Skill Programming:

We introduce first-order logic into sequential skill planning, resulting in a first-order extension of the mathematical program.
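As a rough sketch of what such a program can look like, the plan length, the skill sequence and the intermediate subgoals can be optimized jointly, with the sum of skill value functions as the objective and the logic rules as constraints. All symbols below are illustrative assumptions and not the paper's notation: a_k is the k-th skill, x_k the geometric subgoal reached after it, s_k the symbolic state, V_{a_k} the value function of skill a_k, and succ a hypothetical first-order-logic transition rule.

```latex
\begin{aligned}
\max_{K,\; a_{1:K},\; x_{1:K-1}} \quad & \sum_{k=1}^{K} V_{a_k}(x_{k-1}, x_k) \\
\text{s.t.} \quad & x_0 = x_{\text{init}}, \quad x_K = x_{\text{goal}}, \\
& s_0 = s_{\text{init}}, \quad s_k \in \operatorname{succ}(s_{k-1}, a_k), \quad k = 1, \dots, K
\end{aligned}
```

Only the final geometric configuration x_goal is prescribed here; no symbolic goal is imposed, which matches the setting described in the abstract.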

Skill Learning and Value Function Approximation:

We apply Generalized Policy Iteration using Tensor Train (TTPI) to learn the skill policies and approximate their value functions, which yields a skill library. Because the skills are learned independently, the library can easily be extended with new skills over time, in a lifelong manner.
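To give a feel for the tensor train format itself, the following minimal sketch compresses a toy value function sampled on a 3D grid into tensor train cores using sequential truncated SVDs (the classical TT-SVD scheme). This is not the TTPI algorithm; the toy value function, the grid size and the helper names tt_svd and tt_eval are all assumptions made for illustration.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense tensor into tensor train cores by sequential truncated SVDs."""
    shape = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        # Left singular vectors become the k-th core of shape (rank, n_k, new_rank).
        cores.append(u[:, :new_rank].reshape(rank, shape[k], new_rank))
        # Carry the remainder forward and expose the next mode for the next SVD.
        mat = (s[:new_rank, None] * vt[:new_rank]).reshape(new_rank * shape[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_eval(cores, index):
    """Evaluate one tensor entry by multiplying the matching slice of each core."""
    vec = np.ones((1,))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return vec[0]

# Toy stand-in for a learned value function: a separable quadratic on a 20^3 grid.
x = np.linspace(-1.0, 1.0, 20)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
value = -(X**2 + 2.0 * Y**2 + 0.5 * Z**2)

cores = tt_svd(value, max_rank=2)
reconstructed = np.einsum("aib,bjc,ckd->ijk", *cores)
n_params = sum(core.size for core in cores)  # 160 numbers instead of 8000
```

For this toy function, rank 2 is enough to reproduce the full grid up to numerical precision, so the three small cores replace the 8000-entry table. Real value functions generally need higher ranks and are not built from a dense tensor in this way.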

Solver for Logic-Skill Programming:

We propose to solve Logic-Skill Programming by alternating between symbolic search and value optimization.
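The alternation can be illustrated on a toy 1D problem. In the sketch below, a symbolic search enumerates skill skeletons whose preconditions hold, and for each skeleton a value optimization step picks the intermediate subgoals that maximize the summed skill values. Everything here is a simplifying assumption: the two skills, their hand-written value functions, the predicate names, and the use of brute-force enumeration and grid search in place of the actual symbolic planner and tensor-train-based optimizer.

```python
from itertools import product

# Candidate subgoals: positions inside the (hypothetical) reachable region [0, 1].
REACH = [i / 10 for i in range(11)]

# Hypothetical skill library. "pre"/"add" are symbolic preconditions and effects;
# "value" is a toy stand-in for a learned value function V(x_start, x_goal).
SKILLS = {
    "pull": {"pre": set(), "add": {"reachable"},
             "value": lambda a, b: -2.0 * abs(a - b) - 0.1},
    "pick": {"pre": {"reachable"}, "add": {"holding"},
             "value": lambda a, b: -1.0 * abs(a - b) - 0.1},
}

def symbolic_search(max_len):
    """Yield every skill skeleton (shortest first) whose preconditions are satisfied."""
    for length in range(1, max_len + 1):
        for skeleton in product(SKILLS, repeat=length):
            facts = set()
            feasible = True
            for name in skeleton:
                if not SKILLS[name]["pre"] <= facts:
                    feasible = False
                    break
                facts |= SKILLS[name]["add"]
            if feasible:
                yield skeleton

def optimize_subgoals(skeleton, x_init, x_goal):
    """Grid search over intermediate subgoals; returns (total value, subgoals)."""
    best = (float("-inf"), None)
    for mids in product(REACH, repeat=len(skeleton) - 1):
        waypoints = (x_init, *mids, x_goal)
        total = sum(SKILLS[name]["value"](a, b)
                    for name, a, b in zip(skeleton, waypoints, waypoints[1:]))
        if total > best[0]:
            best = (total, mids)
    return best

def plan(x_init, x_goal, max_len=3):
    """Alternate symbolic search and value optimization; keep the best plan found."""
    best = (float("-inf"), None, None)
    for skeleton in symbolic_search(max_len):
        value, mids = optimize_subgoals(skeleton, x_init, x_goal)
        if value > best[0]:
            best = (value, skeleton, mids)
    return best

# The block starts out of reach at x = 1.5 and must end at x = 0.2.
value, skeleton, subgoals = plan(x_init=1.5, x_goal=0.2)
```

In this toy setting the planner selects the skeleton ("pull", "pick") with the subgoal at the edge of the reachable region: pulling is more expensive than picking, so the block is pulled only as far as needed before it is grasped.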

Simulation demo

We validate the proposed method on three sequential manipulation tasks:

1) Prehensile Manipulation: This task involves objects that can be directly grasped. The goal is to change the 6D pose of a block that is placed on the table outside the robot’s reachable workspace. To do so, the robot must eventually pick up the block. This requires the robot to work out that it should first use a hook to extend its kinematic chain, pull the block back into the reachable region, and then grasp it.

2) Partly-Prehensile Manipulation: This task involves objects that can only be grasped from specific directions. The goal in this domain is to change the 6D pose of the cube, including its position along the Z axis. The robot must therefore plan how it will eventually pick up the object.

3) Non-Prehensile Manipulation: This task involves objects that cannot be grasped. The objective is to manipulate the box within the 3D world by leveraging multiple non-prehensile planar manipulation primitives and establishing contacts with the surroundings to achieve the final 6D pose. This task can also be seen as a special case of in-hand manipulation, where the robot and the wall act as active and passive “fingers”, respectively.

Supplementary video

Contact: teng.xue@idiap.ch

Feel free to contact us if you have any questions!
