Generalized Policy Iteration using Tensor Approximation for Hybrid Control

Abstract:

Optimal control of dynamic systems involving hybrid actions is a challenging task in robotics. To address this, we present a novel algorithm called Generalized Policy Iteration using Tensor Train (TTPI) that belongs to the class of Approximate Dynamic Programming (ADP) methods. We use a low-rank tensor approximation technique called Tensor Train (TT) to approximate the state-value and advantage functions, which enables us to efficiently handle hybrid action spaces. We demonstrate the superiority of our approach over previous baselines on some benchmark problems with hybrid action spaces. Additionally, the robustness and generalization of the policy for hybrid systems are showcased through a real-world robotics experiment involving a non-prehensile manipulation task.
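To make the idea of a Tensor Train approximation concrete, the following is a minimal sketch in Python with NumPy. It is not the TTPI implementation: it uses the textbook TT-SVD procedure on a small dense tensor, whereas the method described here relies on TT-Cross, which avoids forming the full tensor. The function names `tt_svd` and `tt_eval` and the toy function over two discretized continuous axes and one discrete axis are assumptions made purely for illustration.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense tensor into TT cores via sequential truncated SVDs."""
    dims = tensor.shape
    cores = []
    rank_prev = 1
    mat = tensor.reshape(rank_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        # Core k has shape (left rank, mode size, right rank).
        cores.append(u[:, :rank].reshape(rank_prev, dims[k], rank))
        # Carry the remainder forward and fold in the next mode.
        mat = (s[:rank, None] * vt[:rank]).reshape(rank * dims[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores

def tt_eval(cores, index):
    """Evaluate one tensor entry by multiplying the selected core slices."""
    vec = np.ones((1,))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return vec[0]

# Hypothetical toy function: two discretized continuous axes (x, u)
# and one discrete axis (m), stored together in a single tensor.
x = np.linspace(-1.0, 1.0, 20)
u = np.linspace(-1.0, 1.0, 15)
m = np.arange(3)
tensor = (x[:, None, None] ** 2 + u[None, :, None] ** 2) * (1.0 + m[None, None, :])

cores = tt_svd(tensor, max_rank=2)
n_tt_params = sum(core.size for core in cores)
approx = tt_eval(cores, (3, 4, 1))
exact = tensor[3, 4, 1]
```

In this toy case the TT cores hold far fewer numbers than the dense tensor, and a discrete axis is treated exactly like a discretized continuous one, which is the property that makes the representation convenient for hybrid variables.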

Paper: Open Review

Background Material on TT-Cross and TTGO: TTGO

Experiments

Pendulum Swing-up (Under-Actuated)

Cart Pole Swing-up (Under-Actuated)

Comparisons with RL on Benchmark Hybrid Control Problems

TTPI outperforms state-of-the-art RL algorithms for hybrid control. The RL algorithms achieved a success rate of 94% on the Catch-Point problem and less than 10% on the Hard-Move problem for M>12 (i.e. an action space of more than 24 dimensions) after 4 to 10 hours of training, whereas TTPI achieved a 100% success rate on all problems with less than 0.25 hours of training.

Non-Prehensile Planar Pushing Experiment:

We demonstrate the effectiveness of our proposed method for hybrid system control on a planar pushing task with a face-switching mechanism that involves discrete states and actions. The objective is to push a block with freedom to switch both the contact modes and the contact faces. It is modeled using 6 states and 4 actions. The action includes a discrete variable representing the index of the next contact face. Its underactuated and hybrid nature, coupled with multiple discrete contact modes, makes it difficult to design effective control strategies, and it has been a test-bed problem for the control of hybrid systems. Previous approaches, such as mixed-integer programming and hybrid Differential Dynamic Programming, have struggled with the high computational cost of solving the problem, which requires robust algorithms that can handle the complexity of hybrid systems with both continuous and discrete variables. Note that such a non-prehensile manipulation problem is typically formulated as a continuous control problem due to a lack of methodologies for handling hybrid actions, and that formulation is not representative of hybrid control in robotics applications.
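As a rough illustration of what a hybrid action looks like computationally, the sketch below picks the best action at a fixed state from a dense advantage array whose axes mix a discretized continuous component and a discrete face index. This is an assumption-laden toy: the function `greedy_hybrid_action`, the single velocity axis, and the synthetic advantage values are hypothetical, and a brute-force `argmax` stands in for the TT-based optimization (see the TTGO background material) that would be used on a TT-format advantage function.

```python
import numpy as np

def greedy_hybrid_action(advantage, continuous_grids, discrete_values):
    """Return the action maximizing a dense advantage array at a fixed state.

    The array has one axis per action component: the discretized
    continuous components first, then the discrete component last.
    """
    flat_idx = int(np.argmax(advantage))
    idx = np.unravel_index(flat_idx, advantage.shape)
    continuous = [grid[i] for grid, i in zip(continuous_grids, idx[:-1])]
    return continuous, discrete_values[idx[-1]]

# Hypothetical action space: one pusher velocity and a contact-face index.
velocity_grid = np.linspace(-1.0, 1.0, 5)
faces = np.arange(4)

# Synthetic advantage values: best at velocity 0.5 on face 2.
wrong_face_penalty = (faces[None, :] != 2).astype(float)
advantage = -(velocity_grid[:, None] - 0.5) ** 2 - wrong_face_penalty

continuous_part, face = greedy_hybrid_action(advantage, [velocity_grid], faces)
```

Because the continuous and discrete components are just different axes of the same array, the maximization is carried out jointly over both rather than by solving a separate continuous problem for each discrete choice.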

TTPI achieves robust performance, with a 100% success rate (reaching the goal) in both simulation and real-world experiments for this task. The experiments demonstrate successful reaching of the target position and orientation even in the presence of additional weight and external disturbances, indicating the potential of TTPI for solving complex hybrid system control problems.
