Hierarchical Policy Blending As Optimal Transport

An T. Le, Kay Hansel, Jan Peters, Georgia Chalvatzaki

Abstract

We present hierarchical policy blending as optimal transport (HiPBOT). This hierarchical framework adapts the weights of low-level reactive expert policies, adding a look-ahead planning layer over the parameter space of a product of expert policies and agents. Our high-level planner realizes policy blending via unbalanced optimal transport, consolidating the scaling of the underlying Riemannian motion policies, effectively adjusting their Riemannian metrics, and deciding on the priorities between experts and agents, thereby guaranteeing safety and task success. Our experimental results in a range of application scenarios, from low-dimensional navigation to high-dimensional whole-body control, showcase the efficacy and efficiency of HiPBOT, which outperforms state-of-the-art baselines that either perform probabilistic inference or define a tree structure of experts, paving the way for new applications of optimal transport to robot control. The implementation will be released at a later date.
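To make the high-level planning step more concrete, the snippet below is a minimal sketch, not the released HiPBOT implementation, of how an unbalanced entropic optimal transport plan could serve as a priority matrix between experts and agents. The Sinkhorn solver, the placeholder cost matrix, and the `eps`/`rho` values are illustrative assumptions; only the overall idea of blending expert policies with transport-plan weights comes from the abstract above.

```python
# Minimal sketch (illustrative, not the authors' code): compute an unbalanced
# entropic OT plan between experts and agents and use it as blending weights.
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.05, rho=1.0, n_iters=200):
    """Unbalanced entropic OT via Sinkhorn scaling with KL-relaxed marginals.

    cost : (n_experts, n_agents) pairwise costs (assumed given)
    a, b : nonnegative masses for experts and agents (need not sum to one)
    eps  : entropic regularization strength
    rho  : marginal-relaxation strength (larger -> closer to balanced OT)
    """
    K = np.exp(-cost / eps)                 # Gibbs kernel
    fi = rho / (rho + eps)                  # exponent induced by the KL relaxation
    u = np.ones(cost.shape[0])
    v = np.ones(cost.shape[1])
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]      # transport plan = expert/agent priorities

# Illustrative use: 3 experts (e.g. goal reaching, obstacle avoidance,
# self-collision avoidance) acting on 2 agents (e.g. two manipulator arms).
rng = np.random.default_rng(0)
cost = rng.uniform(size=(3, 2))             # placeholder cost matrix
P = unbalanced_sinkhorn(cost, np.ones(3), np.ones(2))

# Each column of P weights the experts' commands for one agent,
# i.e. a weighted product-of-experts style blend.
expert_accels = rng.normal(size=(3, 2, 7))  # (expert, agent, dof) placeholders
blended = np.einsum('ea,ead->ad', P, expert_accels)
print(P, blended.shape)
```

Because the marginals are only softly enforced, the plan can down-weight an entire expert or agent when its cost is uniformly high, which is one way to read the abstract's claim that the planner decides on priorities between experts and agents rather than merely redistributing fixed weights.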

I. TIAGo++ Whole-Body Control Videos

The playback speed of all videos is unmodified, and the experiments are recorded as is. We demonstrate HiPBOT against RMPflow in the multi-expert, multi-agent (MEMA) setting: a high-dimensional, multi-objective, and highly dynamic environment in which the TIAGo++ must track two potentially conflicting reference trajectories while avoiding self-collision and an obstacle.

Video 1: Demonstration runs of HiPBOT (h=2). HiPBOT is able to compromise between objectives thanks to its ability to adapt expert priorities online.

Video 2: Demonstration runs of RMPflow. RMPflow struggles to find good situational actions and eventually collides.

II. Other Exemplary Cases

Video 3: (Left) Demonstration run of HiPBOT in an extremely dynamic and dense maze environment. (Right) Demonstration run of HiPBOT on the Panda case, with dynamic obstacles hindering the way to the green box.

III. HiPBOT tuning tips