Hierarchical Policy Blending as Inference for Reactive Robot Control

ACCEPTED @ICRA2023

Kay Hansel*, Julen Urain, Jan Peters and Georgia Chalvatzaki


Abstract

Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade off between safety and performance. On the one hand, reactive policies guarantee a fast response to environmental changes, at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method, and we adopt probabilistic inference methods to formalize the hierarchical model and its stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning adaptively computes the optimal weights over the task horizon. This stochastic optimization avoids local optima and yields feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
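To make the blending concrete: if each reactive expert is a Gaussian policy over actions, their weighted product is again Gaussian, with the blended precision equal to the weighted sum of the expert precisions. The minimal NumPy sketch below illustrates only this fusion step; the two toy experts and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def blend_experts(means, covs, weights):
    """Fuse Gaussian expert policies N(mu_i, Sigma_i) via a weighted
    product of Gaussians: the blended precision is the weighted sum of
    expert precisions, and the blended mean is the precision-weighted
    combination of the expert means."""
    precisions = [w * np.linalg.inv(S) for w, S in zip(weights, covs)]
    Lam = sum(precisions)                 # blended precision matrix
    Sigma = np.linalg.inv(Lam)            # blended covariance
    mu = Sigma @ sum(P @ m for P, m in zip(precisions, means))
    return mu, Sigma

# Two illustrative 2D experts: one pulls toward the goal, one pushes
# away from an obstacle. The weights would come from the planner.
goal_mean, goal_cov = np.array([1.0, 0.0]), 0.1 * np.eye(2)
avoid_mean, avoid_cov = np.array([0.0, 1.0]), 0.2 * np.eye(2)
mu, Sigma = blend_experts(
    means=[goal_mean, avoid_mean],
    covs=[goal_cov, avoid_cov],
    weights=[0.7, 0.3],
)
action = np.random.multivariate_normal(mu, Sigma)  # stochastic action
```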

Supplementary Videos

Point-Mass Navigation In Toy Environments

We study HiPBI's behavior on low-dimensional point-mass navigation tasks, comparing our method with RMPflow and iCEM-MPC. The former is a reactive policy that ensures a fast response; the latter provides a framework for real-time planning. In both environments, the yellow dot denotes the start and the green dot the goal. In the toy maze (Fig. 1, top), several circular obstacles, some static and some dynamic, block the way. In the 2D toy box environment (Fig. 1, bottom), the challenge is to find the way into the box, from the left or the right, without getting trapped in a local optimum. The videos below show HiPBI's behavior on the point-mass navigation tasks. Check out the paper for more information regarding the comparison!

Fig. 1: 2D Toy environments for planar point-mass navigation.
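In HiPBI, the blending weights of the reactive experts are optimized online by a sampling-based planner. As a rough illustration, loosely in the spirit of iCEM rather than the paper's exact algorithm, the sketch below searches over weight sequences with a cross-entropy loop; `rollout_cost` is a hypothetical stand-in for simulating the blended policy forward in one of the toy environments.

```python
import numpy as np

def cem_weights(rollout_cost, n_experts, horizon, iters=5,
                samples=64, elites=8, seed=0):
    """Cross-entropy-style search over blending-weight sequences.
    `rollout_cost` maps a (horizon, n_experts) weight sequence to a
    scalar cost by rolling out the blended policy in simulation."""
    rng = np.random.default_rng(seed)
    mean = np.full((horizon, n_experts), 1.0 / n_experts)
    std = np.full((horizon, n_experts), 0.5)
    for _ in range(iters):
        W = rng.normal(mean, std, size=(samples, horizon, n_experts))
        W = np.abs(W)
        W /= W.sum(axis=-1, keepdims=True)   # keep weights on the simplex
        costs = np.array([rollout_cost(w) for w in W])
        elite = W[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    # In a receding-horizon setup, only the first weight vector is
    # executed before replanning at the next control step.
    return mean
```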

panda_sim_none.mp4

Fig. 2: RMPflow (baseline) vs HiPBI (our approach)

Robot Manipulation Task In Simulation Environment

We investigate the performance of HiPBI on a high-dimensional robotics task with a 7DoF manipulator, compared against RMPflow. Moving from the orange box to the green box without getting stuck in a local optimum is challenging, and spherical obstacles, either static or dynamic, complicate the task further. In this experiment, we employ the semi-implicit Euler method for joint position control: the acceleration produced by RMPflow or HiPBI is integrated, taking into account the current velocity and position of the manipulator, to determine the next desired joint position (see the sketch below). Check out the paper for more information regarding the comparison!
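A minimal sketch of that integration step, with illustrative variable names (the commanded acceleration `ddq_cmd` would come from RMPflow or HiPBI):

```python
import numpy as np

def semi_implicit_euler_step(q, dq, ddq_cmd, dt):
    """One semi-implicit (symplectic) Euler step: first update the
    joint velocity with the commanded acceleration, then integrate the
    UPDATED velocity to obtain the next desired joint position."""
    dq_next = dq + dt * ddq_cmd   # velocity update from the new acceleration
    q_next = q + dt * dq_next     # position update uses the new velocity
    return q_next, dq_next

# Illustrative 7-DoF example at a 100 Hz control rate.
q, dq = np.zeros(7), np.zeros(7)
ddq_cmd = 0.1 * np.ones(7)
q_des, dq_des = semi_implicit_euler_step(q, dq, ddq_cmd, dt=0.01)
```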

panda_sim_static_1.mp4

Fig. 3.a: Static environment with one obstacle

panda_sim_static_3.mp4

Fig. 3.b: Static environment with three obstacles

panda_sim_static_5.mp4

Fig. 3.c: Static environment with five obstacles

panda_sim_dynamic_1.mp4

Fig. 3.d: Dynamic environment with one obstacle

panda_sim_dynamic_3.mp4

Fig. 3.e: Dynamic environment with three obstacles

panda_sim_dynamic_5.mp4

Fig. 3.f: Dynamic environment with five obstacles

Supplementary Material

presentation_hipbi_hansel_icra_2023.pptx