Hierarchical RL for Navigation Among Movable Obstacles

I spent a lot of time as a PhD student working on getting robots to move around successfully in cluttered spaces, like homes and offices. This page gives an overview of a hierarchical reinforcement-learning approach to this problem for cases in which the robot's actions have uncertain effects.

Hierarchical Decision Theoretic Planning for Navigation Among Movable Obstacles (WAFR 2012)

Abstract

In this paper we present the first decision theoretic planner for the problem of Navigation Among Movable Obstacles (NAMO). While efficient planners for NAMO exist, they suffer in practice from the inherent uncertainties in both perception and control on real robots. Generalizing the ideas of these planners to the nondeterministic case has proven difficult, due to the sensitivity of MDP methods to task dimensionality. Our work addresses this challenge by combining ideas from Hierarchical Reinforcement Learning with Monte Carlo Tree Search, and results in an algorithm that can be used for fast online planning in uncertain environments. We evaluate our algorithm in simulation, and provide a theoretical argument for our results, which suggest linear time complexity in the number of obstacles for non-degenerate environments.

The gist of this paper was to automatically generate a high-level planning problem based on the abstraction "free-space regions = high-level states". The figure below depicts the abstract MDP generated by detecting the free-space regions in the map at left. In this work the state space was discrete, and we used a simple flood-fill algorithm for detecting these regions. See here for the continuous state-action follow-up.

Free-space regions and the corresponding MDP.
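To make the abstraction concrete, here is a minimal sketch of flood-fill region labeling on a binary occupancy grid, in the spirit of the detector described above. The names and grid convention are illustrative, not taken from the paper's implementation:

```python
from collections import deque

def free_space_regions(grid):
    """Label connected free-space regions in a binary occupancy grid.

    grid[r][c] == 0 means free, 1 means occupied. Returns a dict mapping
    region id -> list of (row, col) cells; each region becomes one
    high-level state in the abstract MDP.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[None] * cols for _ in range(rows)]
    regions = {}
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0 or labels[r][c] is not None:
                continue
            # Flood fill (BFS) outward from an unlabeled free cell.
            regions[next_id] = []
            queue = deque([(r, c)])
            labels[r][c] = next_id
            while queue:
                cr, cc = queue.popleft()
                regions[next_id].append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 0 and labels[nr][nc] is None):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return regions
```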

Here's the same algorithm applied to a larger map. The high-level policy is annotated on each free-space state, and red arrows illustrate the low-level policy over axis-aligned pushing actions computed by MCTS.

A larger example MDP
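For a sense of the low-level search, below is a generic UCT-style MCTS sketch over the four axis-aligned pushes. The `simulate` and `reward` callables are placeholders standing in for the paper's NAMO transition and reward models, and the details (horizon, exploration constant, backup rule) are textbook choices rather than the paper's exact ones:

```python
import math
import random

# Axis-aligned unit pushes, matching the red arrows in the figure above.
PUSHES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def uct_push(root, simulate, reward, n_iters=2000, horizon=15, c=1.4):
    """Choose a push at `root` with a small UCT search.

    `simulate(state, action)` samples a (possibly noisy) successor, and
    `reward(state)` scores a state, e.g. progress toward connecting the
    robot's region to the goal free-space region; both are placeholders
    for the paper's NAMO models. States must be hashable.
    """
    n_s = {}    # state -> visit count
    n_sa = {}   # (state, action) -> visit count
    q = {}      # (state, action) -> running mean of sampled returns

    def rollout(state, depth):
        # Default policy: random pushes until the horizon.
        total = 0.0
        for _ in range(depth, horizon):
            state = simulate(state, random.choice(PUSHES))
            total += reward(state)
        return total

    def search(state, depth):
        if depth >= horizon:
            return 0.0
        if state not in n_s:
            # First visit: add the node, then estimate it with a rollout.
            n_s[state] = 0
            return rollout(state, depth)

        def ucb(a):
            # Try every push once, then trade off value and exploration.
            if n_sa.get((state, a), 0) == 0:
                return float("inf")
            return q[(state, a)] + c * math.sqrt(
                math.log(n_s[state]) / n_sa[(state, a)])

        a = max(PUSHES, key=ucb)
        nxt = simulate(state, a)
        ret = reward(nxt) + search(nxt, depth + 1)
        # Back the sampled return up into the running mean for (s, a).
        n_s[state] += 1
        n_sa[(state, a)] = n_sa.get((state, a), 0) + 1
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + (ret - old) / n_sa[(state, a)]
        return ret

    for _ in range(n_iters):
        search(root, 0)
    # Standard MCTS action choice: the most-tried push at the root.
    return max(PUSHES, key=lambda a: n_sa.get((root, a), 0))
```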

We observed a nice linear trend in runtime vs. the number of obstacles, and the paper included a cute proof of this effect that exploited the planar-graph nature of most maps; a sketch of the intuition follows the figure below.

Runtime Graph
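The flavor of the argument (my reconstruction, not the paper's proof verbatim): treat free-space regions as vertices and adjacencies between regions as edges. For a 2D map this adjacency graph is planar, so Euler's formula bounds its size:

```latex
% Connected simple planar graph with |V| >= 3: Euler's formula,
% plus every face bounded by at least 3 edges (each edge on 2 faces).
|V| - |E| + |F| = 2, \qquad 2|E| \ge 3|F|
\quad\Longrightarrow\quad |E| \le 3|V| - 6 = O(|V|).
```

The abstract MDP therefore has only linearly many high-level transitions, and if each obstacle participates in a bounded number of adjacencies (the "non-degenerate" condition from the abstract), total planning effort grows linearly in the obstacle count.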

Planning with Movable Obstacles in Continuous Environments with Uncertain Dynamics (ICRA 2013)

Abstract

In this paper we present a decision theoretic planner for the problem of Navigation Among Movable Obstacles (NAMO) operating under conditions faced by real robotic systems. While planners for the NAMO domain exist, they typically assume a deterministic environment or rely on discretization of the configuration and action spaces, preventing their use in practice. In contrast, we propose a planner that operates in real-world conditions such as uncertainty about the parameters of workspace objects and continuous configuration and action (control) spaces. To achieve robust NAMO planning despite these conditions, we introduce a novel integration of Monte Carlo simulation with an abstract MDP construction. We present theoretical and empirical arguments for time complexity linear in the number of obstacles as well as a detailed implementation and examples from a dynamic simulation environment.

The general idea with this paper was to push the "free-space MDP" idea through to a realistic state and action representation for a mobile manipulator. We used the connected components of a probabilistic roadmap (PRM) as the free-space region detector, and ran a kinodynamic RRT for the low-level manipulation planner instead of MCTS. The entire algorithm was built on top of the Box2D physics engine, and we captured transition uncertainty as distributions over model parameters rather than over the transitions directly, as in the WAFR paper. The KD-RRT was therefore Monte Carlo in two senses: first, in the usual RRT sense of sampling random configurations to grow toward, and second, in sampling random dynamics models to use for node expansion. These ideas were the precursor to the Physics-Based RL project. Sketches of both the PRM-based region detector and the model-sampling RRT extension appear below.
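First, the region detector: a minimal sketch of treating the connected components of a PRM as free-space regions. The collision checkers here are always-true placeholders (the real system queries the workspace geometry), and the brute-force neighbor loop is for clarity, not efficiency:

```python
import math
import random

def prm_regions(n_samples=500, radius=1.5, bounds=10.0,
                collision_free=lambda p: True,
                edge_free=lambda p, q: True):
    """Detect free-space regions as connected components of a PRM.

    Returns a list of components, each a list of sampled configurations;
    the components play the role of high-level states.
    """
    # Sample collision-free configurations.
    nodes = []
    while len(nodes) < n_samples:
        p = (random.uniform(-bounds, bounds), random.uniform(-bounds, bounds))
        if collision_free(p):
            nodes.append(p)

    # Union-find over nodes joined by short collision-free edges.
    parent = list(range(len(nodes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (math.dist(nodes[i], nodes[j]) < radius
                    and edge_free(nodes[i], nodes[j])):
                parent[find(i)] = find(j)

    components = {}
    for i, p in enumerate(nodes):
        components.setdefault(find(i), []).append(p)
    return list(components.values())
```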

An example from the continuous domain.
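And here is a toy version of the doubly Monte Carlo RRT extension. The point-mass `simulate_push` is a stand-in for the Box2D rollout the actual planner performs, and the parameter-belief numbers are made up for illustration:

```python
import math
import random

# Toy belief over object dynamics; the real planner keeps distributions
# over physical model parameters (these numbers are invented).
PARAM_BELIEF = {"mass": (5.0, 1.0), "friction": (0.4, 0.1)}

def sample_dynamics(belief):
    """Draw one concrete dynamics model from the parameter belief."""
    return {k: max(1e-3, random.gauss(mu, sd)) for k, (mu, sd) in belief.items()}

def simulate_push(state, control, params, dt=0.5):
    """Stand-in for a physics rollout: a friction-damped point mass.

    This closed-form update just shows how sampled parameters change the
    predicted outcome of a push.
    """
    x, y = state
    fx, fy = control
    scale = dt / (params["mass"] * (1.0 + params["friction"]))
    return (x + fx * scale, y + fy * scale)

def extend(tree, bounds=10.0):
    """One KD-RRT extension, Monte Carlo in both of the senses above."""
    # Sense 1: classic RRT randomness -- sample a state to grow toward.
    target = (random.uniform(-bounds, bounds), random.uniform(-bounds, bounds))
    nearest = min(tree, key=lambda s: math.dist(s, target))
    control = (random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0))
    # Sense 2: sample a dynamics model from the belief for this expansion.
    params = sample_dynamics(PARAM_BELIEF)
    new_state = simulate_push(nearest, control, params)
    tree.append(new_state)
    return new_state

tree = [(0.0, 0.0)]
for _ in range(100):
    extend(tree)
```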

The videos below show our planner in action.