Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning

Claudia Pérez-D’Arpino, Can Liu, Patrick Goebel, Roberto Martín-Martín, Silvio Savarese

Stanford University

Abstract: Navigating fluently around pedestrians is a necessary capability for mobile robots deployed in human environments, such as buildings and homes. While research on social navigation has focused mainly on the scalability with the number of pedestrians in open spaces, typical indoor environments present the additional challenge of constrained spaces such as corridors and doorways that limit maneuverability and influence patterns of pedestrian interaction. We present an approach based on reinforcement learning (RL) to learn policies capable of dynamic adaptation to the presence of moving pedestrians while navigating between desired locations in constrained environments. The policy network receives guidance from a motion planner that provides waypoints to follow a globally planned trajectory, whereas RL handles the local interactions. We explore a compositional principle for multi-layout training and find that policies trained in a small set of geometrically simple layouts successfully generalize to more complex unseen layouts that exhibit composition of the structural elements available during training. Going beyond walls-world like domains, we show transfer of the learned policy to unseen 3D reconstructions of two real environments. These results support the applicability of the compositional principle to navigation in real-world buildings and indicate promising usage of multi-agent simulation within reconstructed environments for tasks that involve interaction.

ICRA 2021. arXiv:2010.08600

Robot Learning for Social Navigation

Have you seen a mobile robot in your local supermarket? Or perhaps one navigating on the sidewalk while making a delivery? Some of these robots are remotely controlled, while others use technologies to navigate autonomously. You might see them stopping or appearing indecisive about their next motion. This is because performing tasks with and around people presents many challenges for current robotic technologies, such as sensing people and adapting the robot's motion accordingly in a socially compliant manner. Robots that can participate in collaborative activities with humans will enable many applications for robots that help in our daily lives at home, in offices, and in stores, and might potentially change the way we do many activities.

Robot mobility around people is a key component of a system that can be deployed in unstructured human environments. The main challenge of social navigation is to generate robot motions and behaviors on-line that comply with social patterns, enabling the robot to share the space without disrupting human activities. Social navigation has typically been studied in open spaces, such as an open street or an outdoor public space. However, most human indoor spaces exhibit constrained layouts with typical elements, such as corridors, doors, and intersections. Constrained environments limit the maneuverability of the robot, forming a special case of the social navigation problem. These typical elements are relevant not only because of their geometry, but also because of the characteristic navigation movements and interactions that emerge from the layouts. In this work, we developed an approach for social navigation in indoor environments that can handle navigation in these constrained spaces, and that takes advantage of these typical elements to perform training in simplified environments that generalizes to more complex layouts representative of real human spaces.

Learning tasks with human-robot interactions in simulation

Recent advances in robot learning take advantage of simulation as a medium for the robot to explore environments and actions without the limitations of real-world testing (time, cost, deployment logistics, safety, etc.). Most of these advances have been demonstrated for single-robot learning. It is less clear how to take advantage of simulation for tasks that involve multi-agent interactions, whether between learned models or in human-robot collaborative tasks. At least two key challenges appear in this setting: having a human model that generates simulated behavior, and modelling the patterns of interaction in a way that follows human dynamics. These are fundamental problems of learning for interactive tasks, and the required fidelity of these simulated models varies with the properties of the domain and the observation space of the policies.

In the social navigation setting, a first step in this direction is to design a multi-agent simulation with pedestrians that follow principles from a social model, using a closed-form solution to compute the motion of all pedestrians given start and goal locations within the environment. The pedestrian model takes into consideration the presence of other pedestrians and the robot, exhibiting basic interactive behaviors that lead both the robot and the pedestrians to collaborate in maneuvering around each other. We implemented a multi-agent simulation with moving pedestrians using ORCA (Optimal Reciprocal Collision Avoidance) to drive their motion, and use realistic human environments obtained from 3D reconstructions of indoor spaces from the Gibson dataset. This approach offers a platform for robot learning in realistic indoor scenes with interactions.
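As a rough illustration of how a pedestrian model of this kind produces interactive behavior, the sketch below steps a pedestrian toward its goal with a simple repulsion from nearby agents. This is a deliberately simplified stand-in for ORCA, which instead solves a collision-avoidance program over velocity obstacles; all names and constants here are illustrative, not the paper's.

```python
import math

def pedestrian_step(pos, goal, others, max_speed=1.2, personal_radius=0.6, dt=0.1):
    """One step of a greatly simplified pedestrian model: head toward the
    goal at max_speed, repelled by agents inside a personal radius.
    (Illustrative stand-in for ORCA; values are not from the paper.)"""
    # Preferred velocity: straight toward the goal, at max_speed.
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy) or 1e-9
    vx, vy = max_speed * dx / dist, max_speed * dy / dist
    # Simple repulsion from other agents that are too close.
    for ox, oy in others:
        sx, sy = pos[0] - ox, pos[1] - oy
        d = math.hypot(sx, sy)
        if 1e-9 < d < personal_radius:
            push = (personal_radius - d) / personal_radius * max_speed
            vx += push * sx / d
            vy += push * sy / d
    return (pos[0] + vx * dt, pos[1] + vy * dt)
```

With no other agents, the pedestrian walks straight to its goal; with an agent directly in its path, the repulsion slows and deflects it, which is the kind of basic mutual-avoidance behavior the full ORCA model provides in our simulation.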

cut_intro.mp4

Multi-agent simulation with pedestrians in real indoor human environments

We created a multi-agent simulation environment for the general problem of navigation around pedestrians, in which a simulated robot agent can collect experience and train according to a POMDP formulation. We use the Interactive Gibson Simulator, which runs on top of the pyBullet physics engine.

This video shows the robot navigating within an apartment, represented in the simulator as a mesh obtained from a real 3D scan. The robot is using the resulting learned policy, which combines RL and motion planning.

Note: if you have trouble playing the videos you might need to enable cookies in your browser.

Combining planning and deep RL for social navigation

We present an approach based on reinforcement learning (RL) to learn policies capable of dynamic adaptation to the presence of moving pedestrians while navigating between desired locations in constrained environments. The policy network receives guidance from a motion planner that provides waypoints to follow a globally planned trajectory, whereas RL handles the local interactions.

Using RL alone in an end-to-end manner results in a training sample complexity that scales with the complexity of the layout geometry, and often results in low success rates. This increase in training time is not justified for a problem that is local in nature: the robot must solve short-term local interactions to maneuver around pedestrians. While these motions must still consider the end goal of the current trajectory, the full complexity of the entire layout is out of scope for this rapid, short-horizon interaction problem. Our approach takes advantage of this property by assigning long-horizon trajectory planning from the start to the goal position to a motion planner, which returns desired waypoints. In the absence of pedestrians, strictly following these waypoints would take the robot to the goal. When moving pedestrians are in the scene, however, the robot uses a reactive learned policy to adapt to their presence and motion. This is the component that is learned during training. The figure below shows the policy network, which we train using the off-policy SAC (Soft Actor-Critic) algorithm in continuous action space.
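This division of labor means the policy consumes a fixed-size window of upcoming waypoints from the global plan at every control step. A minimal illustrative helper for selecting that window (the 6-waypoint window size matches the paper; the selection logic here is our own sketch, not the paper's code):

```python
import math

def next_waypoints(path, pose, n=6):
    """Select the n upcoming waypoints for the policy input: skip the part
    of the global path already passed (up to the nearest waypoint), then
    pad with the final goal so the observation has a fixed size.
    Illustrative helper; not the paper's implementation."""
    # Index of the waypoint closest to the current robot pose.
    i = min(range(len(path)),
            key=lambda k: math.hypot(path[k][0] - pose[0], path[k][1] - pose[1]))
    upcoming = path[i + 1:i + 1 + n]
    # Pad with the last waypoint (the goal) to keep a fixed-size input.
    upcoming += [path[-1]] * (n - len(upcoming))
    return upcoming
```

In the absence of pedestrians the policy can simply track this window; around pedestrians it deviates from it reactively and then rejoins the planned route.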

The observation space, O, includes the elements o = {goal, lidar, waypoints}, where goal is the episodic navigation goal, represented by its 2D coordinates in the robot's reference frame; lidar contains 128 range measurements from a 1D LiDAR sensor in the robot's sensor frame; and waypoints contains n = 6 waypoints computed by a global planner with access to a map of the environment. The action space, A, defines actions a = {vx, vy, ω}, where (vx, vy) is the commanded linear velocity and ω is the commanded angular velocity of the mobile robot. Details of the reward function are provided in the paper.
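A minimal sketch of how such an observation could be flattened into a single policy-network input vector, using the dimensions given above (the concatenation order itself is an assumption, not taken from the paper):

```python
def flatten_observation(goal_xy, lidar_ranges, waypoints_xy):
    """Flatten o = {goal, lidar, waypoints} into one policy input.
    Sizes follow the text: 2D goal, 128 lidar ranges, n = 6 waypoints
    in 2D. The concatenation order here is illustrative."""
    assert len(goal_xy) == 2
    assert len(lidar_ranges) == 128
    assert len(waypoints_xy) == 6 and all(len(w) == 2 for w in waypoints_xy)
    flat_wps = [c for w in waypoints_xy for c in w]  # (6, 2) -> 12 values
    # Total input dimension: 2 + 128 + 12 = 142.
    return list(goal_xy) + list(lidar_ranges) + flat_wps
```

The policy head then maps this 142-dimensional input to the three continuous action components (vx, vy, ω).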

Learning in simple layouts (walls-worlds) generalizes to complex geometries (complex walls-worlds layouts) and to realistic indoor human environments (3D reconstructions from real spaces).

We analyze the capability of this approach to generalize from simple layouts to complex layouts with multiple composed navigation challenges not seen during training. We train the robot on a small set of simple layouts (e.g., the three layouts grouped in blue). We define simple layouts as walls-worlds: layouts made of straight walls with basic geometries. These layouts were designed to capture the essential geometric properties of many indoor environments, such as corridors, crossing hallways, and office spaces with doorways. Many indoor spaces can be viewed as a composition of these simpler components.

Then we test two types of generalization (illustrated below): (1) to walls-worlds with more complex layouts, and (2) to 3D reconstructions of real indoor human environments, such as an apartment and a supermarket.

We find experimentally that policies trained on a set of simple layouts generalize better when those layouts represent the geometric elements and interactions present in the target layout. This insight is particularly relevant for the built environment, which is typically composed of combinations of basic layout templates, such as hallways, rooms, door exits, and crossings. Below we show the resulting success rate (the robot reaches the goal location without collisions) over 100 test trials in the target environment "H". Training in layouts with a corridor, an intersection, and a door exit (T1, highlighted in green) achieved a 96% success rate, while training on a different combination (T2, highlighted in blue), with geometries less relevant to the layout of "H", resulted in a much lower performance of 78%.

cut_apt_2.mp4

Generalization to 3D reconstructed environment

Policy trained in simple layouts WALLS (T1) and tested in MESH-HOME, a 3D reconstruction of a real apartment previously unseen by the robot.

The video shows interesting learned behaviors such as stopping, waiting, and backing up when the space is too limited for both pedestrians and the robot to move.

Comparison with motion planning

Our proposed approach is based on learning a policy that has access to waypoints generated by a motion planner. We compare this approach against using a motion planner only, by deploying the ROS navigation stack in our domain. The planner in the ROS navigation stack is a layered-costmap method that also uses two components: a global planner based on Dijkstra's algorithm and a local controller based on the dynamic window approach (DWA). The ROS navigation parameters were tuned to optimize the planner's performance in the WALLS layouts. The results show the benefits of the combined learning-and-planning approach over planning alone, especially due to the planner's tendency to fall into the freezing-robot problem. Nonetheless, examining the planner's solutions is illustrative and suggests interesting research directions for social navigation. Below are a few examples.
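For reference, the core of the global planner is Dijkstra's algorithm over a grid costmap. A minimal sketch of that component (ours, not the ROS implementation; 4-connected grid, `None` marking obstacle cells):

```python
import heapq

def dijkstra_grid(costmap, start, goal):
    """Shortest path on a 4-connected grid costmap.
    costmap[r][c] is the cost of entering cell (r, c); None = obstacle.
    Returns the cell path from start to goal, or None if unreachable."""
    rows, cols = len(costmap), len(costmap[0])
    dist, prev = {start: 0}, {}
    pq = [(0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and costmap[nr][nc] is not None:
                nd = d + costmap[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(pq, (nd, (nr, nc)))
    if goal != start and goal not in prev:
        return None  # unreachable
    # Reconstruct the path by walking predecessors back to the start.
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]
```

The layered-costmap machinery in ROS then inflates obstacles and marks moving lidar returns into this costmap, which is why the planned trajectory visibly changes as pedestrians move.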

For each video below, the left panel shows the pyBullet simulation with the robot in light color (we suggest enlarging the video). The right panel shows the planning state, including the costmap and the current planned trajectory in green. These cases illustrate how the planner changes the desired trajectory according to the motion of the pedestrians, detected as moving parts in the lidar.

w_planner_b6_6p_straightcrowded.mp4

(Planner) Straight path, 6 pedestrians: This simple case shows how the planner alternates between passing pedestrians on either side of the corridor, and how the high density of pedestrians makes the robot wait when approaching the goal.

w_planner_b6_6p_exampleinteraction.mp4

(Planner) In this example, the robot reaches the goal, but abruptly moves in front of the pedestrians after exiting the small corridor.

w_planner_b6_16p_straight.mp4

(Planner) Straight path, 16 pedestrians: Similar to the example above, with more pedestrians in the corridor.

w_planner_b6_16p_crowdedinteraction.mp4

(Planner) Example of interactions in a very crowded space.

w_planner_b6_6p_collision.mp4

(Planner) This episode shows a failure by collision. In this particular case, the plan seemed geometrically correct, but it did not take into account the possible future motion of one pedestrian.

w_planner_b6_6p_frozen_pedsimblocking.mp4

(Planner) This example shows an instance of the freezing-robot problem, in which the robot continuously fails to find a feasible solution within its safety margin and keeps re-planning. It also shows the need for improved pedestrian simulators that generate more realistic behaviors in these corner cases.

Comparing RL+planner with planning

We compared our approach with the ROS navigation stack planner while increasing the number of pedestrians in the scene. Both methods degrade in performance as the density goes up, but for different reasons. The planner tends to time out due to the freezing-robot effect, whereas the trained policy incurs a small number of collisions. Unlike the freezing problem, in which the robot does not move at all, a small probability of collision can be mitigated with a lidar-based safety stop; this is a safety requirement for real deployment in any case.
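Such a lidar-based safety stop can be as simple as zeroing the commanded velocities whenever any range reading falls below a threshold, applied as a filter between the policy output and the robot controller. A minimal sketch (the function name and the 0.3 m threshold are illustrative, not from the paper):

```python
def safety_stop(action, lidar_ranges, stop_distance=0.3):
    """Zero the commanded velocities (vx, vy, omega) whenever any lidar
    range reading falls below stop_distance. Illustrative filter between
    the policy output and the low-level controller; threshold is a
    placeholder value."""
    if min(lidar_ranges) < stop_distance:
        return (0.0, 0.0, 0.0)  # full stop
    return action
```

Since the policy already observes the lidar, this filter only fires on the residual near-collision cases, trading a small amount of progress for a hard safety guarantee.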

WALL_16pedestrians.mp4

Learned policy (this paper) around 16 pedestrians

This video shows the robot navigating an unseen layout with 16 ORCA pedestrians, using a policy learned in the simple layouts T1 (shown in the image above) with fewer than 4 pedestrians.

In the second half of this talk, I cover more on learning in simulation for interactive tasks and our approach to social navigation:

Open Research Questions and Next Steps

JR_follow_me_Gates.mp4

While we show all our results in simulation, the final aim is to deploy this approach on real robots (when current restrictions allow). Our target platform is Stanford's Jackrabbot, a differential-drive mobile manipulator equipped with stereo vision and lidars. The pyBullet simulation in this paper uses this robot's model, including its non-holonomic constraints. We have verified the capability of transferring policies from simulation to the real platform for policies that output linear and angular velocities passed to a low-level on-board controller, and obtained smooth navigation behaviors, as shown in this video. In terms of sensing, the policy uses lidar, which is accurately simulated in iGibson.

This video shows Jackrabbot navigating while following a person using on-board sensing. It showcases smooth navigation in an indoor human environment. Future work aims to augment this platform with the proposed social navigation framework.

On the modelling and algorithmic side, we see several promising lines of research. One centers on the simulation itself: better pedestrian simulation and the availability of realistic scenes can improve the training environments to better reflect the cases the robot will encounter in the real world. This circles back to the earlier point regarding the use of simulation for human-robot interaction domains. Realistic pedestrian simulation, including behaviors that adapt to the semantic context, individual preferences, and adaptation to robots over time, among others, remains an open research question, and methods should account for this simulation-to-reality gap. Regarding the policy architecture, one important improvement is the modelling and prediction of pedestrian trajectories. Several works on social navigation in open spaces include such predictions, and future steps must adapt pedestrian prediction and interaction modelling to these challenging indoor spaces.

Moving towards vision-based social navigation

We strive to continue improving the simulation and moving into new research challenges. While this paper uses a lidar-based policy, full deployment will likely also require vision techniques that enable, for example, pedestrian detection and scene understanding. Together with Google Robotics, we launched the Interactive and Social Navigation Challenge using the iGibson simulator in the context of the 2021 CVPR Embodied AI Workshop. This challenge offers an improved version of the simulation for the social navigation task, in which the agent observes high-quality rendered RGB images within realistic indoor environments. This is a step towards vision-based social navigation and an effort towards defining and standardizing evaluation for this domain.

Pedestrians in the scene

The environments include simulated pedestrians driven by the ORCA model, as described above. Pedestrians are rendered as fixed meshes with realistic visual appearance.

iGibson environments

The challenge dataset includes 8 realistic scenes modeled after real apartments. Pedestrians are simulated automatically, providing a fully integrated environment for robot learning with RL frameworks.

We invite researchers to participate in the challenge, contributing to the understanding of state-of-the-art solutions and to identifying the key research challenges ahead for social navigation.

This page was written by Claudia Pérez-D'Arpino. If you find this paper useful for your research, cite:

Claudia Pérez-D'Arpino, Can Liu, Patrick Goebel, Roberto Martín-Martín and Silvio Savarese. Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning. 2021 IEEE International Conference on Robotics and Automation (ICRA 2021). arXiv:2010.08600