"Narrow space motion planning for car-like robots via learning" Code | Video | Paper --by Zhaofeng Tian
Tian, Zhaofeng, et al. "Unguided Self-exploration in Narrow Spaces with Safety Region Enhanced Reinforcement Learning for Ackermann-steering Robots." 2024 IEEE International Conference on Mobility, Operations, Services and Technologies (MOST). IEEE, 2024.
zhaofeng@udel.edu
A Safety Region (SR) state representation for rectangular-shaped robots is proposed.
A "FOMT" reward function without waypoint guidance is proposed.
This blog provides complementary details for the paper.
First, let's look at some experimental results to give the audience a straightforward intuition.
Failure to pass 90- and 180-degree corners: DWA, TEB, IL, and waypoint-guided RL (left to right).
Our trained model with the "FOMT" reward function easily handles the corner turns, using backing-up (reverse) skills to adjust the heading direction and the distance to the obstacle ahead.
In the ablation studies, we find that the "FT" model trained without the "O" component fails in transferred turns (it has not learned a stable turning skill), and the "FOT" model without the "M" component fails on unseen curvy and uneven tracks (it has no idea how to tackle the untrained curvy or uneven features), as shown below (FT, FOT).
With whole "FOMT" model, obstacle avoidance and keep-in-middle skills can be learned, see below.
Detecting collisions without a map or dilation/obstacle parameterization is challenging for a rectangular car-like robot, especially in narrow spaces, and missed detections prevent the robot from learning a correct collision boundary, which then leads to failures when the model is applied. Unlike the previous lidar-based Fixed Interval Fixed Range (FIFR) representation designed for round, differential-drive robots, our Safety Region (SR) representation discretizes the laser scans so that they are evenly distributed along each side of the rectangle, which increases the ability to detect collisions while avoiding the over-coverage introduced by the FIFR paradigm. A Fixed Interval Rectangular-fitting (FIRect) representation is also used as an ablation of our SR representation.
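To make the SR idea concrete, here is a minimal sketch of placing beams evenly along a rectangular footprint and flagging a return inside the safety region as a collision. The footprint dimensions, margin, and function names are illustrative assumptions, not the exact implementation from our repository.

```python
import numpy as np

def sr_beam_angles(n_per_side, half_len, half_wid):
    """Beam angles whose hit points are evenly spaced along each
    side of the rectangle (the SR idea), rather than evenly spaced
    in angle (the FIFR idea). Corner beams may repeat; harmless."""
    angles = []
    for y in np.linspace(-half_wid, half_wid, n_per_side):
        angles += [np.arctan2(y, half_len),    # front edge
                   np.arctan2(y, -half_len)]   # rear edge
    for x in np.linspace(-half_len, half_len, n_per_side):
        angles += [np.arctan2(half_wid, x),    # left edge
                   np.arctan2(-half_wid, x)]   # right edge
    return np.array(angles)

def rect_boundary_range(angle, half_len, half_wid):
    """Distance from the footprint center to the rectangle boundary
    along a beam at `angle` (rad), assuming the lidar is centered."""
    c, s = np.cos(angle), np.sin(angle)
    tx = half_len / abs(c) if abs(c) > 1e-9 else np.inf
    ty = half_wid / abs(s) if abs(s) > 1e-9 else np.inf
    return min(tx, ty)

def sr_collision(scan_ranges, scan_angles,
                 half_len=0.45, half_wid=0.25, margin=0.05):
    """Flag a collision when any return falls inside the safety
    region (footprint boundary plus a small margin). Dimensions
    are placeholders, not ZebraT's exact values."""
    bounds = np.array([rect_boundary_range(a, half_len, half_wid)
                       for a in scan_angles])
    return bool(np.any(np.asarray(scan_ranges) < bounds + margin))

if __name__ == "__main__":
    angles = sr_beam_angles(8, half_len=0.45, half_wid=0.25)
    clear_scan = np.full(angles.shape, 2.0)  # every return 2 m away
    print(sr_collision(clear_scan, angles))  # -> False
```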
By triggering random collisions in the narrow space, we compare the number of detected collisions; our SR method detects more of them, so this representation provides more help for robots learning collision avoidance. The Gazebo experiments and quantitative results are shown below.
We formulate the problem as Reinforcement Learning (RL) based mapless self-exploration in narrow spaces: the robot has neither a map to plan its motion with nor a goal point to reach.
The robot's main objective is to explore as much space as possible. In previous waypoint-guided RL research, the distance between the current location and the goal location enters the reward function: the smaller the distance, the higher the reward. Our reward function, by contrast, uses no waypoint or destination guidance.
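To illustrate the formulation, a Gym-style environment skeleton might expose only a lidar observation and a (steering, signed speed) action, with no map or goal anywhere in the interface. The class name, bounds, and beam count below are placeholders, and the actual Gazebo/ROS hookup is elided.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class NarrowSpaceEnv(gym.Env):
    """Mapless self-exploration: no map and no goal point in the
    state. Bounds and beam count are illustrative assumptions."""

    def __init__(self, n_beams=36, max_steer=0.5, max_speed=0.5):
        # Observation: discretized lidar ranges (meters).
        self.observation_space = spaces.Box(
            low=0.0, high=10.0, shape=(n_beams,), dtype=np.float32)
        # Action: steering angle (rad) and signed speed (m/s);
        # negative speed lets the agent learn backing-up skills.
        self.action_space = spaces.Box(
            low=np.array([-max_steer, -max_speed], dtype=np.float32),
            high=np.array([max_steer, max_speed], dtype=np.float32))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # A real implementation would respawn the robot in Gazebo
        # and return the first scan; here we return a dummy one.
        return np.full(self.observation_space.shape, 10.0,
                       dtype=np.float32), {}

    def step(self, action):
        # A real implementation would publish (steer, speed) to the
        # robot, read the next scan, check collisions, and compute
        # the FOMT reward sketched in the next section.
        obs = np.full(self.observation_space.shape, 10.0,
                      dtype=np.float32)
        return obs, 0.0, False, False, {}
```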
FOMT reward design: To encourage the robot to move forward rather than back up too often, Rf, the "F" component of the reward function, is the forward Lidar distance multiplied by the robot's linear speed, so that heading toward more open space (a larger forward Lidar distance) earns more reward.
Ro, the "O" component, applies a log function to the several closest Lidar distances to penalize getting close to obstacles. Rm, the "M" component, keeps the robot in the middle by minimizing the difference between the left and right Lidar measurements. Rt, the "T" component, is a constant penalty on time passing.
As shown in the picture, the robot is trained in simulation using the dedicated Gazebo model of our ZebraT robot platform [1]. The training track contains 45-, 90-, and 180-degree corners, which form challenging conditions for the robot to pass. The trained robot is expected to reach the exit through self-exploration while avoiding any collision with the walls.
Five algorithms are benchmarked in the training process: DDPG, DQN, PPO, PPO-discrete, and SAC. Since the DDPG-trained model performs best, in particular learning how to steer around a narrow corner with backing-up skills, we select DDPG for the subsequent training (ablation sets and waypoint-guided RL). The contrastive and ablation study sets are set up as below.
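For reference, a DDPG run over such an environment could be wired up as below, assuming stable-baselines3 and the hypothetical NarrowSpaceEnv from the earlier sketch; the hyperparameters are placeholders, not the values used in our experiments.

```python
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = NarrowSpaceEnv()  # hypothetical env from the sketch above

# Gaussian exploration noise over the (steer, speed) action.
n_actions = env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))

model = DDPG("MlpPolicy", env, action_noise=action_noise,
             learning_rate=1e-3, buffer_size=100_000, verbose=1)
model.learn(total_timesteps=200_000)
model.save("fomt_ddpg")
```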
We then transfer the trained models to both seen and unseen tracks for extensive evaluation. All methods are benchmarked as below, with the ablation-set comparisons listed on the right side. Some visualized results can be seen at the top of the blog.
Cardboard is used to build narrow tracks, and the model trained on the original simulation track is tested on our robot platform ZebraT, an Ackermann-steering robot with a 16-line Lidar mounted.
As a result, the original trained FOMT model performed well and passed all three challenging tracks without any collision, which demonstrates the generalization power of the learned policy and validates our experiment design and overall workflow.