SELFI: Autonomous Self-improvement with Reinforcement Learning for Social Navigation
Noriaki Hirose1,2, Dhruv Shah1, Kyle Stachowicz1, Ajay Sridhar1, and Sergey Levine1
1: University of California, Berkeley, 2: Toyota Motor North America
Paper is here! [arXiv]
Abstract
We propose SELFI, an online reinforcement learning approach for fine-tuning a control policy trained with model-based learning. SELFI combines the strengths of data-efficient model-based learning and flexible model-free reinforcement learning, alleviating the limitations of both. We formulate a combined objective: the objective of the model-based learning plus the learned Q-value from model-free reinforcement learning. By maximizing this combined objective during online learning, we improve the performance of the pre-trained policy in a stable manner. The main takeaways from our method are:
Quick online fine-tuning of the model-based control policy
Stable online reinforcement learning with less human intervention
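The combined objective can be sketched as picking the action that maximizes the sum of the model-based objective and the learned Q-value. The sketch below is illustrative only, assuming a sampling-based action search; `j_mb`, `q_value`, and the candidate sampler are stand-ins, not the paper's implementation.

```python
import numpy as np

def j_mb(state, action):
    # Stand-in for the model-based objective (e.g. a rollout cost from
    # the pre-trained SACSoN-style model); here a simple quadratic.
    return -np.sum((action - state[:2]) ** 2)

def q_value(state, action):
    # Stand-in for the Q-function learned online by model-free RL.
    return -np.abs(action).sum()

def select_action(state, num_candidates=64, seed=0):
    """Pick the candidate action maximizing the combined objective
    J_mb(s, a) + Q(s, a)."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, 2))
    scores = [j_mb(state, a) + q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

best = select_action(np.array([0.3, -0.1, 0.0]))
```

Because the model-based term stays in the objective throughout online learning, the learned Q-value acts as a correction rather than a replacement, which is what keeps the early online behavior close to the pre-trained policy.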
Evaluation with SACSoN
In order to evaluate SELFI using a real robot, we implement SELFI for a vision-based social navigation task, where a robot must navigate an indoor environment with pedestrians. We employ SACSoN as the offline model-based method. In the online phase, SELFI fine-tunes the pre-trained SACSoN policy and attempts to learn the following robotic behaviors:
pre-emptive avoidance of oncoming pedestrians,
collision avoidance for small or transparent objects, and
avoiding travel on uneven floor surfaces.
Here, we visualize the performance of the control policy fine-tuned by SELFI.
Socially compliant robot behavior
We show videos from human-occupied spaces with many pedestrians. SELFI learns to navigate these real environments unobtrusively. Due to the robot's speed limitation, it occasionally enters pedestrians' personal space, but it generally moves naturally toward the goal position with few personal-space violations. We compare our method against the baseline methods in experiments on an organized scene, so that each method encounters similar interactions, and we also conduct experiments with human subjects.
SELFI (SACSoN fine-tuned by our method)
Baseline
Pre-trained SACSoN policy
Collision avoidance for small obstacles
Although SACSoN includes an objective for collision avoidance, it does not work well for small obstacles due to its inaccurate geometric model. SELFI learns collision avoidance behavior for small objects by giving a negative reward in model-free reinforcement learning. More videos evaluating our method in the same scenes are shown here.
Avoiding travel on uneven floor surfaces
In addition, SELFI learns to avoid traveling on uneven floor surfaces. This behavior is difficult to learn offline, because the offline dataset does not contain sufficient information about the floor surface. We give a negative reward by thresholding the acceleration sensor's value to judge whether the robot is traveling on an uneven floor surface.
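As a concrete illustration, the penalties for collisions and for uneven floors could be implemented as below. The thresholds, weights, and function names are hypothetical stand-ins, not the values used in the paper; the floor check simply thresholds how far the vertical acceleration deviates from gravity.

```python
# Hypothetical reward terms; thresholds and weights are illustrative.
ACCEL_THRESHOLD = 3.0      # m/s^2 deviation from gravity (assumed)
ROUGH_FLOOR_PENALTY = -1.0
COLLISION_PENALTY = -10.0

def floor_reward(accel_z, gravity=9.81):
    """Negative reward when the vertical acceleration deviates enough
    from gravity to indicate an uneven floor surface."""
    if abs(accel_z - gravity) > ACCEL_THRESHOLD:
        return ROUGH_FLOOR_PENALTY
    return 0.0

def step_reward(collided, accel_z):
    """Combine the collision penalty with the floor-surface penalty."""
    r = COLLISION_PENALTY if collided else 0.0
    return r + floor_reward(accel_z)
```

For example, a smooth collision-free step (`step_reward(False, 9.9)`) yields zero penalty, while a collision on a bumpy floor (`step_reward(True, 14.0)`) accumulates both negative terms.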
Stable online learning
We count the number of human interventions in each lap during online learning. Our method does not degrade the pre-trained control performance and gradually decreases the number of interventions. Since SELFI retains the same objective as the offline learning, the online learning process is stable.
Figure: The number of human interventions during online learning with SELFI. SELFI (blue line) gradually decreases the interventions to almost zero. This stable learning process is achieved by retaining the same objectives as in the offline training.