Guided goal generation for hindsight multi-goal reinforcement learning


Chenjia Bai, Peng Liu, Wei Zhao, Xianglong Tang

Pattern Recognition and Intelligence System Research Center

School of Computer Science and Technology

Harbin Institute of Technology, Harbin 150001, China


Code available on GitHub

Paper available at: https://doi.org/10.1016/j.neucom.2019.06.022

Abstract

Typical reinforcement learning (RL) agents can only perform a single task and thus cannot scale to problems in which an agent needs to perform multiple tasks, such as moving objects to different locations, a setting that is common in real-world environments. Hindsight experience replay (HER), built on universal value functions, shows promising results in such multi-goal settings by substituting achieved goals for the original goal, thereby giving the agent frequent rewards. However, the achieved goals are limited to the current policy level and provide little guidance for learning. We propose a novel guided goal-generation model for multi-goal RL named G-HER. Our method uses a conditional generative recurrent neural network (RNN) to explicitly model the relationship between policy level and goals, enabling the generation of various goals conditioned on different policy levels. Goals generated under a higher policy level provide better guidance for the RL agent, which is equivalent to using knowledge of a successful policy in advance to guide the learning of the current policy. Our model accelerates the generalization of substitute goals to the whole goal space. The G-HER algorithm is evaluated on several robotic manipulation tasks and demonstrates improved performance and sample efficiency.
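As a rough illustration of this idea only (a minimal sketch, not the released G-HER implementation; the class name, dimensions, and Gaussian output head are assumptions), a generative RNN that samples goals conditioned on a scalar policy level, such as the recent success rate, could look like the following:

import torch
import torch.nn as nn

class ConditionalGoalRNN(nn.Module):
    """Sketch of a generative RNN that samples goals conditioned on a policy level."""
    def __init__(self, goal_dim=3, cond_dim=1, hidden_dim=64):
        super().__init__()
        # Each step consumes the previously generated coordinate plus the condition.
        self.rnn = nn.GRU(input_size=1 + cond_dim, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # mean and log-std for one goal coordinate
        self.goal_dim = goal_dim

    def sample(self, policy_level, batch_size=1):
        # Condition every step on the same policy level (e.g. recent success rate).
        cond = torch.full((batch_size, 1, 1), float(policy_level))
        prev = torch.zeros(batch_size, 1, 1)  # start token for the first coordinate
        h, coords = None, []
        for _ in range(self.goal_dim):
            out, h = self.rnn(torch.cat([prev, cond], dim=-1), h)
            mean, log_std = self.head(out[:, -1]).chunk(2, dim=-1)
            coord = mean + log_std.exp() * torch.randn_like(mean)
            coords.append(coord)
            prev = coord.unsqueeze(1)
        return torch.cat(coords, dim=-1)  # (batch_size, goal_dim) goal positions

# Goals sampled under a high policy level are meant to guide the current policy.
generator = ConditionalGoalRNN()
guided_goals = generator.sample(policy_level=0.8, batch_size=4)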


Experimental Results

Environment

We use the same robot Fetch environment as in [1], which includes four tasks (a minimal loading sketch follows the list):

1. FetchReach. Move the gripper to the target position. FetchReach is the simplest of the four tasks.

2. FetchPush. Push the box in front of the robot to the target position on the table. The robot fingers are locked to prevent grasping.

3. FetchPickAndPlace. Use the gripper to grasp a box and then move it to the target position, which may be on the table or in the air.

4. FetchSlide. Hit a slider so that it slides across a long table and stops at the target position, which is outside the reach of the robot arm; the arm can only strike and push the slider. A reward is obtained only when the slider stops at the target position. To choose a suitable action, the agent must account for the friction between the table and the slider as well as the slider's weight. FetchSlide is the most difficult of the four tasks.
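A minimal loading sketch for these tasks, assuming the OpenAI Gym robotics suite used by HER [1] (gym with mujoco-py installed; the environment IDs match the result figures below):

import gym

TASKS = ["FetchReach-v1", "FetchPush-v1", "FetchPickAndPlace-v1", "FetchSlide-v1"]

env = gym.make(TASKS[0])
obs = env.reset()
# Multi-goal observations are dictionaries: the raw state, the goal achieved
# so far, and the desired goal that defines the current episode's task.
print(obs["observation"].shape, obs["achieved_goal"], obs["desired_goal"])

obs, reward, done, info = env.step(env.action_space.sample())
# Rewards are sparse: 0 when the achieved goal lies within a small distance
# of the desired goal, and -1 otherwise.
print(reward, env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info))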

Task illustrations: fetchreach.pdf (FetchReach), fetchpush.pdf (FetchPush), fetchpickandplace.pdf (FetchPickAndPlace), fetchslide.pdf (FetchSlide)

Result comparison

The figure shows the median test success rate on all four robot Fetch tasks. FetchReach, FetchPush, and FetchPickAndPlace can be solved by both HER and G-HER, but G-HER learns faster on FetchReach and FetchPush, improving sample efficiency. FetchSlide is the hardest task: HER reaches a test success rate of about $0.6$, while G-HER outperforms it with a success rate of more than $0.7$. We also find that vanilla DDPG with dense rewards often works better than plain UVFAs with sparse rewards, whereas both G-HER and HER with sparse rewards outperform DDPG with dense rewards. The proposed G-HER algorithm provides better guidance for learning and improves performance and sample efficiency on robot Fetch tasks under the multi-goal, sparse-reward setting.
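For reference, the mechanism that lets HER-style methods learn from the sparse reward is hindsight relabeling: a stored transition's goal is replaced and its reward recomputed. A minimal sketch, assuming the sparse Fetch reward of 0 on success and -1 otherwise; in HER the substitute goal is one actually achieved later in the episode, while in G-HER it comes from the conditional generator sketched above:

import numpy as np

def sparse_reward(achieved_goal, goal, threshold=0.05):
    # 0 if the achieved goal is close enough to the goal, -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < threshold else -1.0

def relabel(transition, substitute_goal):
    """Replace the original goal of a stored transition and recompute its reward."""
    relabeled = dict(transition)
    relabeled["goal"] = np.asarray(substitute_goal)
    relabeled["reward"] = sparse_reward(transition["achieved_goal"], relabeled["goal"])
    return relabeled

# Example: a transition that failed under the original goal becomes a success
# when relabeled with the goal the agent actually achieved.
t = {"achieved_goal": np.array([1.3, 0.7, 0.4]), "goal": np.array([1.0, 0.5, 0.4]), "reward": -1.0}
print(relabel(t, t["achieved_goal"])["reward"])  # 0.0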

Learning curves (median test success rate): FetchReach-v1.pdf (FetchReach), FetchPush-v1.pdf (FetchPush), FetchPickAndPlace-v1.pdf (FetchPickAndPlace), FetchSlide-v1.pdf (FetchSlide)

Video

We record videos of the best policy on each task, played for 20 episodes.

Videos: FetchReach.mp4 (FetchReach), FetchPush.mp4 (FetchPush), FetchPickAndPlace.mp4 (FetchPickAndPlace), FetchSlide.mp4 (FetchSlide)

[1] Andrychowicz M, Wolski F, Ray A, et al. Hindsight experience replay[C]//Advances in Neural Information Processing Systems. 2017: 5048-5058.