To provide adaptive and user-friendly solutions for robotic manipulation, it is important that an agent can learn to accomplish tasks even when given only very sparse instruction signals. To address the difficulties reinforcement learning algorithms face when task rewards are sparse, this paper proposes a novel form of intrinsic motivation that allows robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. By integrating and balancing empowerment and curiosity, the proposed approach outperforms existing intrinsic exploration approaches in extensive empirical evaluations. Qualitative analysis further shows that, when combined with diversity-driven intrinsic motivation, the approach helps manipulators learn a set of diverse skills that could potentially be transferred to more complicated manipulation tasks and accelerate learning there.
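As a rough illustration of how such a combined reward signal could be formed, the sketch below assumes a curiosity bonus computed as the prediction error of a learned forward dynamics model and an empowerment bonus computed as a variational lower bound on mutual information, mixed with the sparse extrinsic reward through weighting coefficients. This is a minimal sketch under those assumptions, not the paper's implementation; all function and parameter names (forward_model, source_policy, inverse_planner, beta_curiosity, beta_empowerment) are hypothetical.

```python
# Minimal sketch (not the authors' implementation): one way to combine a sparse
# extrinsic reward with curiosity- and empowerment-style intrinsic bonuses.
import torch
import torch.nn.functional as F


def curiosity_bonus(forward_model, state, action, next_state):
    """Curiosity as forward-model prediction error in (feature) space."""
    with torch.no_grad():
        predicted_next = forward_model(state, action)
    # Per-sample squared error between predicted and observed next state.
    return 0.5 * F.mse_loss(predicted_next, next_state, reduction="none").mean(dim=-1)


def empowerment_bonus(source_policy, inverse_planner, state, action, next_state):
    """Empowerment via a variational lower bound on I(action; next_state | state):
    log q(action | state, next_state) - log pi_source(action | state).
    Assumes both networks return torch.distributions objects."""
    with torch.no_grad():
        log_q = inverse_planner(state, next_state).log_prob(action)
        log_source = source_policy(state).log_prob(action)
    return log_q - log_source


def total_reward(r_extrinsic, r_curiosity, r_empowerment,
                 beta_curiosity=0.1, beta_empowerment=0.1):
    """Weighted sum of extrinsic and intrinsic terms; the balance between the
    two intrinsic bonuses is a tunable design choice."""
    return r_extrinsic + beta_curiosity * r_curiosity + beta_empowerment * r_empowerment
```

In a sketch like this, the resulting total reward would simply replace the sparse extrinsic reward in whatever policy-optimization loop is used, with the two coefficients controlling how exploration is balanced between the curiosity and empowerment signals.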
Box-Lifting Task with Empowerment-based Intrinsic Motivation
Sphere-Lifting Task with Empowerment-based Intrinsic Motivation
Cylinder-Lifting Task with Empowerment-based Intrinsic Motivation
Cylinder-Lifting Task with Intrinsic Curiosity Module only
Box Pick-and-Place Task with Empowerment-based Intrinsic Motivation
Number of Skills = 3
Number of Skills = 5