Discovering Diverse Solutions
in Deep Reinforcement Learning
Takayuki Osa1,2, Voot Tangkaratt2 and Masashi Sugiyama2,3
1. Kyushu Institute of Technology
2. RIKEN Center for Advanced Intelligence Project
3. The University of Tokyo
Abstract
Reinforcement learning (RL) algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions to a given task often exist. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution.
In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. Through experiments on continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the learned solutions can be used for few-shot adaptation to unseen tasks.
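The core idea above, a single policy network that takes a low-dimensional latent variable as an additional input, can be illustrated with a minimal sketch. This is not the paper's implementation: the network sizes, the uniform latent prior, and the untrained random weights are all illustrative assumptions; in the paper's setting the weights would be trained with an off-policy algorithm such as TD3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (Hopper-like observation/action sizes are assumptions).
STATE_DIM, LATENT_DIM, ACTION_DIM, HIDDEN = 11, 2, 3, 32

# Hypothetical two-layer policy; weights are random here, trained in practice.
W1 = rng.standard_normal((HIDDEN, STATE_DIM + LATENT_DIM)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((ACTION_DIM, HIDDEN)) * 0.1
b2 = np.zeros(ACTION_DIM)

def policy(state, z):
    """Latent-conditioned policy: the action depends on both the state
    and the latent variable z, so different z induce different behaviors."""
    x = np.concatenate([state, z])
    h = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ h + b2)  # actions bounded in [-1, 1]

# Sampling a different latent per episode selects a different behavior style.
state = rng.standard_normal(STATE_DIM)
z_a = rng.uniform(-1.0, 1.0, LATENT_DIM)
z_b = rng.uniform(-1.0, 1.0, LATENT_DIM)
action_a = policy(state, z_a)
action_b = policy(state, z_b)
```

At test time, few-shot adaptation reduces to searching over the low-dimensional latent space for a value of z whose induced behavior performs well on the new task, rather than retraining the whole network.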
Behaviors learned by LTD3 with a two-dimensional continuous latent variable
Hopper agent
Walker2d agent
Few-shot adaptation for different embodiments
Hopper agent
Hopper-ShortShort
Hopper-HighKnee
Hopper-LowKnee
Hopper-LongHead
Walker2d agent
Walker-ShortOrange
Walker-Asym1
Walker-Asym2
Walker-LowKnee
Continuous change of walking styles
Hopper
Change of hopping style
Walker2d
From two-legged walking to one-legged hopping
Citation
Takayuki Osa, Voot Tangkaratt and Masashi Sugiyama. Discovering Diverse Solutions in Deep Reinforcement Learning. arXiv, 2021. [arXiv]