Abstract
Quality-Diversity (QD) algorithms are powerful exploration algorithms that allow robots to discover large repertoires of diverse and high-performing skills. However, QD algorithms are sample-inefficient and require millions of evaluations. In this paper, we propose Dynamics-Aware Quality-Diversity (DA-QD), a framework that improves the sample efficiency of QD algorithms through the use of dynamics models. We also show how DA-QD can then be used for the continual acquisition of new skill repertoires. To do so, we incrementally train a deep dynamics model from experience obtained while performing skill discovery with QD. We can then perform QD exploration in imagination with an imagined skill repertoire. We evaluate our approach on three robotic experiments. First, our experiments show that DA-QD is 20 times more sample-efficient than existing QD approaches for skill discovery. We then demonstrate learning an entirely new skill repertoire in imagination to perform zero-shot learning. Finally, we show how DA-QD is useful and effective for solving a long-horizon navigation task and for damage adaptation in the real world.
Dynamics-Aware Quality-Diversity (DA-QD) combines deep dynamics models with QD to perform skill discovery efficiently. During skill discovery with QD, we use the models to seek out policies that are expected to be novel, evaluating them in imagination. Using dynamics models for QD benefits both the QD exploration process and model learning. It lets us learn novel and diverse skills with QD purely from imagined states, using few or no environment interactions, increasing the sample efficiency of QD by an order of magnitude. At the same time, the QD exploration process inherently provides a rich and diverse dataset of transitions, which enables better models to be learnt.
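To make the idea concrete, the sketch below shows a MAP-Elites-style QD loop in which candidate policies are evaluated by rolling them out in the learned dynamics model rather than the real environment, and elites are stored in an imagined repertoire. The interfaces (`dynamics_model.predict`, `policy`, `descriptor`, `fitness`, the variation operators, and the grid discretisation) are illustrative assumptions for this sketch, not our implementation.

```python
import numpy as np

def imagined_rollout(params, dynamics_model, policy, init_state, horizon=100):
    """Roll a candidate policy out purely in the learned dynamics model."""
    state, states = init_state, [init_state]
    for _ in range(horizon):
        action = policy(params, state)
        state = dynamics_model.predict(state, action)  # imagined transition
        states.append(state)
    return np.stack(states)

def qd_in_imagination(archive, dynamics_model, policy, init_state,
                      sample_parent, mutate, descriptor, fitness,
                      n_iterations=1000):
    """MAP-Elites-style loop where all evaluation happens in imagination.

    `archive` maps a discretised behaviour descriptor to the best
    (params, fitness) pair found so far; it plays the role of the
    imagined skill repertoire.
    """
    for _ in range(n_iterations):
        parent = sample_parent(archive)            # select from the repertoire
        child = mutate(parent)                     # variation operator
        traj = imagined_rollout(child, dynamics_model, policy, init_state)
        bd, fit = descriptor(traj), fitness(traj)  # computed from imagined states
        cell = tuple(np.round(bd, 1))              # simple grid discretisation
        if cell not in archive or archive[cell][1] < fit:
            archive[cell] = (child, fit)           # add or replace the elite
    return archive
```

Only the skills that are expected to be novel or high-performing in imagination then need to be verified with real (or simulated) rollouts, which is where the savings in evaluations come from.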
Planning and damage recovery with the learnt skill repertoires are performed using the Reset-free Trial-and-Error (RTE) algorithm. Skill-space planning for a long-horizon navigation task is done with a Monte-Carlo Tree Search (MCTS) planner coupled with Gaussian Process models, which enable sim-to-real transfer of the skills.
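As a rough illustration of the sim-to-real correction, the sketch below uses a Gaussian Process to model the residual between the outcome a skill is predicted to have (from the repertoire) and the outcome observed on the real robot; a planner can then query the corrected outcomes. The class, its method names, and the kernel choice are hypothetical and only indicate the general mechanism, assuming one-dimensional skill descriptors and low-dimensional outcome vectors.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

class SkillOutcomeGP:
    """GP over the residual between predicted and observed skill outcomes."""

    def __init__(self):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        self.inputs, self.residuals = [], []

    def observe(self, skill_descriptor, predicted_outcome, real_outcome):
        """Record how far the real outcome deviated from the repertoire's prediction."""
        self.inputs.append(np.asarray(skill_descriptor))
        self.residuals.append(np.asarray(real_outcome) - np.asarray(predicted_outcome))
        self.gp.fit(np.asarray(self.inputs), np.asarray(self.residuals))

    def corrected_outcome(self, skill_descriptor, predicted_outcome):
        """Outcome a planner should expect when executing this skill on the real robot."""
        if not self.inputs:
            return np.asarray(predicted_outcome)
        residual = self.gp.predict(np.asarray(skill_descriptor)[None, :])[0]
        return np.asarray(predicted_outcome) + residual
```

In an RTE-style loop, each executed skill adds one observation, so the correction improves as the robot acts, even when the real robot is damaged and behaves differently from simulation.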
Despite requiring 20 times fewer evaluations, the skill repertoire matches the quality of one generated with a standard QD algorithm, and the reduced number of evaluations does not hurt performance on longer-horizon navigation tasks solved with RTE.