Quality Diversity Reinforcement Learning for Motion Control Tasks

Yan Ma

Master's Thesis, Fudan University

Abstract

Reinforcement learning (RL) has shown great potential for robot locomotion control in recent years, as it can learn intricate control strategies from high-dimensional state and sensory data. However, with limited prior knowledge, these approaches may struggle to extract useful information from environmental interactions quickly and thoroughly. To address this issue, this paper introduces Quality-Diversity as a form of prior knowledge for motion control tasks, with the aim of improving the performance of RL methods on these tasks.

Building on this idea, this paper proposes two reinforcement learning methods: one based on action quality and the other on action diversity. The former encourages the robot to make high-quality decisions during locomotion, which makes learning more stable and prevents errors that could damage the robot. The latter encourages the robot to explore a broader range of actions, so that it probes more of the uncertain factors in the task environment and gains a more comprehensive understanding of the task, thus improving overall decision-making and performance.

This paper reports extensive experiments on 12 different motion control tasks in 3 different environment settings, using 4 types of robots with different morphologies. The results are analyzed and compared in terms of reward curves, final performance, sample efficiency, statistical indicators, and cross-task performance. The analysis shows that the proposed methods improve the learning efficiency and final performance of RL methods across these tasks, providing insights and empirical evidence for further research in this field.

Method

Part 1: Action-based Quality-driven RL

Part 2: Action-based Diversity-driven RL

Part 3: Quality-Diversity Driven RL

Motivation Example

Part 1: Quality Part

Part 2: Diversity Part

Experiment and Evaluation

Experiment:

Evaluation:

Zero-shot Adaptation:

Future Work

Future research can improve on this work in the following aspects: