While reinforcement learning has recently achieved unprecedented success, that success often comes at the cost of high sample complexity. Reward-free, unsupervised skill learning promises an efficient alternative by pre-training skills in the environment without access to task supervision. However, such pre-training methods are themselves inefficient and are often ineffective in evolving environments. One reason is that current skill discovery methods learn all skills simultaneously, which can create a circular dependency in training: the learning of one skill is intricately connected to the simultaneous learning of the other skills. In this work, we propose a new framework for skill discovery in which skills are learned one after another, incrementally, with the previously learned skills kept fixed. This breaks the inter-dependency between skills, which allows them to be learned efficiently and to adapt to changing environments. We demonstrate experimentally on several MuJoCo environments that learning incrementally improves the discovery of skills that are diverse (high inter-skill variance) and self-consistent (low intra-skill variance), which in turn improves downstream reward-based task learning. In environments with evolving dynamics, incremental skills significantly outperform current state-of-the-art skill discovery methods on both skill quality and the ability to solve downstream tasks.
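To make the "one after another, with earlier skills kept fixed" idea concrete, here is a minimal toy sketch. It is not the authors' method or objective, and it does not use MuJoCo. It assumes a skill is just a heading angle, a rollout ends on the unit circle, and a new skill is tuned by random search to end far from the end states of the already-frozen skills. All names (`end_state`, `novelty`, `learn_one_skill`, `learn_skills_incrementally`) are hypothetical and chosen only for illustration.

```python
import math
import random


def end_state(theta):
    """Toy 'rollout': a skill is a heading angle and ends on the unit circle."""
    return (math.cos(theta), math.sin(theta))


def novelty(theta, frozen_skills):
    """Distance from this skill's end state to the nearest frozen skill's end state."""
    if not frozen_skills:
        return 0.0  # the first skill has nothing to differ from
    point = end_state(theta)
    return min(math.dist(point, end_state(t)) for t in frozen_skills)


def learn_one_skill(frozen_skills, rng, steps=500, step_size=0.3):
    """Random-search hill climbing over ONE new skill.

    The frozen skills are only read here, never modified.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    best = novelty(theta, frozen_skills)
    for _ in range(steps):
        candidate = theta + rng.gauss(0.0, step_size)
        score = novelty(candidate, frozen_skills)
        if score > best:
            theta, best = candidate, score
    return theta


def learn_skills_incrementally(num_skills, seed=0):
    """Learn skills one at a time; each new skill sees a frozen copy of the old ones."""
    rng = random.Random(seed)
    skills = []
    snapshots = []  # state of the skill list after each stage, for inspection
    for _ in range(num_skills):
        new_skill = learn_one_skill(tuple(skills), rng)
        skills.append(new_skill)
        snapshots.append(tuple(skills))
    return skills, snapshots


skills, snapshots = learn_skills_incrementally(4)
for index, theta in enumerate(skills):
    print(f"skill {index}: end state {end_state(theta)}")
```

Because each stage optimizes a single parameter against fixed targets, no skill's objective shifts while it is being trained. That is the property the incremental framework relies on, shown here in a deliberately simplified setting.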
From left to right: visualization of the trajectories of 10 learned skills, and execution of four of the learned skills in the environment.
From left to right: visualization of the trajectories of some learned skills, and execution of four learned skills (chosen to be as diverse as possible) in the environment.
The code, along with instructions for running it, can be downloaded from https://zenodo.org/record/4900616.