The ability to continuously and efficiently transfer skills across tasks is a key element of biological life and a grand quest in artificial systems. Reinforcement learning (RL), arguably the most commonly used and highest-performing family of optimizers for arbitrary tasks, is known to be brittle to task variations and prone to catastrophic forgetting. Neuroevolution (NE) has recently been examined as an alternative to RL that can bring benefits in terms of robustness, avoidance of local optima, and scalability. Here, we aim to contribute to this accumulating evidence through an empirical study of a different, understudied aspect of NE algorithms: their transfer learning abilities. To this end, we introduce two benchmarks: a) stepping gates, a collection of tasks capturing the need to transfer skills through the reuse and modification of existing behaviors, and b) ecorobot, an extension of a physics-based simulator that enables testing both complex locomotion and navigation, in which we have implemented a variety of tasks that probe the ability of policies to complexify gradually. Our empirical analysis indicates that NE approaches differ in their ability to handle such challenges and that they often outcompete RL, revealing promising directions but also challenges for scaling up NE to richer behavioral repertoires.
Below we visualise some of the key behaviors discussed in the paper.
In this task, the optimal course of action is to sequentially pass through all the stepping stones (red spheres) and end up at the green one. A rough sketch of the reward structure follows the policy examples below.
Policy found by NEAT. The agent has solved the task (Reward 6800)
Policy found by PPO. The agent reached the second stepping stone (Reward 998)
Policy found by MAP-Elites. The agent reached the food but did not go through the stepping stones (Reward 1700)
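To make the task structure concrete, here is a minimal, illustrative reward sketch for the stepping-stones task. The function name, arguments, radii, and bonus values are our own assumptions made for illustration; they are not taken from the ecorobot implementation.

```python
import numpy as np

# Hypothetical sketch: the agent earns a bonus for reaching each red sphere
# in order, plus a bonus for reaching the green one. All names and constants
# here are illustrative, not the actual ecorobot code.
def stepping_stones_reward(agent_pos, stone_positions, goal_pos, visited,
                           radius=0.5, stone_bonus=1000.0, goal_bonus=1500.0):
    """Return (reward for this step, updated list of visited stone indices)."""
    reward = 0.0
    next_idx = len(visited)  # stones are only rewarded when visited in order
    if next_idx < len(stone_positions):
        if np.linalg.norm(agent_pos - stone_positions[next_idx]) < radius:
            visited.append(next_idx)
            reward += stone_bonus
    if np.linalg.norm(agent_pos - goal_pos) < radius:
        reward += goal_bonus  # the goal can be reached without the stones,
                              # but then the stone bonuses are forfeited
    return reward, visited
```

Under a shaping of this kind, a policy that heads straight to the green sphere still collects the goal bonus but forfeits the stone bonuses, which qualitatively matches the gap between the NEAT and MAP-Elites rewards above.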
In this task, the agent needs to maximize its speed while walking over obstacles whose height increases gradually. A sketch of this setup follows the policy examples below.
Policy found by NEAT. The agent passes over all obstacles (Reward 5852)
Policy found by PPO. The agent manages to traverse the obstacles, but more slowly, as it falls over (Reward 2661)
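For intuition about how such an obstacle course could be parameterised, below is a small illustrative sketch: obstacle heights grow linearly along the track, and the per-step reward is the agent's forward velocity. All names and default values are assumptions for illustration and do not come from the ecorobot implementation.

```python
import numpy as np

# Illustrative sketch only: obstacle heights increase gradually along the
# track, and each step rewards forward progress (an approximation of speed).
def obstacle_heights(n_obstacles=10, start_height=0.05, step=0.05):
    """Heights of successive obstacles, increasing linearly."""
    return start_height + step * np.arange(n_obstacles)

def speed_reward(prev_x, curr_x, dt=0.05):
    """Reward proportional to forward velocity between two timesteps."""
    return (curr_x - prev_x) / dt
```

A curriculum of this shape means early obstacles can be cleared with a basic gait, while later ones require the policy to refine that gait rather than learn a new one from scratch, which is the kind of gradual complexification the benchmark targets.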
In this task, the agent needs to maximize its speed while walking over obstacles whose height increases gradually.
Policy found by HyperNEAT (Reward 8.77)
Policy found by PPO (Reward 7.93)