Policy Manifold Search:

Exploring the Manifold Hypothesis for Diversity-based Neuroevolution

Genetic and Evolutionary Computation Conference

Authors: Nemanja Rakicevic, Antoine Cully, Petar Kormushev (Imperial College London)

Paper: https://dl.acm.org/doi/10.1145/3449639.3459320

Code: https://github.com/nemanja-rakicevic/policy_manifold_search

SUMMARY

Policy Manifold Search is an approach to diversity-based policy search that applies the manifold hypothesis to the parameter space of the policy network.

Instead of performing policy search in the original high-dimensional space of the policy's neural network parameters, we find a lower-dimensional representation that preserves the properties of the original parameter space, and perform the search in this representation space.

The MAP-Elites framework is used to maintain a collection of policies that exhibit diverse behaviours in the environment. It serves both as a source of data for learning policy parameter representations and as a principled framework for diversity-based policy search.

The findings from our experimental evaluations suggest that using a lower-dimensional policy parameter representation helps improve sample efficiency and the overall number of discovered behaviours compared to the state-of-the-art approaches.

METHOD

The main goal is to learn a representation of the policy network parameters and to perform policy search in this learned representation space.

This is done by using the MAP-Elites framework to build a collection of policies, which is then used to train an autoencoder that learns the representation. The collection consists of different policies that exhibit diverse behaviours, as measured by the behaviour descriptor, when executed in an environment.
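
The interplay between the collection and the learned representation can be illustrated with a small, self-contained sketch. It is not the code from the repository: the `evaluate` function is a hypothetical stand-in for an environment rollout, the grid size and dimensions are arbitrary, and a linear autoencoder fitted with PCA (via SVD) is swapped in for the neural autoencoder to keep the example short. All function and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PARAMS, N_LATENT, GRID = 20, 2, 10  # hypothetical sizes

# Hypothetical stand-in for an environment rollout: maps policy parameters
# to a 2-D behaviour descriptor in (0, 1)^2 and a scalar fitness.
PROJ = rng.normal(size=(2, N_PARAMS))

def evaluate(theta):
    bd = 1.0 / (1.0 + np.exp(-PROJ @ theta))  # behaviour descriptor
    fitness = -float(np.sum(theta ** 2))      # toy fitness
    return bd, fitness

def cell_of(bd):
    """Discretise a behaviour descriptor into a grid cell index."""
    return tuple(np.minimum((bd * GRID).astype(int), GRID - 1))

def try_insert(archive, theta):
    """Keep theta if its cell is empty or it beats the current elite."""
    bd, fit = evaluate(theta)
    key = cell_of(bd)
    if key not in archive or fit > archive[key][1]:
        archive[key] = (theta, fit)

def fit_linear_autoencoder(thetas, n_latent):
    """PCA as a linear encoder/decoder pair (stand-in for a neural autoencoder)."""
    mean = thetas.mean(axis=0)
    _, _, vt = np.linalg.svd(thetas - mean, full_matrices=False)
    basis = vt[:n_latent]  # shape (n_latent, n_params)
    encode = lambda th: basis @ (th - mean)
    decode = lambda z: basis.T @ z + mean
    return encode, decode

archive = {}
# Bootstrap the collection with random policies.
for _ in range(200):
    try_insert(archive, rng.normal(size=N_PARAMS))

for _ in range(20):
    # Use the current collection as the dataset for the representation.
    elites = np.stack([theta for theta, _ in archive.values()])
    encode, decode = fit_linear_autoencoder(elites, N_LATENT)
    # Search in the latent space: encode a parent, perturb, decode, evaluate.
    for _ in range(50):
        parent = elites[rng.integers(len(elites))]
        z_child = encode(parent) + 0.1 * rng.normal(size=N_LATENT)
        try_insert(archive, decode(z_child))

print(f"filled cells: {len(archive)} / {GRID * GRID}")
```

The sketch refits the representation on every outer iteration and always mutates in the latent space. Both are simplifications chosen for brevity and should not be read as the schedule used in the paper.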

When performing the policy search in the latent space, it is necessary to consider the additional transformation caused by the decoder. This transformation is accounted for by introducing Jacobian scaling of the latent space sampling covariance matrix.
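
One way to see why the decoder matters: to first order, a latent perturbation with covariance S is mapped by a decoder with Jacobian J to a parameter-space perturbation with covariance J S J^T. The sketch below shows one plausible form of Jacobian scaling under that linearisation, choosing S = sigma^2 (J^T J)^{-1} so that the induced parameter-space spread is sigma^2 along the decoder's tangent directions. This specific choice, the toy two-layer `tanh` decoder, and all names are assumptions for illustration and are not claimed to match the exact formulation in the paper or repository.

```python
import numpy as np

def decoder(z, W1, b1, W2, b2):
    """Toy two-layer decoder mapping a latent z to policy parameters theta."""
    h = np.tanh(W1 @ z + b1)
    return W2 @ h + b2

def decoder_jacobian(z, W1, b1, W2, b2):
    """Analytic Jacobian d(theta)/d(z) of the toy decoder, shape (n_theta, n_z)."""
    h = np.tanh(W1 @ z + b1)
    return W2 @ ((1.0 - h ** 2)[:, None] * W1)

def scaled_latent_covariance(J, sigma, jitter=1e-8):
    """Latent covariance S with J @ S @ J.T ~= sigma^2 on the decoder's tangent space."""
    JtJ = J.T @ J + jitter * np.eye(J.shape[1])  # jitter guards against ill-conditioning
    return sigma ** 2 * np.linalg.inv(JtJ)

def sample_latent_offspring(z, J, sigma, rng):
    """Draw a child latent from N(z, S) using a Cholesky factor of S."""
    chol = np.linalg.cholesky(scaled_latent_covariance(J, sigma))
    return z + chol @ rng.standard_normal(z.shape[0])

# Hypothetical sizes and randomly initialised decoder weights.
rng = np.random.default_rng(0)
n_z, n_h, n_theta = 3, 16, 32
W1, b1 = rng.normal(size=(n_h, n_z)), np.zeros(n_h)
W2, b2 = rng.normal(size=(n_theta, n_h)), np.zeros(n_theta)

z = rng.normal(size=n_z)  # latent code of a parent policy
sigma = 0.1               # desired mutation scale in parameter space

J = decoder_jacobian(z, W1, b1, W2, b2)
cov_z = scaled_latent_covariance(J, sigma)
z_child = sample_latent_offspring(z, J, sigma, rng)
theta_child = decoder(z_child, W1, b1, W2, b2)
```

With a trained neural decoder, the Jacobian would typically come from automatic differentiation rather than a hand-written formula. The analytic version is used here only so that the example runs with NumPy alone.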