Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning

Recent work has shown that intricate cooperative behaviors can emerge in agents trained with meta-reinforcement learning on open-ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training on an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies in a setting where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end we introduce a novel environment with an open-ended, procedurally generated task space, which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training, the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents' learned collective exploration strategies extend to an open-ended task setting, allowing them to solve task trees of twice the depth of those seen during training. The full paper can be found here and our open-source code can be found here.
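To make the task-tree construction concrete, below is a minimal sketch of how such an open-ended task distribution could be sampled. The task-type names, the branching factor, and the reading of "stages" as tree depth are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of sampling an open-ended distribution of task trees.
# The five task-type names and all tree parameters are placeholders.
import random
from dataclasses import dataclass, field

TASK_TYPES = ["type_a", "type_b", "type_c", "type_d", "type_e"]  # placeholder names

@dataclass
class TaskNode:
    task_type: str                                 # which of the five subtask types this node is
    children: list = field(default_factory=list)

def sample_task_tree(depth: int, max_branching: int = 2) -> TaskNode:
    """Recursively sample a task tree whose nodes are subtasks drawn
    from the five task types; solving a node's children unlocks it."""
    node = TaskNode(task_type=random.choice(TASK_TYPES))
    if depth > 1:
        node.children = [
            sample_task_tree(depth - 1, max_branching)
            for _ in range(random.randint(1, max_branching))
        ]
    return node

train_tree = sample_task_tree(depth=3)  # depth seen during training
test_tree = sample_task_tree(depth=6)   # twice the depth, as in the open-ended test
```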
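In the same spirit, here is a minimal sketch of decentralized meta-training with independent recurrent policies, in the style of RL²-like meta-RL where within-task adaptation happens through the recurrent state. The network sizes, toy observations and rewards, and the REINFORCE-style update are placeholders; only the structure (each agent owns a private recurrent policy and optimizer, with no parameter sharing or centralized critic) mirrors the setup described above.

```python
# Hedged sketch of decentralized meta-RL: every agent meta-learns its own
# recurrent policy from its own experience only. All sizes and the toy
# environment below are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, N_AGENTS, HIDDEN = 16, 5, 2, 64

class RecurrentPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_ACTIONS)

    def forward(self, obs, h):
        out, h = self.rnn(obs, h)                  # obs: (1, 1, OBS_DIM)
        return torch.distributions.Categorical(logits=self.head(out[:, -1])), h

agents = [RecurrentPolicy() for _ in range(N_AGENTS)]   # independent parameters
optims = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in agents]

def sample_task():
    return torch.randn(OBS_DIM)                    # stand-in for the task-tree generator

for update in range(1000):
    task = sample_task()
    # Hidden states reset only at task boundaries: the RNN adapts within
    # a task ("fast"), while gradient descent meta-learns across tasks ("slow").
    hiddens = [torch.zeros(1, 1, HIDDEN) for _ in agents]
    logps = [[] for _ in agents]
    rewards = [[] for _ in agents]
    for t in range(32):                            # one trial on the sampled task
        for i, agent in enumerate(agents):
            obs = (task + torch.randn(OBS_DIM)).view(1, 1, -1)   # toy observation
            dist, hiddens[i] = agent(obs, hiddens[i])
            action = dist.sample()
            logps[i].append(dist.log_prob(action))
            rewards[i].append(torch.randn(()))     # toy per-agent reward
    # Decentralized update: each agent learns from its own trajectory only.
    for i in range(N_AGENTS):
        ret = torch.stack(rewards[i]).sum()
        loss = -(torch.cat(logps[i]).sum() * ret)
        optims[i].zero_grad()
        loss.backward()
        optims[i].step()
```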

0.0_forced_coop.mp4

Video of the agents playing on the task distribution they were trained on. In this setting, the agents are not forced to cooperate.

novel_objects_0.0_coop.mp4

Agents playing with novel colors and shapes for task objects, none of which they saw during training.

1.0_forced_coop_final.mp4

Here the agents play on the task distribution with forced cooperation, meaning the tasks now require the agents to cooperate and coordinate their movements in order to be solved.

6_stages.mp4

Agents trained on task trees of 3 stages playing on task trees of 6 stages, with an extended time limit of 4000 environment steps.

pressure_coop_0.0.mp4

Agents playing on the pressure-plate task, which they never saw during training. One agent has to stay on the green pressure plate and keep it activated; while the plate is active, the other agent can bring the object to the in-out machine and change it in order to solve the task.