E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance 


Abstract

A critical challenge in multi-agent reinforcement learning (MARL) is for multiple agents to efficiently accomplish complex, long-horizon tasks. The agents often have difficulties in cooperating on common goals, dividing complex tasks, and planning through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information, and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over 10+ stages. E-MAPP integrates the structural information from a parallel program, promotes the cooperative behaviors grounded in program semantics, and improves the time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines in terms of the completion rate, time efficiency, and zero-shot generalization ability by a large margin.

Framework

E-MAPP consists of four components:

1) A perception module that maps a query q and the current state s to a boolean response.
2) A program executor that maintains a pool of candidate subtasks and updates it according to the perception results.
3) A task allocator that selects suitable subtasks from the subtask pool and assigns them to agents.
4) A policy module that instructs agents to take actions that accomplish their assigned subtasks.
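To make the data flow concrete, here is a minimal sketch of how the four components could interact in one control step. The interfaces (`pending_queries`, `update`, `assign`, `act`) are illustrative assumptions, not the authors' actual API:

```python
# One E-MAPP control step (sketch). All component interfaces are
# hypothetical placeholders chosen to mirror the four-part framework.

def step(state, agents, perception, executor, allocator, policies):
    # 1) Perception: answer the executor's boolean queries about the state.
    answers = {q: perception(q, state) for q in executor.pending_queries()}

    # 2) Program executor: advance the parallel program and refresh
    #    the pool of currently available subtasks.
    subtask_pool = executor.update(answers)

    # 3) Task allocator: assign subtasks from the pool to agents.
    assignment = allocator.assign(subtask_pool, agents, state)

    # 4) Policy module: each agent acts to accomplish its assigned subtask.
    return {agent: policies[assignment[agent]].act(agent, state)
            for agent in agents}
```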

Perception Module

The perception module learns to map a perception query and the current state to a boolean response.
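A minimal sketch of such a query-conditioned binary classifier is shown below; the encoders, dimensions, and architecture are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

class PerceptionModule(nn.Module):
    """Maps an encoded query q and state s to a boolean response.

    Sketch only: the input encodings and layer sizes are
    illustrative assumptions.
    """

    def __init__(self, query_dim=64, state_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(query_dim + state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query, state):
        logit = self.net(torch.cat([query, state], dim=-1))
        # Probability that the queried condition holds in the state;
        # threshold at 0.5 for a hard boolean answer.
        return torch.sigmoid(logit) > 0.5
```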

Task Allocator

We use a rule-based subtask allocation mechanism built on three trainable auxiliary functions.
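The sketch below illustrates one way such a rule-based allocator could combine three learned functions; the roles given to them here (feasibility, cost-to-complete, progress value) and the greedy scoring rule are illustrative assumptions, not the paper's exact mechanism:

```python
# Rule-based allocation sketch with three trainable auxiliary
# functions passed in as callables. Their semantics below are
# assumed for illustration.

def allocate(subtask_pool, agents, feasible, cost, progress):
    """Greedily assign each agent the most promising feasible subtask."""
    assignment, taken = {}, set()
    for agent in agents:
        candidates = [t for t in subtask_pool
                      if t not in taken and feasible(agent, t)]
        if not candidates:
            continue  # this agent idles (or assists) for the step
        # Prefer subtasks that are cheap for this agent and advance
        # the program the most.
        best = min(candidates, key=lambda t: cost(agent, t) - progress(t))
        assignment[agent] = best
        taken.add(best)
    return assignment
```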

Policy Module

The policy module learns sub-policies for the behavior primitives. For a given behavior primitive, the agent learns both an altruistic policy and an independent policy.
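One way to organize the two sub-policies per behavior primitive is sketched below; the two-headed network and the switching rule (route to the altruistic head when the agent is assisting a teammate) are illustrative assumptions:

```python
import torch.nn as nn

class PrimitivePolicy(nn.Module):
    """Two sub-policies for one behavior primitive: an independent
    policy that completes the subtask alone, and an altruistic
    policy that assists a teammate. The switching criterion here
    is an assumption, not the paper's exact rule.
    """

    def __init__(self, state_dim=128, n_actions=6, hidden=256):
        super().__init__()
        def head():
            return nn.Sequential(nn.Linear(state_dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
        self.independent = head()
        self.altruistic = head()

    def forward(self, state, assisting: bool):
        # Route to the altruistic head when the allocator marks this
        # agent as helping another agent's subtask.
        net = self.altruistic if assisting else self.independent
        return net(state)  # action logits for the primitive
```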

Visualization on Overcooked Environment

Cooperative behavior. One agent is passing an onion to the agent who can chop it.

Parallelized behavior. Two agents are assigned two subtasks concurrently.

Visualization on a hard 4-player task. The agents are required to prepare two dishes while putting out randomly occurring fires.

BibTex

@inproceedings{changmapp,
    title={E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance},
    author={Chang, Can and Mu, Ni and Wu, Jiajun and Pan, Ling and Xu, Huazhe},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022}
}

Acknowledgement: We thank Yuping Luo and Zhecheng Yuan for their careful proofreading and writing suggestions.