Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning

Matteo Bettini, Ryan Kortvelesy, Amanda Prorok

Introduction

This website complements the paper "Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning" by presenting multimedia results from the experiments analyzed in the work.

Diversity Control (DiCo)

We propose Diversity Control (DiCo), a novel method to constrain behavioral diversity.

DiCo achieves this by representing policies as the sum of a homogeneous component and heterogeneous components, which are dynamically scaled according to their current and desired diversities.
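As a rough illustration (not the paper's implementation), the sketch below shows how such a policy composition could look in PyTorch. The class name DiCoPolicy, the per-agent linear networks, and the measured_diversity argument (an externally computed estimate of the current diversity of the heterogeneous components) are all hypothetical assumptions made for this sketch.

    import torch
    import torch.nn as nn

    class DiCoPolicy(nn.Module):
        """Minimal sketch: shared homogeneous part plus scaled per-agent heterogeneous parts."""

        def __init__(self, n_agents, obs_dim, act_dim, desired_diversity):
            super().__init__()
            # Homogeneous component shared by all agents
            self.homogeneous = nn.Linear(obs_dim, act_dim)
            # One heterogeneous component per agent
            self.heterogeneous = nn.ModuleList(
                nn.Linear(obs_dim, act_dim) for _ in range(n_agents)
            )
            self.desired_diversity = desired_diversity

        def forward(self, obs, measured_diversity):
            # obs: tensor of shape (n_agents, obs_dim)
            # Scale the heterogeneous outputs so that the resulting behavioral
            # diversity matches the desired value.
            scale = self.desired_diversity / (measured_diversity + 1e-8)
            return torch.stack([
                self.homogeneous(obs[i]) + scale * het(obs[i])
                for i, het in enumerate(self.heterogeneous)
            ])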

Case study: Multi-Agent Navigation

We showcase how DiCo works in a simple scenario where agents need to navigate to a goal.

Agents navigating to different goals

In this task instance, each agent is assigned a different goal and all agents observe all goals. Thus, behavioral heterogeneity is needed to solve the task.

Our results show that DiCo successfully controls diversity, with higher heterogeneity leading to higher performance.

Agents constrained by DiCo show regular diversity distribution patterns over the observation space.

Agents navigating to the same goal

In this task instance, all agents are assigned the same goal. Thus, behavioral homogeneity is needed to solve the task.

Our results show that DiCo successfully controls diversity, with lower heterogeneity leading to higher performance.

Agents constrained to an unwanted heterogeneity budget learn to distribute their diversity far from the goal, since it is near the goal that acting homogeneously is most vital.

As the desired heterogeneity increases, agents need to find creative strategies to achieve the same thing in different ways.

Dispersion: Tackling multiple objectives

In this task, all agents are spawned in the center. One food particle per agent is spawned around them at random. All agents observe all food particles. 

Homogeneous agents have to consume one food particle at a time, and unconstrained heterogeneous agents learn a suboptimal policy that may send more than one agent to the same food particle. Constrained heterogeneous agents, instead, learn the optimal policy, in which each agent tackles a different food particle.

Homogeneous agents are forced to consume one food particle at a time

Unconstrained heterogeneous agents are able to go to different food particles, but learn a suboptimal policy which may send more than one agent to the same food particle.

Constrained heterogeneous agents find the optimal policy, in which each agent tackles a different food particle.

Sampling: Boosting exploration

In this task, all agents are spawned at the center of a uniform probability field. By moving around, agents sample the latent field without replacement. Their goal is to maximise the sampled surface.
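As a rough illustration of this mechanic (not the environment's actual implementation), the hypothetical sketch below computes a coverage reward on a discretized field, where each cell can be sampled only once; the function name coverage_reward and the grid discretization are assumptions made for this sketch.

    import numpy as np

    def coverage_reward(field, visited, agent_cells):
        """Reward = value of newly sampled cells; each cell can be sampled only once."""
        reward = 0.0
        for (r, c) in agent_cells:
            if not visited[r, c]:
                reward += field[r, c]   # collect the cell's probability mass
                visited[r, c] = True    # "without replacement": mark cell as sampled
        return reward

    # Example: a uniform field, two agents stepping onto different cells.
    field = np.full((10, 10), 1.0 / 100)        # uniform probability field
    visited = np.zeros((10, 10), dtype=bool)
    print(coverage_reward(field, visited, [(2, 3), (7, 7)]))  # 0.02
    print(coverage_reward(field, visited, [(2, 3), (7, 7)]))  # 0.0 (already sampled)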

Again, we observe how, by constraining heterogeneity, it is possible to boost exploration and improve the agent distribution over the latent field.

Homogeneous agents all explore the same area

Unconstrained heterogeneous agents are able to explore different areas, but may revisit already-sampled areas

Constrained heterogeneous agents find the optimal policy, spreading evenly over the sampling space and maximising the sampled surface

Tag: Emergent adversarial strategies

Three agents and two obstacles are spawned at random in the workspace. Two red agents need to tag the evading green agent.

In this scenario, we see how constraining behavioral diversity leads to the emergence of novel, creative strategies that prove successful in the task.

Homogeneous agents all focus on the green agent

Unconstrained heterogeneous agents also focus on the green agent, blindly following it

Constrained heterogeneous agents are forced to adopt heterogeneous strategies and thus exhibit interesting emergent behaviors

Ambushing: agents split up to cover more area and intercept the green agent

Cornering: agents perform a pinching manoeuvre, cornering the green agent

Blocking: one agent performs man-to-man marking to block the escape routes of the green agent, while the other chases it up close

Reverse Transport: Injecting a prior through diversity constraints

In this task, agents are spawned inside a heavy package. They need to push together to bring the package to the goal.

We can intuitively expect the optimal policy for this task to have low diversity, since agents benefit from all doing the same thing.

By constraining agents to a low diversity (0.1), they learn faster than in the unconstrained heterogeneous case, while still not being completely homogeneous.
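For illustration, using the hypothetical DiCoPolicy sketch introduced above, injecting this prior would simply amount to fixing the desired diversity at 0.1; the dimensions and the measured_diversity value below are placeholder assumptions.

    import torch

    # Hypothetical usage of the DiCoPolicy sketch from above: the low-diversity
    # prior is injected by fixing the desired diversity to 0.1.
    policy = DiCoPolicy(n_agents=4, obs_dim=16, act_dim=2, desired_diversity=0.1)
    obs = torch.randn(4, 16)                       # one observation per agent
    actions = policy(obs, measured_diversity=0.5)  # heterogeneous parts scaled down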

Citation

@inproceedings{bettini2024controlling,
  title={Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning},
  author={Bettini, Matteo and Kortvelesy, Ryan and Prorok, Amanda},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=qQjUgItPq4}
}