Learning to Cooperate with Humans using Generative Agents
Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon Shaolei Du*, Natasha Jaques*
{yancheng, daphc, abhgupta, ssdu, nj}@cs.washington.edu
University of Washington
TLDR: We use generative models to sample an unlimited supply of human-like partner agents for training a Cooperator agent. The resulting agents cooperate well with real human players, outperforming the baselines FCP, CoMeDi, and MEP.
Try playing with our trained Overcooked agents in the live demo below! Select the AI agent using the dropdown menu for Player 2; choose GAMMA (GAMMA HA performs best) to play with our models. Use the arrow keys to move your player around the game, and press the space bar to interact with (pick up, set down) objects.
Players in the videos below: Cooperator (the AI agent) and Human (the human player).
At the beginning of the game, our Cooperator tries two strategies simultaneously: placing onions on the central counter to pass to the human player, and carrying onions to the pot directly. Whichever strategy the human player chooses, the Cooperator infers their intention and adapts accordingly. As the game progresses and the human sticks to standing below the counter and placing onions on it, the Cooperator adjusts its strategy to focus on taking onions from the counter and placing them in the pot, achieving efficient cooperation.
This Cooperator adopts only one specific strategy (delivering onions from the left) and ignores the human's signal for a different strategy (passing onions via the central counter). This leads to coordination failures (the players block each other) and inefficient cooperation.
This Cooperator completely ignores the human player's signal to pass onions. Moreover, an onion left on the central counter becomes an out-of-distribution observation, causing the Cooperator's policy to fail entirely.
This Cooperator passively waits for the human to pass it onions and can only cooperate under that one strategy. When the human takes no action, the Cooperator does nothing as well, failing to help the human player.
Generative models can interpolate between existing agents
Generative models cover more behavior “modes” than behavior cloning
Pretrain on simulated data, then fine-tune on human data
Use human data for controlled sampling, adapting the latent embedding space toward human behavior
Sample an unlimited number of “human-like” training partners (see the sketches below)
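To make these points concrete, here is a minimal PyTorch sketch of a trajectory-level VAE over partner behavior. The class name PartnerVAE, the architecture, and all hyperparameters are illustrative assumptions rather than the paper's exact design; the point is only that encoding partner trajectories into a latent z and decoding z into a latent-conditioned policy yields a generative model of partners that can be sampled and interpolated.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartnerVAE(nn.Module):
    """Generative model over partner behavior: encode a trajectory
    into a latent z, decode z into a latent-conditioned policy."""
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=256):
        super().__init__()
        # Trajectory encoder: pools (obs, action) steps into a Gaussian over z.
        self.step_enc = nn.Linear(obs_dim + act_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Latent-conditioned policy decoder: pi(a | s, z).
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def encode(self, obs, acts):
        # obs: (T, obs_dim); acts: (T, act_dim) one-hot. Mean-pool over time.
        h = F.relu(self.step_enc(torch.cat([obs, acts], dim=-1))).mean(0)
        return self.mu(h), self.logvar(h)

    def decode(self, obs, z):
        # Broadcast one latent across all timesteps; return action logits.
        z = z.expand(obs.shape[0], -1)
        return self.policy(torch.cat([obs, z], dim=-1))

    def loss(self, obs, acts, beta=1e-3):
        mu, logvar = self.encode(obs, acts)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = F.cross_entropy(self.decode(obs, z), acts.argmax(-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return recon + beta * kl

Sampling z from the prior N(0, I) yields a brand-new partner policy, and mixing the latents of two encoded agents (z = a * z1 + (1 - a) * z2) yields the interpolated behaviors mentioned above, which pure behavior cloning of a fixed dataset cannot provide.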
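And here is one plausible reading of the controlled, human-adaptive sampling step (the "GAMMA HA" variant in the demo), building on the PartnerVAE above. The helper name human_adaptive_sampler and the Gaussian fit over human embeddings are our assumptions for illustration; the idea is simply to bias sampled partner latents toward the region of the embedding space occupied by human demonstrations.

import torch

def human_adaptive_sampler(vae, human_trajs, n_partners=32, sigma_scale=1.0):
    # human_trajs: a small list of (obs, acts) tensors from real human play.
    with torch.no_grad():
        # Posterior mean of each human trajectory under the trained VAE.
        zs = torch.stack([vae.encode(obs, acts)[0] for obs, acts in human_trajs])
    mean, std = zs.mean(0), zs.std(0) * sigma_scale
    # Sample partner latents from a Gaussian fit to the human embeddings,
    # so training partners concentrate on human-like behavior modes.
    return [mean + std * torch.randn_like(mean) for _ in range(n_partners)]

Each sampled latent plugs into vae.decode to act as a simulated partner, and the Cooperator is then trained with standard RL against this distribution of human-like partners.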
@inproceedings{lianglearning,
  title={Learning to Cooperate with Humans using Generative Agents},
  author={Liang, Yancheng and Chen, Daphne and Gupta, Abhishek and Du, Simon Shaolei and Jaques, Natasha},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}