Learning to Cooperate with Humans using Generative Agents
Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon Shaolei Du*, Natasha Jaques*
{yancheng, daphc, abhgupta, ssdu, nj}@cs.washington.edu
University of Washington
TLDR: We use generative models to sample an unlimited supply of human-like partner agents for training a Cooperator agent. The resulting agents cooperate well with real human players, outperforming the baselines FCP, CoMeDi, and MEP.
Try playing with our trained Overcooked agents in the live demo below! Select the AI agent using the dropdown menu for Player 2; choose GAMMA (GAMMA HA performs best) to play with our models. Use the arrow keys to move your player around the game, and press the space bar to interact with (pick up, set down) objects.
Players in the videos below: Cooperator (the AI agent) and Human (the human player).
At the beginning of the game, our Cooperator tries two strategies simultaneously: placing onions on the central counter to pass to the human player, and carrying onions to the pot directly. Whichever strategy the human player chooses, the Cooperator infers their intention and adapts accordingly. As the game progresses and the human sticks to standing below the counter and placing onions on it, the Cooperator adjusts its strategy to focus on taking onions from the counter and placing them in the pot, achieving efficient cooperation.
This Cooperator adopts only one specific strategy (delivering onions from the left) and ignores the human's signal for a different strategy (passing onions via the central counter). This leads to coordination failures (the players block each other) and inefficient cooperation.
This Cooperator completely ignores the human player's signal to pass onions. Moreover, an onion left on the central counter becomes an out-of-distribution observation, causing the Cooperator's policy to fail entirely.
This Cooperator passively waits for the human to pass it onions and can only cooperate under that one strategy. When the human takes no action, the Cooperator does nothing as well, failing to help the human player.
Generative models can interpolate between existing agents
Generative models cover more behavior “modes” than behavior cloning
Pretrain on simulated data, then fine-tune on human data
Use human data for controlled sampling, adapting the latent embedding space toward human behavior
Sample an unlimited number of “human-like” training partners (see the sketches below)
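To make these points concrete, here is a minimal PyTorch sketch of a trajectory-level VAE over partner behavior. The class name PartnerVAE, the architecture, and all hyperparameters are illustrative assumptions rather than the paper's exact design; the point is only that encoding partner trajectories into a latent z and decoding z into a latent-conditioned policy yields a generative model of partners that can be sampled and interpolated.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartnerVAE(nn.Module):
    """Generative model over partner behavior: encode a trajectory
    into a latent z, decode z into a latent-conditioned policy."""
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=256):
        super().__init__()
        # Trajectory encoder: pools (obs, action) steps into a Gaussian over z.
        self.step_enc = nn.Linear(obs_dim + act_dim, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Latent-conditioned policy decoder: pi(a | s, z).
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def encode(self, obs, acts):
        # obs: (T, obs_dim); acts: (T, act_dim) one-hot. Mean-pool over time.
        h = F.relu(self.step_enc(torch.cat([obs, acts], dim=-1))).mean(0)
        return self.mu(h), self.logvar(h)

    def decode(self, obs, z):
        # Broadcast one latent across all timesteps; return action logits.
        z = z.expand(obs.shape[0], -1)
        return self.policy(torch.cat([obs, z], dim=-1))

    def loss(self, obs, acts, beta=1e-3):
        mu, logvar = self.encode(obs, acts)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = F.cross_entropy(self.decode(obs, z), acts.argmax(-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return recon + beta * kl

Sampling z from the prior N(0, I) yields a brand-new partner policy, and mixing the latents of two encoded agents (z = a * z1 + (1 - a) * z2) yields the interpolated behaviors mentioned above, which pure behavior cloning of a fixed dataset cannot provide.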
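And here is one plausible reading of the controlled, human-adaptive sampling step (the "GAMMA HA" variant in the demo), building on the PartnerVAE above. The helper name human_adaptive_sampler and the Gaussian fit over human embeddings are our assumptions for illustration; the idea is simply to bias sampled partner latents toward the region of the embedding space occupied by human demonstrations.

import torch

def human_adaptive_sampler(vae, human_trajs, n_partners=32, sigma_scale=1.0):
    # human_trajs: a small list of (obs, acts) tensors from real human play.
    with torch.no_grad():
        # Posterior mean of each human trajectory under the trained VAE.
        zs = torch.stack([vae.encode(obs, acts)[0] for obs, acts in human_trajs])
    mean, std = zs.mean(0), zs.std(0) * sigma_scale
    # Sample partner latents from a Gaussian fit to the human embeddings,
    # so training partners concentrate on human-like behavior modes.
    return [mean + std * torch.randn_like(mean) for _ in range(n_partners)]

Each sampled latent plugs into vae.decode to act as a simulated partner, and the Cooperator is then trained with standard RL against this distribution of human-like partners.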
@inproceedings{lianglearning,
  title={Learning to Cooperate with Humans using Generative Agents},
  author={Liang, Yancheng and Chen, Daphne and Gupta, Abhishek and Du, Simon Shaolei and Jaques, Natasha},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}