COLE
ICML Paper:
Cooperative Open-ended Learning Framework for Zero-shot Coordination
Yang Li₁*, Shao Zhang₂*, Jichen Sun₂, Yali Du₃, Ying Wen₂†, Xinbing Wang₂, Wei Pan₁†
Extended Paper:
Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination
Yang Li₁*, Shao Zhang₂*, Jichen Sun₂, Wenhao Zhang₂, Yali Du₃, Ying Wen₂†, Xinbing Wang₂, Wei Pan₁†
1: The University of Manchester 2: Shanghai Jiao Tong University 3: King's College London
*: Equal contribution †: Corresponding authors: ying.wen@sjtu.edu.cn wei.pan@manchester.ac.uk
NEW UPDATES
Support Human-Human Experiments (human branch)
Play with LLM agents like GPT-4 (main branch)
Train your own COLE_SV agent (cole_training branch)
ZSC baseline agents including SP, FCP, PBT, MEP (baseline_training branch)
Outlines
Part 1: COLE Human-AI Experiment Platform
Part 2: Video Demo - COLE with different level partners
Part 3: Video Demo - Why COLE underperforms MEP at Coord. Ring Layout
Part 1: COLE Human-AI Experiment Platform
In this work, we developed an evaluation platform built around the Overcooked game, designed to support Human-AI experiments. Overcooked is a two-player, fully cooperative game. The system is shown below.
The platform allows you to:
Upload your weights
Customize the human questionnaire
Configure game settings
And many more!
The code is available at the GitHub Repo Link. In the repo, we also provide the pre-trained weights of SP, PBT, FCP, MEP, and our COLE.
Part 2: Video Demo - COLE with different level partners
We visualize the trajectories of COLE agents playing with a middle-level partner (the human proxy model) and an expert partner (the PBT model).
Note: All videos are sped up.
1. COLE 0:4 with expert partner (PBT model)
The blue player is controlled by the COLE 0:4 model; the green player is the PBT model.
2. COLE 1:3 with expert partner (PBT model)
The blue player is controlled by the COLE 1:3 model; the green player is the PBT model.
3. COLE 1:3 with middle-level partner (human proxy model)
The blue player is controlled by the COLE 1:3 model; the green player is the human proxy model.
4. COLE 0:4 with middle-level partner (human proxy model)
The blue player is controlled by the COLE 0:4 model; the green player is the human proxy model.
Part 3: Video Demo - Why COLE underperforms MEP at Coord. Ring Layout
We visualize the trajectories of MEP and COLE playing with different humans at different starting positions.
In the Coord. Ring layout, COLE slightly lags behind MEP but outperforms the other baselines. Our analysis shows that MEP's success in the human experiments stems from its predictable counterclockwise strategy, which humans adapt to easily by simply moving in the opposite direction. This predictability, however, may skew questionnaire responses about the AI's contribution and teamwork: while MEP appears more understandable and contributive, this impression is largely an artifact of its static strategy. Accordingly, MEP's performance drops when paired with less capable partners, as illustrated in Fig. 8.
Note: All videos are sped up.
Human 1 (Green Hat) vs. MEP (Blue Hat)
Human 2 (Blue Hat) vs. MEP (Green Hat)
In contrast, COLE's strategies exhibit greater diversity and are less predictable than MEP's. COLE adapts its routes to varying situations, which can lead to more conflicts with the human partner. Consequently, in subjective human evaluations, COLE scores slightly lower than MEP.
Note: All videos are sped up.
Human 3 (Green Hat) vs. COLE (Blue Hat)
Human 4 (Blue Hat) vs. COLE (Green Hat)