COLE

1: The University of Manchester 2: Shanghai Jiao Tong University 3: King's College London

*: Equal contribution †: Corresponding authors: ying.wen@sjtu.edu.cn wei.pan@manchester.ac.uk 



NEW UPDATES

Check out our Github Repo!

Outline

Part 1: COLE Human-AI Experiment Platform

In this work, we developed an evaluation platform built around the Overcooked game to support human-AI experiments. Overcooked is a two-player, fully cooperative game. The system is shown below.

With this platform, you can play Overcooked alongside the provided agents and run your own human-AI experiments.

The code is available at the Github Repo Link. In the repo, we also provide the pre-trained weights of SP, PBT, FCP, MEP, and our COLE.
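For readers who want a feel for the evaluation loop, here is a minimal sketch of an agent-vs-agent rollout, assuming the open-source overcooked_ai_py package rather than our platform's own API. The layout name, the horizon, and the random policies standing in for loaded pre-trained weights are all illustrative placeholders.

```python
# Minimal rollout sketch, assuming the open-source overcooked_ai_py package.
# The random policies below are placeholders for pre-trained agents
# (SP / PBT / FCP / MEP / COLE); this is not our platform's actual API.
import random

from overcooked_ai_py.mdp.overcooked_mdp import OvercookedGridworld
from overcooked_ai_py.mdp.overcooked_env import OvercookedEnv
from overcooked_ai_py.mdp.actions import Action

def placeholder_policy(state):
    # A real evaluation would query a trained model on the current state.
    return random.choice(Action.ALL_ACTIONS)

# "coordination_ring" is the Coord. Ring layout analyzed in Part 3.
mdp = OvercookedGridworld.from_layout_name("coordination_ring")
env = OvercookedEnv.from_mdp(mdp, horizon=400)

env.reset()
state = env.state
total_reward, done = 0, False
while not done:
    joint_action = (placeholder_policy(state), placeholder_policy(state))
    state, reward, done, info = env.step(joint_action)
    total_reward += reward

print(f"Episode return: {total_reward}")
```

Swapping the placeholder policies for the released pre-trained weights reproduces the agent-vs-agent pairings shown in the videos below.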

Part 2: Video Demo - COLE with Different-Level Partners

We visualize trajectories of the COLE variants playing with a middle-level partner (the human proxy model) and an expert partner (PBT).

Note: All videos are sped up.

1. COLE 0:4 with expert partner (PBT model)

The blue player is controlled by the COLE 0:4 model, and the green one is the PBT model.

2. COLE 1:3 with expert partner (PBT model)

The blue player is controlled by the COLE 1:3 model, and the green one is the PBT model.

3. COLE 1:3 with middle-level partner (human proxy model)

The blue player is controlled by the COLE 1:3 model, and the green one is the human proxy model.

4. COLE 0:4 with middle-level partner (human proxy model)

The blue player is controlled by the COLE 0:4 model, and the green one is the human proxy model.

Part 3: Video Demo - Why COLE Underperforms MEP on the Coord. Ring Layout

We visualize trajectories of MEP and COLE playing with different human participants from different starting positions.

On the Coord. Ring layout, COLE slightly lags behind MEP but outperforms the other baselines. Our analysis shows that MEP's success in the human experiments stems from its predictable counterclockwise strategy: humans adapt easily by simply moving in the opposite direction. This predictability, however, may skew questionnaire responses about the AI's contribution and teamwork. MEP appears more understandable and more contributive largely because its strategy is static. Consequently, MEP's performance drops with less capable partners, as illustrated in Fig. 8.

Note: All videos are sped up.

Human 1 (Green Hat) vs. MEP (Blue Hat)

Human 2 (Blue Hat) vs. MEP (Green Hat)

In contrast, COLE's strategies are more diverse and less predictable than MEP's. COLE adapts its routes to the situation at hand, which can produce more conflicts with the human partner. Consequently, COLE slightly underperforms MEP in the subjective human evaluations.
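One simple way to quantify this predictability gap is to compare the entropy of each agent's empirical action distribution over logged trajectories: a near-deterministic route like MEP's counterclockwise loop yields low entropy, while an adaptive policy yields higher entropy. The sketch below is an illustrative proxy, not the paper's analysis pipeline; the logged-action format is an assumption.

```python
# Illustrative predictability proxy: Shannon entropy (bits) of the empirical
# action distribution over logged trajectories. This is an assumed analysis,
# not the paper's pipeline; actions are assumed to be logged as indices
# into Overcooked's six-action space.
from collections import Counter
from math import log2

def action_entropy(actions):
    """Entropy in bits of the empirical action distribution."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical logs: a rigid cyclic route vs. a more varied one.
mep_actions = [0, 1, 0, 1, 0, 1, 0, 1]
cole_actions = [0, 3, 1, 5, 2, 0, 4, 1]

print(f"MEP entropy:  {action_entropy(mep_actions):.2f} bits")   # 1.00
print(f"COLE entropy: {action_entropy(cole_actions):.2f} bits")  # 2.50
```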

Note: All videos are sped up.

Human 3 (Green Hat) vs. COLE (Blue Hat)

Human 4 (Blue Hat) vs. COLE (Green Hat)