Supplementary Material

Data collection and annotation methodology

This section outlines the search methodology and recording practices used to collect the dataset of algorithm performance and evaluation methodologies. Unlike the dataset provided in the main body of work, which only includes published works, the supplementary dataset attempts to incorporate all contemporary MARL studies. In this supplementary work we refer to the published papers used in the main paper as the "all papers" and to the unpublished works as the "other papers".

Paper search strategy

Relevant MARL research papers were gathered covering the years 2016 to 2022. These papers were identified using key terms relevant to MARL research, such as "Multi-agent RL", "MARL evaluation" and "Benchmarking MARL". This combination of keywords was used on the arXiv website, along with references from other high-influence MARL papers. The search queries were finalised on the 8th of April 2022 and do not include any papers released after this date.

Filtering data to find relevant studies

After the initial collection, the dataset used in the main paper was pruned to ensure relevancy. We considered relevant papers to be those presented at a peer-reviewed conference or published in a journal, and written in English. Additionally, we restricted the study to papers in the field of cooperative MARL, as we found these to compose the majority of the papers in our dataset.
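The inclusion criteria above can be expressed as a simple predicate over paper records. This is a minimal sketch for illustration only; the record fields (`peer_reviewed`, `language`, `setting`) are assumed names, not the schema actually used for the dataset.

```python
def is_relevant(paper: dict) -> bool:
    """Apply the three inclusion criteria used to prune the dataset:
    peer-reviewed venue, English language, and cooperative MARL setting.

    `paper` is an assumed record format chosen for this sketch.
    """
    return (
        paper["peer_reviewed"]
        and paper["language"] == "English"
        and paper["setting"] == "cooperative"
    )


# Keep only the papers that satisfy all three criteria.
def prune(papers: list[dict]) -> list[dict]:
    return [p for p in papers if is_relevant(p)]
```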

Algorithm Annotation

During data collection it came to our attention that several algorithms go by slightly different names across papers. For the purposes of the analysis, these names have been standardised based on the descriptions given in the papers where they were found.
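One way to implement this standardisation is an alias table mapping each reported name to a canonical form. The specific aliases below are hypothetical examples for illustration; they are not the actual mappings used in the dataset.

```python
# Hypothetical alias table: reported name -> canonical name.
# The real mappings were derived from algorithm descriptions in each paper.
CANONICAL_NAMES = {
    "Independent Q-Learning": "IQL",
    "Independent Q-learning": "IQL",
    "Value Decomposition Networks": "VDN",
    "Multi-Agent PPO": "MAPPO",
    "Multi-agent PPO": "MAPPO",
}


def standardise(name: str) -> str:
    """Return the canonical name for a reported algorithm name,
    falling back to the reported name when no alias is known."""
    return CANONICAL_NAMES.get(name.strip(), name.strip())
```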

Additional Analysis

In addition to the analysis provided in the main paper, we offer further insight into findings from the dataset.

The decline of decentralised training methods

Over time, Centralised Training Decentralised Execution (CTDE) has become significantly more common than fully independent learners, i.e. Decentralised Training Decentralised Execution (DTDE). Although CTDE has been demonstrated to be a powerful approach to solving MARL problems, it cannot be assumed to be the optimal solution for all cooperative MARL cases.

Common benchmark algorithms

From our analysis, we found these to be the most commonly used algorithms in the dataset; they span both the CTDE and DTDE paradigms. That QMIX is the most commonly used algorithm across papers is unsurprising, as it introduced the concept of the monotonic mixing network, which many contemporary works such as QPLEX have built on.

Evaluation settings

The three most common metrics referred to in our data are Return, Reward and Win Rate. Win Rate is primarily used by the SMAC and Traffic Junction environments but, due to the popularity of SMAC, is present in 50% of the papers reviewed.
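For concreteness, these metrics are typically aggregated from per-episode results. The sketch below shows one common way to compute mean return and win rate; the `(episode_return, won)` record format is an assumption made for this example, not the schema of any specific benchmark.

```python
from statistics import mean


def summarise_episodes(episodes: list[tuple[float, bool]]) -> dict:
    """Aggregate per-episode results into the two headline metrics:
    mean return and win rate (fraction of episodes won).

    `episodes` is a list of (episode_return, won) tuples -- an assumed
    record format used here for illustration.
    """
    returns = [ret for ret, _ in episodes]
    wins = [won for _, won in episodes]
    return {
        "mean_return": mean(returns),
        "win_rate": sum(wins) / len(wins),
    }
```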

Evaluation procedure, best practices and guidelines

We summarise the number of papers that abide by the key practices recommended in the main body of this paper. We also show the percentage of the main and the other papers that adhere to each recommended practice.