Hokoff
Hokoff is a collection of pre-collected datasets designed for Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL). It also provides a comprehensive framework to facilitate research in offline RL. You can access and download the datasets and framework code described in our paper from this website.
Abstract
The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets are often overly simplistic and lack realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework to facilitate further research. The data is derived from Honor of Kings, a well-known MOBA game whose intricate mechanics closely resemble real-life situations. Using this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored to the game's inherent hierarchical action space. Our results reveal the shortcomings of current offline RL approaches in handling task complexity, generalization, and multi-task learning.
Contributions
The tasks we adopt are based on one of the world's most popular Multiplayer Online Battle Arena (MOBA) games, Honor of Kings (HoK), which has over 100 million daily active players, ensuring the practicality of our datasets. The complexity of this environment dramatically surpasses that of its counterparts, demonstrating its potential for simulating real-world scenarios more accurately.
We present an open-source, easy-to-use framework. This framework includes the complete offline RL pipeline (sampling, training, and evaluation), along with several useful tools. Based on the framework, we release a rich and diverse set of datasets with different design factors, catering not only to offline RL but also to offline MARL.
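To make the sampling-training-evaluation pipeline concrete, here is a minimal, self-contained sketch of how such a loop fits together. All names (`sample_dataset`, `train`, `evaluate`) and the toy reward are illustrative assumptions, not the actual Hokoff API:

```python
# Hypothetical sketch of the sample -> train -> evaluate loop an offline RL
# framework typically exposes. Not the real Hokoff interface.
import random

def sample_dataset(num_episodes=10, horizon=5, seed=0):
    """Simulate pre-collection: roll out a behavior policy and log transitions."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_episodes):
        for _ in range(horizon):
            s = rng.random()             # toy state
            a = rng.randrange(3)         # toy discrete action
            r = 1.0 if a == 0 else 0.0   # toy reward: action 0 is best
            dataset.append((s, a, r))
    return dataset

def train(dataset):
    """Toy 'training': mean reward per action, learned from logged data only."""
    totals, counts = {}, {}
    for _, a, r in dataset:
        totals[a] = totals.get(a, 0.0) + r
        counts[a] = counts.get(a, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}

def evaluate(q):
    """Greedy evaluation w.r.t. the learned action values."""
    return max(q, key=q.get)

dataset = sample_dataset()
q = train(dataset)
best_action = evaluate(q)   # action 0 has mean reward 1.0, so greedy picks 0
```

The key property of the offline setting is visible even in this toy: `train` never interacts with the environment, only with the logged `dataset`.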
Building on the framework, we reproduce various offline RL and offline MARL algorithms and propose a novel baseline algorithm tailored to the inherent hierarchically structured action space of Honor of Kings. We fully validate and compare these baselines on our datasets. The results indicate that current offline RL and offline MARL approaches are unable to effectively address complex tasks with discrete action spaces. Additionally, these methods exhibit shortcomings in their generalization capabilities and their ability to facilitate multi-task learning.
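A hierarchically structured action space factorizes each action into a high-level action type and a sub-action conditioned on that type. The sketch below illustrates the idea with a factorized policy; the action types and sub-action sizes are assumptions for illustration, not the actual HoK action interface:

```python
# Illustrative sketch of a hierarchical action space: pick a high-level
# action type first, then a sub-action conditioned on it, so that
# pi(a) = pi(type) * pi(sub | type). Names/sizes are hypothetical.
import math
import random

ACTION_TYPES = ["move", "attack", "skill"]
SUB_ACTIONS = {"move": 8, "attack": 4, "skill": 3}  # e.g. directions/targets

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_hierarchical_action(type_logits, sub_logits, rng):
    """Sample (type, sub-action) from the factorized policy."""
    type_probs = softmax(type_logits)
    a_type = rng.choices(ACTION_TYPES, weights=type_probs)[0]
    sub_probs = softmax(sub_logits[a_type])
    a_sub = rng.choices(range(SUB_ACTIONS[a_type]), weights=sub_probs)[0]
    return a_type, a_sub

rng = random.Random(0)
type_logits = [0.0, 0.0, 5.0]                       # policy favors "skill"
sub_logits = {t: [0.0] * n for t, n in SUB_ACTIONS.items()}
a_type, a_sub = sample_hierarchical_action(type_logits, sub_logits, rng)
```

Factorizing the policy this way keeps the output heads small (3 + 8 + 4 + 3 logits here) instead of enumerating every (type, sub-action) combination as one flat discrete space.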
Tasks
HoK 1v1
Honor of Kings Arena (HoK 1v1) is a 1v1 task based on Honor of Kings where each player attempts to beat the other and destroy their opponent's crystal. Specifically, each player chooses a hero before the match starts and controls it to venture out from their base, gain gold and experience by killing or destroying other game units. The goal is to destroy the opponent's turrets and base crystal while also defending their own crystal. For a full description of the task, please refer to the original paper "Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning".
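The episodic structure described above (reset, per-step interaction, terminal on destroying the opponent's crystal) can be sketched with a toy gym-style loop. This is not the real HoK 1v1 environment API; the class and method names are illustrative only:

```python
# Toy sketch of the 1v1 episode structure: the episode ends when the
# opponent's crystal is destroyed, which yields the win reward.
# NOT the real environment API; everything here is illustrative.
class Toy1v1Env:
    def __init__(self, crystal_hp=3):
        self.crystal_hp = crystal_hp

    def reset(self):
        self.opponent_hp = self.crystal_hp
        return {"opponent_crystal_hp": self.opponent_hp}

    def step(self, action):
        # action 1 = "attack crystal"; anything else is a no-op
        if action == 1:
            self.opponent_hp -= 1
        done = self.opponent_hp <= 0
        reward = 10.0 if done else 0.0  # win bonus on destroying the crystal
        obs = {"opponent_crystal_hp": self.opponent_hp}
        return obs, reward, done

env = Toy1v1Env()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, r, done = env.step(1)  # trivially always attack
    total += r
```

The real task is, of course, vastly harder: rewards are shaped over gold, experience, turrets, and kills, and both sides are controlled by learning agents.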
HoK 3v3
Honor of Kings 3v3 Arena (HoK 3v3) is a MOBA task where each team comprises three heroes who collaborate to defeat their opponents. The basic rules and win conditions are similar to those of HoK 1v1. However, the HoK 3v3 map contains additional turrets and features a new area called the "wilderness", inhabited by diverse monsters. Moreover, collaboration is essential in HoK 3v3: players must select different heroes and fulfill distinct roles to work together efficiently. For a full description of the task, please refer to the original paper.