Introduction
StarCraft II is a challenging benchmark for AI agents because it demands both precise micro-level operations and strategic macro-awareness. Previous works such as AlphaStar and SCC achieve impressive performance on StarCraft II, yet they still exhibit deficiencies in long-term strategic planning and strategy interpretability. Emerging large language model (LLM) agents, such as Voyager and MetaGPT, show immense potential for solving intricate tasks. Motivated by this, we aim to validate the capabilities of LLMs on StarCraft II, a highly complex real-time strategy (RTS) game. To fully leverage LLMs' reasoning abilities, we first develop a textual StarCraft II environment, called TextStarCraft II, with which an LLM agent can interact. Second, we propose a Chain of Summarization (CoS) method, comprising single-frame summarization for processing raw observations and multi-frame summarization for analyzing game information, providing command recommendations, and generating strategic decisions. Experimental results demonstrate that LLM agents are capable of defeating the built-in AI at the Harder (Lv5) difficulty level.
Methodology
We have engineered an integrative framework that allows large language models (LLMs) to interact seamlessly with the strategy game. The framework comprises two major components: the TextStarCraft II Environment and the LLM Agent Design.
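At a high level, the two components interact in a simple loop. The sketch below illustrates this; the gym-style `env`/`agent` interface is an assumption for exposition, not the actual TextStarCraft II API.

```python
# Illustrative top-level interaction loop between the LLM agent and the textual
# environment; this gym-style interface is an assumption, not the actual
# TextStarCraft II API.
def run_episode(env, agent, max_steps: int = 1000) -> bool:
    """Play one game: the agent reads textual observations and issues textual commands."""
    text_obs = env.reset()                                # initial game state, rendered as text
    for _ in range(max_steps):
        command_text = agent.act(text_obs)                # LLM reasoning over the textual state
        text_obs, done, victory = env.step(command_text)  # commands parsed back into game actions
        if done:
            return victory
    return False
```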
TextStarCraft II Environment: a wrapper around the game core in which states and actions are rendered in textual form. The environment encompasses two adaptors: 'Observation-to-Text', which converts game elements into readable text, and 'Text-to-Action', which transforms the LLM's textual commands into game actions.
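A minimal sketch of the two adaptors is given below; the data fields, action names, and function names are hypothetical simplifications rather than the actual TextStarCraft II implementation.

```python
# Illustrative sketch of the 'Observation-to-Text' and 'Text-to-Action' adaptors;
# all names here are hypothetical, not the actual TextStarCraft II API.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GameObservation:
    """Structured game state extracted from the StarCraft II core (simplified)."""
    minerals: int
    vespene: int
    supply_used: int
    supply_cap: int
    own_units: Dict[str, int]    # e.g. {"Probe": 24, "Stalker": 6}
    enemy_units: Dict[str, int]  # units revealed by scouting


def observation_to_text(obs: GameObservation) -> str:
    """'Observation-to-Text' adaptor: render the game state as readable text."""
    own = ", ".join(f"{name} x{count}" for name, count in obs.own_units.items()) or "none"
    enemy = ", ".join(f"{name} x{count}" for name, count in obs.enemy_units.items()) or "none observed"
    return (
        f"Resources: {obs.minerals} minerals, {obs.vespene} vespene. "
        f"Supply: {obs.supply_used}/{obs.supply_cap}. "
        f"Own army: {own}. Known enemy units: {enemy}."
    )


# A fixed, named action space lets the LLM answer with action names only.
ACTION_SPACE = {
    "TRAIN PROBE": 0,
    "BUILD PYLON": 1,
    "BUILD GATEWAY": 2,
    "SCOUT ENEMY MAIN BASE": 3,
    # ... remaining macro actions
}


def text_to_action(llm_output: str) -> List[int]:
    """'Text-to-Action' adaptor: map the LLM's textual commands to game action ids."""
    return [aid for name, aid in ACTION_SPACE.items() if name in llm_output.upper()]
```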
LLM Agent Design: Central to our LLM agent is a multi-level summarization mechanism, the Chain of Summarization introduced above. This dual-level process not only refines the LLM's understanding of the game but also bridges the textual game data with actionable insights. The interactive pipeline comprises the sequential processes of reviewing game states, analyzing evolving situations, planning and suggesting tactics, and ultimately making decisive actions.
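The dual-level summarization could be organized roughly as follows. In this sketch, `query_llm` stands in for a chat-completion call (e.g., to GPT-3.5-turbo-16k), and the prompts and function names are illustrative assumptions rather than the exact prompts used in our system.

```python
# Illustrative sketch of the Chain of Summarization; prompts and names are
# assumptions for exposition, not the exact prompts used in the paper.
from collections import deque
from typing import Deque, List


def query_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-3.5-turbo-16k)."""
    raise NotImplementedError


def single_frame_summary(raw_observation_text: str) -> str:
    """Level 1: compress one frame's raw textual observation into a concise summary."""
    prompt = (
        "Summarize the following StarCraft II game state in a few sentences, "
        "keeping only strategically relevant facts:\n" + raw_observation_text
    )
    return query_llm(prompt)


def multi_frame_summary(recent_summaries: List[str]) -> str:
    """Level 2: analyze recent frames, recommend commands, and decide the next actions."""
    prompt = (
        "You are playing Protoss in StarCraft II.\n"
        "Recent game summaries:\n" + "\n".join(recent_summaries) + "\n"
        "1. Analyze the evolving situation.\n"
        "2. Suggest a tactical plan.\n"
        "3. Output the next actions, chosen from the fixed action list."
    )
    return query_llm(prompt)


def decision_step(history: Deque[str], raw_observation_text: str) -> str:
    """One decision step: summarize the new frame, then reason over the recent window."""
    history.append(single_frame_summary(raw_observation_text))
    return multi_frame_summary(list(history))


# history = deque(maxlen=10)  # sliding window of single-frame summaries
```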
Results
Our experiments utilized diverse language models, including GPT-3.5-turbo-16k and fine-tuned language models. With an action space of 80 distinct actions, the LLM agent made over 700 strategic decisions over a 28-minute game, i.e., roughly one decision every 2.4 seconds of game time. Impressively, the agent emerged victorious against a Harder (Lv5) AI opponent in TextStarCraft II.
Highlighted moments from the game
Fig. 1: In the early game stage, scouting is crucial. The Protoss dispatch a Probe to scout the enemy's main base.
Fig. 2: The Protoss employ Shield Batteries as a pivotal defensive measure against Zerg aggression.
Fig. 3: The Protoss use the Chrono Boost ability to accelerate the "Protoss Air Weapon Level 1" upgrade in the Cybernetics Core.