Traditionally, players in first-person shooter (FPS) games have been limited to communicating with AI companions through simple commands such as “attack,” “defend,” or “retreat,” because existing input methods such as hotkeys and command wheels cannot express much more. A major limitation of these commands is their lack of target specificity: a 3D virtual environment contains numerous potential targets, and they are difficult to single out with existing input methods. This prevents players from issuing complex tactical instructions such as “clear the second floor,” “take cover behind that tree,” or “retreat to the river.”
To overcome this limitation, this paper introduces the FPS AI Companion who Understands Language (F.A.C.U.L.), the first AI system that allows players to interact with FPS AI companions through natural language. Deployed in the popular FPS game Arena Breakout: Infinite, this feature creates a more immersive experience by letting players work with human-like AI teammates. F.A.C.U.L. is not confined to executing a limited set of commands through simple rule-based systems. Instead, it allows players to engage in real-time voice interactions with AI teammates. By integrating various natural language processing techniques within a confidence-based selection framework, it achieves rapid and accurate decomposition of complex commands and intent reasoning.
Moreover, F.A.C.U.L. employs a multi-modal dynamic entity retrieval method for environmental perception, aligning human intentions with decision-making elements. It accurately comprehends complex voice commands, achieving 89% accuracy in user studies, and delivers real-time behavioral responses and vocal feedback to provide close tactical collaboration with players. Additionally, it can identify more than 17,000 objects in the game, including buildings, vehicles, grasslands, and collectible items, and can accurately distinguish different colors and materials. Overall, F.A.C.U.L. achieves an average response time of 800 ms and a 75% cost reduction compared to baselines.
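The confidence-based selection idea mentioned above can be illustrated with a minimal sketch. This is not the F.A.C.U.L. implementation: the parser functions, the `Intent` type, the confidence scores, and the 0.7 threshold are all hypothetical, chosen only to show how a cheap parser might be tried first and a more capable one consulted only when confidence is low.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Intent:
    action: str
    target: Optional[str]
    confidence: float  # hypothetical score in [0, 1]


def keyword_parser(text: str) -> Intent:
    """Cheap parser (assumed): confident only on exact one-word commands."""
    actions = ("attack", "defend", "retreat")
    lowered = text.lower().strip()
    if lowered in actions:
        return Intent(lowered, None, 0.95)
    for action in actions:
        if action in lowered.split():
            # The command carries extra words this parser cannot interpret.
            return Intent(action, None, 0.4)
    return Intent("unknown", None, 0.0)


def preposition_parser(text: str) -> Intent:
    """Stand-in for a slower, more capable model (assumed)."""
    lowered = text.lower().strip()
    for prep in (" to ", " behind "):
        if prep in lowered:
            head, target = lowered.split(prep, 1)
            return Intent(head.split()[0], target, 0.8)
    return Intent("unknown", None, 0.1)


def select_intent(
    text: str,
    parsers: Sequence[Callable[[str], Intent]] = (keyword_parser, preposition_parser),
    threshold: float = 0.7,
) -> Intent:
    """Return the first result at or above the threshold, else the best seen."""
    best = Intent("unknown", None, 0.0)
    for parse in parsers:
        result = parse(text)
        if result.confidence >= threshold:
            return result
        if result.confidence > best.confidence:
            best = result
    return best


if __name__ == "__main__":
    print(select_intent("attack"))
    print(select_intent("retreat to the river"))
```

In this sketch, a bare "attack" is resolved by the cheap parser alone, while "retreat to the river" falls through to the second parser, which also recovers the target. Ordering parsers from cheapest to most expensive is one plausible way such a framework could keep average latency low.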
The full match video demo is shown below: