LLCoach: Generating Robot Soccer Behaviors using Multi-Role Large Language Models
M. Brienza, E. Musumeci, V. Suriani, D. Affinita, A. Pennisi, D. Nardi, D. D. Bloisi
Abstract
The deployment of robots in human environments requires advanced planning strategies, particularly when robots must operate in dynamic, unstructured settings. RoboCup offers the chance to deploy robots in one such scenario: a soccer match, a game designed for humans. In these scenarios, robots operate using predefined behaviors that can fail under unpredictable conditions. This paper introduces a novel application of Large Language Models (LLMs) to the challenge of generating actionable plans in such settings, specifically within the RoboCup Standard Platform League (SPL), where robots are required to autonomously execute soccer strategies that emerge from the interactions of individual agents. In particular, we propose a multi-role approach that leverages the capabilities of LLMs to generate and refine plans for a robotic soccer team. The potential of the proposed method is demonstrated through an experimental evaluation, carried out by simulating multiple matches in which robots executing the AI-generated plans play against robots running human-written code.
Recent advances in Large Language Models (LLMs) offer many opportunities for improving embodied AI systems. This paper presents the use of generative AI in the context of robotic soccer, employing LLMs to generate game plans. Specifically, we generate game tactics for teams of NAO humanoid robots playing autonomously in soccer games, with the LLM impersonating a coach in the RoboCup Standard Platform League (SPL).
In SPL matches, the use of commonsense knowledge alone is limited, because specialized knowledge of the rules of the game and of the robots' capabilities is required. Such specific information, including the actions available to each agent, is provided through a Retrieval Augmented Generation (RAG) system.
During the experimental tests, we conducted matches in a simulated environment between robots playing with AI-generated plans and robots playing with plans implemented in traditional, human-written code.
LLM-based Planning
The presented work features a multi-role, LLM-based sequential pipeline. Using a VLM and an LLM lets us exploit their commonsense knowledge to obtain a multi-agent plan from a description of the goal and a semi-structured representation of the semantics of the planning domain.
The plan is then refined through four subsequent steps, each consisting of a different LLM query whose textual prompt is engineered to obtain a specific improvement in the plan. Splitting the refinement into several subsequent steps minimizes hallucinations while still allowing long, structured prompts; examples and constraints in the prompts steer the result toward its most desirable form.
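The following minimal sketch illustrates the idea of chaining purpose-specific LLM queries; the names and the `llm` callable are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RefinementStep:
    name: str
    prompt_template: str  # engineered prompt with examples/constraints; contains "{plan}"

def run_pipeline(llm: Callable[[str], str],
                 steps: List[RefinementStep],
                 initial_plan: str) -> str:
    """Chain LLM queries so that each step applies one focused improvement.

    Short, single-purpose prompts keep hallucinations in check compared
    to a single monolithic prompt covering every refinement at once.
    """
    plan = initial_plan
    for step in steps:
        plan = llm(step.prompt_template.format(plan=plan))
    return plan
```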
Task-specific knowledge is provided in the prompts by combining information about the planning domain, the planning goal, the robot teams, and the actions available to them. The domain is represented as a distribution of waypoints, which are mapped to real-world coordinates at a later stage, during plan grounding and execution, when robot percepts become available.
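As a hypothetical illustration (the waypoint names and coordinates below are invented, not taken from the paper), the symbolic domain and its deferred grounding could look like this:

```python
# Illustrative waypoint table: the plan refers only to symbolic tokens;
# resolving them to field coordinates is deferred until execution.
WAYPOINTS = {
    "own_goal_area": (-4.0, 0.0),  # assumed field coordinates, in meters
    "center_circle": (0.0, 0.0),
    "left_wing":     (2.0, 2.0),
    "opponent_box":  (3.9, 0.0),
}

def ground(waypoint: str) -> tuple[float, float]:
    """Resolve a symbolic waypoint token to real-world coordinates."""
    return WAYPOINTS[waypoint]
```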
Step 1: Action Retrieval
Actions available to the agents are retrieved from a semantic database by evaluating their relevance with respect to the given goal and planning domain. Each action is represented by a STRIPS-like, semi-structured description and is indexed in the semantic database by embedding its natural-language description. Retrieval is then performed by semantic similarity.
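A minimal sketch of this retrieval step, assuming precomputed embeddings and cosine similarity as the semantic-similarity measure (the paper does not specify the metric or embedding model):

```python
import numpy as np

def retrieve_actions(goal_embedding: np.ndarray,
                     action_embeddings: dict[str, np.ndarray],
                     top_k: int = 5) -> list[str]:
    """Rank indexed actions by cosine similarity between the embedding of
    their natural-language description and the goal/domain embedding."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(action_embeddings.items(),
                    key=lambda item: cosine(goal_embedding, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```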
Step 2: Coach VLM
A real-world video snapshot from a RoboCup SPL match recording is pre-processed by the MARIO visual analysis tool (https://github.com/unibas-wolves/MARIO), which applies a homography to the frame (in case it was captured with a fisheye lens) and marks each robot with a dot whose color corresponds to the color of its jersey.
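In the spirit of this pre-processing (this is not MARIO's actual code; the homography matrix and robot detections are assumed to be given), the two operations can be sketched with OpenCV:

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray, H: np.ndarray,
               robots: list[tuple[int, int, tuple[int, int, int]]]) -> np.ndarray:
    """Rectify the frame with a 3x3 homography H, then mark each robot
    (given as x, y pixel position and BGR jersey color) with a filled dot."""
    rectified = cv2.warpPerspective(frame, H, (frame.shape[1], frame.shape[0]))
    for x, y, color in robots:
        cv2.circle(rectified, (x, y), radius=8, color=color, thickness=-1)
    return rectified
```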
A Coach VLM (Visual Language Model) is then used to derive a general strategy for the team from the pre-processed snapshot and a textual prompt (consisting of a description of the agents, tactical recommendations, the planning goal, and the retrieved actions), by producing two outputs (a sketch of the query assembly follows the list):
1. A structured description of the scenario represented in the input video frame. This output also implies an implicit "role retrieval" task, which maps each robot in the snapshot to one of the roles described in the input prompt.
2. A multi-agent high-level plan, featuring all the relevant agents detected in the provided snapshot, that represents the suggested strategy for the task at hand, given the domain, the retrieved actions, the described agents, and the planning goal.
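The paper does not specify the VLM interface; the sketch below assumes a generic chat-completions-style multimodal API, with all names illustrative:

```python
import base64

def build_vlm_request(image_path: str, agents: str, tactics: str,
                      goal: str, actions: list[str]) -> list[dict]:
    """Assemble the Coach VLM query: pre-processed snapshot plus a textual
    prompt combining agents, tactical recommendations, goal, and actions."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    prompt = (
        f"Agents:\n{agents}\n\nTactical recommendations:\n{tactics}\n\n"
        f"Goal: {goal}\n\nAvailable actions:\n" + "\n".join(actions) +
        "\n\nFirst, describe the scenario and map each robot to a role; "
        "then, produce a multi-agent high-level plan."
    )
    return [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ]}]
```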
Step 3: Plan Refinement
The plan obtained from the previous step must be translated into a more structured form, so that a parser can build a Finite-State Machine that is readily executable by the robotic agents. To this aim, the LLM is instructed to regenerate the high-level team plan from the output of the previous step. The result is a more structured version of the high-level plan, with action names and arguments correctly grounded in the domain tokens.
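The paper does not give the structured plan syntax; assuming a simple "agent: action(arguments)" line format, a parser feeding the FSM builder could look like this:

```python
import re

LINE = re.compile(r"(\w+):\s*(\w+)\((.*?)\)")

def parse_plan(text: str) -> list[tuple[str, str, list[str]]]:
    """Turn structured plan lines such as "striker: kick_ball(opponent_box)"
    into (agent, action, arguments) tuples for the FSM builder."""
    steps = []
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:
            agent, action, args = match.groups()
            steps.append((agent, action,
                          [a.strip() for a in args.split(",") if a.strip()]))
    return steps
```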
Step 4: Plan Synchronizer
The plan produced at the previous step is a sequential list of actions. It must be transformed into a form where actions can be performed concurrently, in a synchronized way. For this reason, the LLM is instructed to modify the plan by adding a special "JOIN" token that groups together the actions which need to be executed at the same time.
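Since the text above leaves the exact JOIN semantics open, the following sketch assumes one plausible reading: a "JOIN" token between two actions merges them into the same concurrent group.

```python
def group_synchronized(plan: list[str]) -> list[list[str]]:
    """Fold a sequential action list into synchronized steps: a "JOIN"
    token merges the surrounding actions into one concurrent group."""
    groups: list[list[str]] = []
    join_next = False
    for token in plan:
        if token == "JOIN":
            join_next = True
        elif join_next and groups:
            groups[-1].append(token)
            join_next = False
        else:
            groups.append([token])
    return groups

# Example: ["a1", "JOIN", "a2", "a3"] -> [["a1", "a2"], ["a3"]]
```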