>> using team-level expressions to describe team-level tasks <<
We want to command the entire team with an abstract, high-level goal (and not individual robots one by one)
"Hey team, locate the people on the other side of the bridge and guide them back home"
"Defend this flag against the red team"
The notion of a high-level abstract task ("do X")
can be complex, and may require logic + reasoning
can be decomposed into "sub-tasks", X: {x1, x2, .. xn}
can be arranged as a sequence of sub-tasks {x1 --> x5 --> x2}
.. with some events governing the --> (transitions)
Robot tasks can be modeled as computing on a sequence: from a start 'state', process a sequence of 'events' (observations from sensors, feedback) until an accepting state is reached. Some events advance the state, while others leave it unchanged (self-loops). Each 'state' then typically corresponds to a sub-process.
A deterministic finite automaton (DFA) can represent an algorithm to be executed.
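In code, this view is just a transition table plus a loop over events. Here is a minimal sketch, with hypothetical states and events (not the template we use later):

# Minimal sketch: a task as sequence processing over events.
# States and events here are hypothetical placeholders.
TRANSITIONS = {
    ("search", "found_person"): "guide",   # event advances the state
    ("guide", "reached_home"): "done",
}
ACCEPTING = {"done"}

def run_task(events, state="search"):
    for event in events:
        # unmatched events self-loop: the state is unchanged
        state = TRANSITIONS.get((state, event), state)
        if state in ACCEPTING:
            break
    return state

assert run_task(["noise", "found_person", "reached_home"]) == "done"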
Instead of focusing on a single task / problem, we automate a data generation step that composes several automata along with natural-language descriptions of their subtasks and states.
We can query typical cloud-based LLMs/LMMs with as much task information as is available, and compile all products into one reusable dataset (D).
For a task T, we present the LLM with a high-level description, along with images, templates, etc., and ask it to produce an automaton for us.
The process involves three steps:
Generation of the task DFA (automaton) as code.
Generation of a large sentence dictionary describing the task and its subtasks in natural language.
A random walk on the DFA, sampling a subtask language description from the sentence dictionary for each visited state.
For the automaton generation, we provide the LLM with a description of the task along with a Python template with the DFA class and key functions available. An example prompt is shown below:
You are leading a team of robots initialized at random locations in an arena subdivided into four rooms.
Each of the first three rooms (from left to right) contains a switch.
A switch must be activated by a robot to open the wall leading to the next room.
Robots must activate switches in sequence (first → second → third) to unlock access to the final room.
Once the third switch is activated, the robots can proceed to their corresponding goal.
The attached image shows a simplified map of the environment.
Your mission:
1. Generate the automaton representation of the task using the following scenario events:
- "found first switch"
- "found second switch"
- "found third switch"
2. Use the DFA generator template below when writing your solution:
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Sequence, Tuple

State = str             # DFA states are referred to by name
Vector = Sequence[int]  # bit-vector of currently observed events

@dataclass
class DFAConfig:
    name: str
    automaton_id: int
    initial: State
    finals: set[State]
    transition: Dict[State, Callable[[Vector], State]]
    outfile: Path

# ─── Transition Predicates ─────────────────────────────────────────────────────────
Pred = Callable[[Vector], bool]

def has(bit_idx: int) -> Pred:
    return lambda v, i=bit_idx: bool(v[i])

def not_has(bit_idx: int) -> Pred:
    return lambda v, i=bit_idx: not v[i]

def all_of(*preds: Pred) -> Pred:
    return lambda v, ps=preds: all(p(v) for p in ps)

def any_of(*preds: Pred) -> Pred:
    return lambda v, ps=preds: any(p(v) for p in ps)

def advance_if(pred: Pred, then_state: State, else_state: State) -> Callable[[Vector], State]:
    """Single guard: if pred(v) then then_state else else_state."""
    return lambda v, p=pred, t=then_state, e=else_state: (t if p(v) else e)

def advance_case(*cases: Tuple[Pred, State], default: State) -> Callable[[Vector], State]:
    """Ordered guards: for the first (pred, state) where pred(v) is True, go to state.
    Otherwise go to default.
    """
    def step(v: Vector, cs=cases, d=default) -> State:
        for pred, st in cs:
            if pred(v):
                return st
        return d
    return step
Format your output like this capture-the-flag example:
FIND_FIRST = 0
FIND_SECOND = 1
FIND_THIRD = 2
FIND_GOAL = 3

STATES = {
    "goal": FIND_GOAL,
    "third": FIND_THIRD,
    "second": FIND_SECOND,
    "first": FIND_FIRST,
}

S1, S2, S3, G = range(4)

EVENTS: Dict[int, str] = {
    S1: "found_first_switch",
    S2: "found_second_switch",
    S3: "found_third_switch",
    G: "found_goal",
}
# R, G, B, P, SW are the event-bit indices for red, green, blue, purple, and
# the switch (assumed defined alongside the other constants).
def find_rgbp_then_switch_then_goal_config() -> DFAConfig:
    """Find flags in RGBP order, hit the switch, and navigate to the goal."""
    transition: Dict[State, Callable[[Vector], State]] = {
        "red": advance_if(has(R), "green", "red"),
        "green": advance_if(has(G), "blue", "green"),
        "blue": advance_if(has(B), "purple", "blue"),
        "purple": advance_if(has(P), "switch", "purple"),
        "switch": advance_if(has(SW), "goal", "switch"),
        "goal": (lambda v: "goal"),
    }
    return DFAConfig(
        name="find_rgbp_switch_goal",
        automaton_id=0,
        initial="red",
        finals={"goal"},
        transition=transition,
        outfile=Path("sequence_models/data/random_walk_four_flags_rgbp.json"),
    )
For this prompt, we get the following ChatGPT 5 output:
FIND_FIRST = 0
FIND_SECOND = 1
FIND_THIRD = 2
FIND_GOAL = 3

STATES = {
    "goal": FIND_GOAL,
    "third": FIND_THIRD,
    "second": FIND_SECOND,
    "first": FIND_FIRST,
}

S1, S2, S3, G = range(4)

EVENTS: Dict[int, str] = {
    S1: "found_first_switch",
    S2: "found_second_switch",
    S3: "found_third_switch",
    G: "found_goal",
}
State = str
Vector = list # or np.ndarray / bitvector matching your pipeline
# Bit indices into Vector (adapt as needed in your system)
S1 = 0 # "found first switch"
S2 = 1 # "found second switch"
S3 = 2 # "found third switch"
G = 3 # "at goal" (optional sensor/flag if you keep one)
def open_three_switches_then_goal_config() -> DFAConfig:
    """
    Hit switches in left-to-right order (first, second, third) to open the walls,
    then navigate to the goal.
    """
    transition: Dict[State, Callable[[Vector], State]] = {
        # Wait in 'first' until the first-switch event is observed
        "first": advance_if(has(S1), "second", "first"),
        # After first is found, wait for the second-switch event
        "second": advance_if(has(S2), "third", "second"),
        # After second is found, wait for the third-switch event
        "third": advance_if(has(S3), "goal", "third"),
        # Sink final: stay in goal once reached
        "goal": (lambda v: "goal"),
    }
    return DFAConfig(
        name="open_three_switches_then_goal",
        automaton_id=1,
        initial="first",
        finals={"goal"},
        transition=transition,
        outfile=Path("sequence_models/data/open_three_switches_then_goal.json"),
    )
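Assuming the template helpers and this generated config are in scope, a quick sanity check is to step the automaton through a few event bit-vectors and confirm it lands in the accepting state:

# Sanity-check sketch: flip event bits on one at a time and watch the DFA advance.
cfg = open_three_switches_then_goal_config()

state = cfg.initial
for v in ([0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]):
    state = cfg.transition[state](v)

assert state == "goal" and state in cfg.finals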
For the natural language data generation, we use a procedure similar to the DFA generation process.
The objective here is not to produce executable code, but instead to create a one-to-one natural language description of the task automaton.
The purpose of this step is to expose the RNN model to a wide variety of linguistic expressions, so that it develops strong language literacy and becomes robust to differences in style, tone, or grammar during deployment.
To achieve this variation, we randomize key words in the prompt and generate a large number of distinct queries for each task.
For each query, we collect the natural language output and store it in a large JSON dictionary.
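A sketch of this query generation, with illustrative keyword pools (the exact lists we use may differ), where PROMPT_TEMPLATE stands for the prompt shown next:

import random

# Hypothetical keyword pools; the actual pools are larger.
KEYWORDS = {
    "agent_keyword":     ["robot", "agent", "unit", "teammate"],
    "switch_keyword":    ["switch", "lever", "button"],
    "goal_keyword":      ["goal", "target", "destination"],
    "tone_style":        ["clear and instructive", "casual", "formal"],
    "style_format":      ["step-by-step", "narrative", "bullet-point"],
    "grammar_variation": ["mixed simple and compound sentences",
                          "short imperative sentences"],
}

def sample_query(template: str) -> str:
    """Fill {keyword} slots with one random combination (plain replace,
    so literal braces elsewhere in the prompt are left alone)."""
    for key, pool in KEYWORDS.items():
        template = template.replace("{" + key + "}", random.choice(pool))
    return template

queries = [sample_query(PROMPT_TEMPLATE) for _ in range(1000)]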
Below is an example of a query for the same four-room task. Notice that we use some of the results from the DFA generation to tailor this prompt:
You are leading a team of robots initialized at random locations in an arena subdivided into four rooms.
Each of the first three rooms (from left to right) contains a switch.
A switch must be activated by a robot to open the wall leading to the next room.
Robots must activate switches in sequence (first → second → third) to unlock access to the final room.
Once the third switch is activated, the robots can proceed to their corresponding goal.
The attached image shows a simplified map of the environment.
Your mission:
1. Produce a one-to-one natural language instruction of the task and its subtasks.
2. Refer to robots as {agent_keyword}.
Refer to switches as {switch_keyword}.
Refer to the goal as {goal_keyword}.
Use a {tone_style} tone.
Write in a {style_format} style.
Vary sentence complexity with {grammar_variation}.
Write in first-person imperative, addressing {agent_keyword}s directly (e.g., "{agent_keyword}, navigate to...").
3. Format the output strictly as a JSON object.
Use the following dictionary to define the JSON keys for subtasks:
STATES = {
    "goal": FIND_GOAL,
    "third": FIND_THIRD,
    "second": FIND_SECOND,
    "first": FIND_FIRST,
}
The output JSON must follow this structure:
{
  "task": "Overall natural language instruction of the complete task.",
  "subtasks": {
    "first": "Natural language instruction of finding and activating the first switch.",
    "second": "Natural language instruction of finding and activating the second switch.",
    "third": "Natural language instruction of finding and activating the third switch.",
    "goal": "Natural language instruction of reaching the goal after switches are completed."
  }
}
For this prompt, with keywords {agent_keyword} → robot, {switch_keyword} → switch, {goal_keyword} → goal, {tone_style} → clear and instructive, {style_format} → step-by-step, and {grammar_variation} → mixed simple and compound sentences, we get the following ChatGPT 5 output:
{
  "task": "Robots, move through the arena by activating the three switches in order. Start with the leftmost switch, then continue to the second, and finally the third. Once all switches are activated, proceed into the final room and head toward your assigned goal.",
  "subtasks": {
    "first": "Robots, navigate to the leftmost room, find the first switch, and activate it to open the way forward.",
    "second": "Robots, head into the second room once it becomes accessible, locate the next switch, and press it to unlock the third chamber.",
    "third": "Robots, advance into the third room, search for the last switch, and activate it to reveal the final passage.",
    "goal": "Robots, after all switches are engaged, move into the final room and proceed directly to your goal."
  }
}
The dataset is created by repeating the previous process for each task we want to represent.
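Step 3 then ties the two artifacts together. Here is a sketch under assumed data layouts (sentence_dict maps each state name to the many descriptions collected for that subtask; the exact procedure is in the paper):

import random

def random_walk_samples(cfg, sentence_dict, n_steps=20, n_bits=4):
    # Walk the DFA on random event bit-vectors, pairing each visited state
    # with one of the LLM-generated descriptions of that subtask.
    state, trace = cfg.initial, []
    for _ in range(n_steps):
        v = [random.randint(0, 1) for _ in range(n_bits)]
        trace.append({
            "state": state,
            "instruction": random.choice(sentence_dict[state]),
            "events": v,
        })
        state = cfg.transition[state](v)
        if state in cfg.finals:
            break
    return trace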
Once we have the dataset, it's training time! We train one RNN to encode all the tasks in the dataset in a supervised fashion: the RNN has an internal state with 1024 hidden dimensions, large enough to capture all tasks. To compare against a ground-truth 'state' label, this internal state is decoded into a one-hot encoding. As a result, we obtain a compact RNN that selects the correct automaton given a high-level task embedding and correctly propagates its internal state as a function of the events.
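A PyTorch sketch of this supervised setup; apart from the 1024-d hidden state, all sizes and the exact conditioning scheme are illustrative assumptions:

import torch
import torch.nn as nn

class TaskRNN(nn.Module):
    def __init__(self, event_dim=16, task_dim=128, n_states=32, hidden=1024):
        super().__init__()
        # hidden=1024 matches the post; the other sizes are placeholders
        self.rnn = nn.RNN(event_dim + task_dim, hidden, batch_first=True)
        self.decode = nn.Linear(hidden, n_states)  # one-hot state readout

    def forward(self, task_emb, events):
        # condition every step on the high-level task embedding
        task = task_emb.unsqueeze(1).expand(-1, events.size(1), -1)
        h, _ = self.rnn(torch.cat([events, task], dim=-1))
        return self.decode(h)  # (batch, time, n_states) logits

model = TaskRNN()
loss_fn = nn.CrossEntropyLoss()
# per-step supervision against the DFA's ground-truth state labels:
# loss = loss_fn(logits.flatten(0, 1), state_labels.flatten())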
[Figure: 8-step breakdown of the pipeline]
The RNN is ready to reason, but we need a policy that correctly uses the internal state of the RNN to execute actual robot actions in a collaborative way. To do so, we parameterize the policy as a graph neural network (GNN). There are two inputs to the GNN: (1) the observations of the neighboring robots, and (2) the internal state of the RNN.
How does the GNN learn to associate the internal state of the RNN with the correct "sub-task" to be solved? The answer is ... Multi-Agent Reinforcement Learning. We randomize MARL scenarios, of course, but we initialize each process by starting with a relevant internal state (obtained by feeding the scenario into the RNN as well). As a result, we obtain a compact decentralized policy, conditioned on the internal state of the RNN, that allows the robots to collaborate and solve each sub-task of each automaton encoded in the RNN!
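A sketch of that conditioning (the message passing and all sizes here are illustrative; the paper's GNN details may differ):

import torch
import torch.nn as nn

class TeamGNNPolicy(nn.Module):
    def __init__(self, obs_dim=32, rnn_dim=1024, msg_dim=64, act_dim=2):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(obs_dim, msg_dim), nn.ReLU())
        self.act = nn.Sequential(
            nn.Linear(obs_dim + msg_dim + rnn_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs, adj, rnn_state):
        # obs: (n_robots, obs_dim); adj: (n_robots, n_robots) 0/1 float neighbor mask;
        # rnn_state: (n_robots, rnn_dim), the RNN internal state given to each robot
        messages = adj @ self.msg(obs)  # sum messages from neighbors
        return self.act(torch.cat([obs, messages, rnn_state], dim=-1))

During MARL training, rnn_state is the internal state obtained by feeding the randomized scenario through the trained RNN, as described above.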
The GNN, conditioned on the internal states reasoned by the RNN in "four-room scenario" mode, learns to coordinate to sequentially press all the switches and navigate to the individual goals.
The GNN, conditioned on the internal states reasoned by the RNN in "capture the flag" mode, learns to find the requested colored flag, then go to the switch, and then navigate to the individual goals.
We can execute nearly any task that can be represented as a DFA and broken down into (PO-)MDP subtasks.
Robots display high-level reasoning and collaboration: since the purple switch is pressed very quickly, the two leftmost robots understand that they do not need to press it again, thus improving overall efficiency.
The robots are effective in understanding the prompted sequence of flag colors ("red", "blue", "yellow", then "goals"), and benefit from communications to maximize efficiency in terms of travel time (the team switches when any robot hits a target).
The robots can react in real time to changes in the task, like the flag being 'stolen' and returned to its original position. This is why the team goes back and forth before activating the switch at the center of the arena and navigating to their individual goals. (disclaimer: flags and goals are not displayed, but can be seen in the paper)
Real-time reactivity holds for any flag configuration, demonstrating generalization across tasks and sequences of events. Thus, the RNN has correctly captured the reasoning capabilities of the automata. (disclaimer: flags and goals are not displayed, but can be seen in the paper)
The modularity of our approach also allows us to handle significant disturbances, like a person displacing the robots several times. The reasoning is not disturbed; only the GNN has to be robust against the perturbations. (disclaimer: flags and goals are not displayed, but can be seen in the paper)
Overall, our policy transfers seamlessly to real robots, a first in achieving onboard, real-time, decentralized multi-robot reasoning and collaboration---all from a natural language command.
See the full video here to follow the entire pipeline all at once.
Some cautionary notes for users, and hints to our upcoming work:
RNN models can exhibit long-term accumulated drift. Our hundreds of evaluations over sequences with tens of events have shown very limited drift, if any. Most tasks we consider don't really have more than 10-20 sub-tasks and corresponding event transitions. In principle, the RNN is a design choice: we can just as well use an LSTM or a Transformer for a more stable representational capacity.
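In PyTorch terms, that backbone swap is a one-liner (sizes illustrative, matching the sketch above):

import torch.nn as nn

backbone = nn.RNN(144, 1024, batch_first=True)     # current choice
# backbone = nn.LSTM(144, 1024, batch_first=True)  # drop-in, more stable long-horizon memory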
The deployment performance can still suffer from the sim-to-real gap. Yes, our simulation-to-deployment pipeline is currently quite mature, and its control abstraction alleviates this gap substantially. In general, though, this may not be true of arbitrary robot platforms. We suggest either fine-tuning the final action layer, or using offline RL to learn actions from real-robot data. In any case, the modular structure of the graph policy (g) supports such drop-in replacements.
A large (online) model must be queried with all the tasks of interest. We automate the process of sub-task and task sequence generation, but there is still the first step of feeding a task and its scenario into an LLM, which a designer/operator must perform. We can automate this as well (upcoming!).