REBEL: Rule-based and Experience-enhanced LLMs for Initial Task Allocation in Multi-Human Multi-Robot Teams
Anonymous Author(s)
Submitted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)
Abstract
Multi-human multi-robot teams are increasingly recognized for their efficiency in executing large-scale, complex tasks by integrating heterogeneous yet potentially synergistic humans and robots. However, this inherent heterogeneity poses significant teaming challenges, necessitating efficient initial task allocation (ITA) strategies that optimally form complementary human-robot pairs or collaborative chains and establish well-matched task distributions. While current learning-based methods demonstrate promising performance, they often incur high computational costs and lack the flexibility to incorporate user preferences in multi-objective optimization (MOO) or to adapt to last-minute changes in dynamic real-world environments. To address these limitations, we propose REBEL, an LLM-based ITA framework that integrates rule-based and experience-enhanced learning to strengthen LLM reasoning and improve in-context adaptability to MOO and situational changes. Extensive experiments validate the effectiveness of REBEL in both single-objective and multi-objective scenarios, demonstrating superior alignment with user preferences and enhanced situational awareness in handling unexpected team composition changes. Additionally, we show that REBEL can complement pre-trained ITA policies, further boosting situational adaptability and overall team performance.
Conceptual Illustration
Conceptual illustration of the proposed LLM-based REBEL framework for ITA in MH-MR teams. Given a Multi-Attribute Observation that reflects the heterogeneity of the MH-MR team and its assigned tasks, along with user preferences for multi-objective optimization and any last-minute team composition changes, the LLM agent generates a tailored ITA plan. The agent also retrieves the guidance rules and prior experiences from previous interactions that are most relevant to the current mission, using them to improve the quality of its ITA plan.
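To make the agent's inputs concrete, below is a minimal Python sketch of how the Multi-Attribute Observation, MOO preferences, and last-minute changes might be packaged before being serialized into the prompt. Every field name and example value is our own illustrative assumption; the paper does not prescribe this schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: every field name and value below is our
# assumption, not REBEL's actual input schema.
@dataclass
class MultiAttributeObservation:
    humans: list   # e.g., [{"id": "H1", "skill": 0.8, "cognitive": 0.6}]
    robots: list   # e.g., [{"id": "UAV1", "type": "UAV", "speed": 5.0}]
    pois: list     # e.g., [{"id": "POI1", "difficulty": "high"}]

@dataclass
class UserInput:
    observation: MultiAttributeObservation
    # Relative weights over mission objectives, expressing MOO preferences.
    objective_preferences: dict = field(default_factory=dict)
    # Free-text notes on last-minute team composition changes.
    last_minute_changes: list = field(default_factory=list)

obs = MultiAttributeObservation(
    humans=[{"id": "H1", "skill": 0.8, "cognitive": 0.6}],
    robots=[{"id": "UAV1", "type": "UAV"}, {"id": "UGV1", "type": "UGV"}],
    pois=[{"id": "POI1", "difficulty": "high"}],
)
request = UserInput(obs, {"accuracy": 0.7, "time": 0.3}, ["UGV1 is unavailable"])
```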
Framework Overview
Illustration of the three stages in the proposed LLM-based REBEL framework for ITA in MH-MR teams. The first two stages comprise the Knowledge Acquisition phase, in which the LLM creates ITA plans for randomized missions and, through simulation, builds a collection of learned rules and experience data. During the Inferencing stage, the Rule Retrieval and Experience Retrieval modules extract the rules and experiences most relevant to the user's input, enhancing the quality of the LLM's ITA plan.
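The sketch below summarizes this three-stage flow; it is our simplification, not the authors' code. Here, llm stands in for any text-in/text-out model call, simulate for the mission simulator, and the toy keyword-overlap retrieve function for the learned Rule Retrieval and Experience Retrieval modules.

```python
def retrieve(items, query, k=3):
    """Toy retriever: rank items by shared-word count with the query, keep top k."""
    qwords = set(query.lower().split())
    return sorted(items, key=lambda it: -len(qwords & set(str(it).lower().split())))[:k]

def knowledge_acquisition(llm, simulate, missions):
    """Stages 1-2: generate rules, then collect (mission, plan, metrics) experiences."""
    rules = llm("Propose objective-specific ITA rules for MH-MR teams.").splitlines()
    experiences = []
    for mission in missions:  # randomized missions
        plan = llm(f"Mission: {mission}\nRules: {rules}\nProduce an ITA plan.")
        metrics = simulate(mission, plan)  # e.g., reward, accuracy, time, utilization
        experiences.append({"mission": mission, "plan": plan, "metrics": metrics})
    return rules, experiences

def inference(llm, user_input, rules, experiences, k=3):
    """Stage 3: retrieval-augmented ITA planning."""
    rel_rules = retrieve(rules, user_input, k)
    rel_exps = retrieve([str(e) for e in experiences], user_input, k)
    return llm(f"{user_input}\nRelevant rules: {rel_rules}\nSimilar past cases: {rel_exps}")
```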
Sample LLM Inputs + Outputs for Stages 1 and 2
The prompt provided to the LLM consists of the Background (orange), Goal (red), and Mission Objectives (green).
The generated rules provide helpful information for the LLM to reference when performing ITA.
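As a concrete, purely illustrative example, a Stage-1 prompt following the Background / Goal / Mission Objectives structure might look like the template below. The section names follow the figure; the wording inside each section is our placeholder, with the objective list mirroring the metrics the simulator reports.

```python
# Illustrative Stage-1 prompt; all text is our placeholder, not the paper's exact prompt.
STAGE1_PROMPT = """\
[Background]
A multi-human multi-robot team must classify warehouses (POIs) as hazardous or
non-hazardous. Operators differ in skill and cognitive ability; UAVs are fast
with low image quality, while UGVs are slow with high image quality.

[Goal]
Generate objective-specific rules to guide initial task allocation (ITA).

[Mission Objectives]
- Maximize POI classification accuracy
- Minimize mission completion time
- Balance human utilization
"""

print(STAGE1_PROMPT)  # sent to the LLM; the reply is parsed into a rule list
```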
The prompt provided to the LLM consists of the Mission Scenario (orange), Goal (red), Mission Objectives (green), and Relevant Objective-Specific Rules (blue) from Stage 1.
The generated ITA Plan specifies the POI each robot is assigned to visit and the POI classification tasks assigned to each agent. The simulation then outputs performance metrics, including Mission Reward, POI Classification Accuracy, Mission Time, and Human Utilization.
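One plausible shape for the Stage-2 output is sketched below: a plan dictionary mapping robots to navigation assignments and POIs to classifying agents, plus the simulator's metric report. All keys and numbers are invented for illustration; only the four metric names come from the figure.

```python
# Hypothetical ITA plan structure; the paper does not fix a concrete schema here.
ita_plan = {
    "navigation": {   # which POI each robot visits, and in which control mode
        "UAV1": {"poi": "POI3", "mode": "autonomous"},
        "UGV1": {"poi": "POI1", "mode": "teleoperated", "operator": "H2"},
    },
    "classification": {   # which agent analyzes each POI's captured images
        "POI1": "H1",
        "POI3": "UGV1",
    },
}

metrics = {   # returned by the simulator after executing the plan (invented values)
    "mission_reward": 0.82,
    "poi_classification_accuracy": 0.91,
    "mission_time_s": 1340.0,
    "human_utilization": 0.64,
}
```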
Case Study and Simulation Environment
We evaluate REBEL in a simulated case study that reflects a real-world environmental monitoring application of MH-MR teams. The team is deployed to assess pollution levels at a set of warehouses, referred to as Points of Interest (POIs). Specifically, the team must classify each warehouse as hazardous or non-hazardous as accurately as possible.
As shown in the figure below, the team consists of multiple human operators, UAVs, and UGVs. The mission comprises two tasks. First, robots must travel to each warehouse, either autonomously or teleoperated by a human operator, and capture images. Second, human operators or robots must analyze the images to determine whether each warehouse is hazardous.
The mission takes place in a 2 km × 2 km outdoor environment with multiple warehouses scattered throughout. Hazardous warehouses are distinguished by the presence of red smoke. Building color indicates how difficult a warehouse is to classify; in the real world, factors such as warehouse size and type determine this difficulty level.
During the mission, human operators may teleoperate robots to warehouses or perform hazard classification on the captured images. These operators are uniquely characterized by their skill levels and cognitive abilities, which directly impact the probability of accurate warehouse classification.
UAVs and UGVs may be assigned to visit warehouses or to analyze captured images. UAVs are characterized by high speed and low image quality, whereas UGVs have lower speed but higher image quality. As shown in the figure on the right, in shared-control mode the skill level of the human operator teleoperating a robot also directly impacts its speed.
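A toy attribute model makes these trade-offs concrete. The dataclasses and the shared_control_speed function below are our own assumptions rather than the paper's actual model; they merely encode the qualitative relationships described above.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    skill: float       # in [0, 1]; affects teleoperation speed
    cognitive: float   # in [0, 1]; affects classification accuracy

@dataclass
class Robot:
    base_speed: float     # m/s when fully autonomous
    image_quality: float  # in [0, 1]; UAVs low, UGVs high

def shared_control_speed(robot: Robot, operator: Operator) -> float:
    """Toy model (our assumption, not the paper's formula): a highly skilled
    operator recovers the robot's full base speed; an unskilled one halves it."""
    return robot.base_speed * (0.5 + 0.5 * operator.skill)

uav = Robot(base_speed=5.0, image_quality=0.4)  # fast, low image quality
ugv = Robot(base_speed=1.5, image_quality=0.9)  # slow, high image quality
novice, expert = Operator(0.2, 0.5), Operator(0.9, 0.8)
print(shared_control_speed(uav, novice))  # 3.0 m/s
print(shared_control_speed(uav, expert))  # 4.75 m/s
```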
Experiment Results
All experiments were conducted with OpenAI's GPT-4o model.
REBEL outperforms, or performs within 10% of, AtRL, the state-of-the-art (SOTA) reinforcement learning (RL) approach.
REBEL outperforms all other baselines, generating ITA plans that accurately reflect user preferences and achieving the highest score on the objective the user prioritized.
REBEL's flexibility allows it to function both as a standalone framework and as a test-time adaptation module that enhances pre-trained SOTA RL-based ITA policies.