REBEL: Rule-based and Experience-enhanced Learning with LLMs for Initial Task Allocation in Multi-Human Multi-Robot Teams
Arjun Gupte*, Ruiqi Wang*, Vishnunandan L.N. Venkatesh, Taehyeon Kim, Dezhong Zhao, and Byung-Cheol Min
*: Equal Contribution
SMART Lab, Purdue University
Submitted to the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, Georgia
Abstract
Multi-human multi-robot teams combine the complementary strengths of humans and robots to tackle complex tasks across diverse applications. However, the inherent heterogeneity of these teams presents significant challenges in initial task allocation (ITA), which involves assigning the most suitable tasks to each team member based on their individual capabilities before task execution. While current learning-based methods have shown promising results, they are often computationally expensive to train and lack the flexibility to incorporate user preferences in multi-objective optimization or to adapt to last-minute changes in dynamic real-world environments. To address these issues, we propose REBEL, an LLM-based ITA framework that integrates rule-based and experience-enhanced learning. By leveraging Retrieval-Augmented Generation (RAG), REBEL dynamically retrieves relevant rules and past experiences, enhancing reasoning efficiency. Additionally, REBEL can complement pre-trained RL-based ITA policies, improving situational awareness and overall team performance. Extensive experiments validate the effectiveness of our approach across various settings.
Conceptual Illustration
Conceptual illustration of the proposed LLM-based REBEL framework for ITA in MH-MR teams. Given a Multi-Attribute Context that reflects the heterogeneity of the MH-MR team and its assigned tasks, along with optional user preferences for multi-objective optimization and last-minute team composition changes, the LLM agent generates an adaptive ITA plan. The LLM agent can also retrieve the guidance rules and experiences most relevant to the current mission, collected from previous interactions.
Framework Overview
Illustration of the three stages in the proposed LLM-based REBEL framework for ITA in MH-MR teams. The first two stages comprise the Knowledge Acquisition phase, in which the LLM creates ITA strategies for different randomized missions and generates a collection of learned rules and experience data through simulation. During the Inferencing stage, the LLM leverages the Rule-based Learning and Experience RAG modules to extract the rules and experiences most relevant to the user's input, enhancing the quality of its ITA plan.
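The Rule-based Learning and Experience RAG modules boil down to a nearest-neighbor search over embedded rules and experiences. The sketch below illustrates that retrieval step under stated assumptions: the store holds (text, embedding) pairs whose embeddings were precomputed by some sentence-embedding model (random vectors stand in here), and cosine similarity ranks the candidates. It is not the paper's implementation.

```python
# Minimal sketch of the rule/experience retrieval step (illustrative only).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b) / (float(np.linalg.norm(a) * np.linalg.norm(b)) + 1e-9)

def retrieve_top_k(query_embedding: np.ndarray,
                   store: list[tuple[str, np.ndarray]],
                   k: int = 3) -> list[str]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_embedding, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Random vectors stand in for a real embedding model.
rng = np.random.default_rng(0)
rule_store = [(f"rule {i}", rng.normal(size=16)) for i in range(10)]
query_embedding = rng.normal(size=16)
print(retrieve_top_k(query_embedding, rule_store, k=3))
```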
Sample LLM Inputs + Outputs for Stages 1 and 2
LLM Prompt During Rule Generation
The prompt provided to the LLM consists of the Mission Description (orange), Goal (red), and User Preferences (green).
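The figure shows the prompt's actual wording; as a rough illustration, the three colored components could be assembled into a single prompt string as follows. The template text and field names here are hypothetical, not REBEL's exact prompt.

```python
# Hypothetical Stage-1 prompt template; the paper's exact wording is in the figure.
RULE_GENERATION_TEMPLATE = """\
Mission Description:
{mission_description}

Goal:
{goal}

User Preferences:
{user_preferences}

Based on the mission above, produce a set of general rules for allocating
navigation and classification tasks across the human and robot team members.
"""

prompt = RULE_GENERATION_TEMPLATE.format(
    mission_description="...",  # orange component in the figure
    goal="...",                 # red component
    user_preferences="...",     # green component
)
```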
LLM-Generated Rules
The generated rules provide helpful information for the LLM to reference when performing ITA.
LLM Prompt During Experience Generation
The prompt provided to the LLM consists of the Mission Description (orange), Goal (red), User Preferences (green), and Relevant Objective-Specific Rules (blue) from Stage 1.
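Continuing the hypothetical Stage-1 sketch above, the retrieved rules (the blue component) would be appended as an extra block. The example rule texts below are invented for illustration and were not produced by the system.

```python
# Hypothetical: appending retrieved rules (the blue component) to the Stage-1 prompt.
stage1_prompt = "..."  # assembled as in the Stage-1 sketch above

retrieved_rules = [  # illustrative rule texts, not actual system output
    "Prefer UGVs for hard-to-classify warehouses, since they capture higher-quality images.",
    "Assign high-skill operators to teleoperate robots toward distant POIs.",
]

stage2_prompt = stage1_prompt + "\nRelevant Objective-Specific Rules:\n" + "\n".join(
    f"- {rule}" for rule in retrieved_rules
)
```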
LLM-Generated ITA Plan and Post-Simulation Metrics
The generated ITA plan specifies which POIs each robot is assigned to visit, along with each agent's POI classification assignments. The simulation outputs performance metrics including the Mission Reward, POI Classification Accuracy, Mission Time, and Human Utilization.
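For concreteness, an ITA plan and its post-simulation metrics can be pictured as simple records like the ones below. The schema and zeroed values are illustrative placeholders, not the paper's output format or results.

```python
# Illustrative schema for an ITA plan and its simulation metrics (placeholder values).
ita_plan = {
    "navigation": {          # which POIs each robot is assigned to visit
        "uav_1": ["poi_2", "poi_5"],
        "ugv_1": ["poi_1", "poi_3"],
    },
    "classification": {      # which agent classifies each captured image
        "human_1": ["poi_1", "poi_2"],
        "uav_1": ["poi_5"],
        "ugv_1": ["poi_3"],
    },
}

post_sim_metrics = {
    "mission_reward": 0.0,           # scalar reward from the simulator
    "classification_accuracy": 0.0,  # fraction of POIs classified correctly
    "mission_time_s": 0.0,           # total mission time in seconds
    "human_utilization": 0.0,        # fraction of time operators are busy
}
```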
Case Study and Simulation Environment
We evaluate REBEL in a simulated case study that reflects a real-world environmental monitoring application of MH-MR teams. The MH-MR team is deployed to assess the pollution levels of various warehouses (i.e., Points of Interest, or POIs). Specifically, the team must classify all warehouses as hazardous or non-hazardous as accurately as possible.
As shown in the figure below, the team consists of multiple human operators, UAVs, and UGVs. The mission consists of two tasks. First, robots must navigate autonomously or be teleoperated by human operators to each warehouse and take pictures. Second, the images must be analyzed by human operators or robots to determine if the warehouses are hazardous.
Mission Specifications
The mission takes place in a 2 km × 2 km outdoor environment with multiple warehouses scattered throughout. Hazardous warehouses are distinguished by the presence of red smoke. Additionally, the building colors indicate the difficulty of identifying each warehouse as hazardous; in the real world, factors such as warehouse size and type determine this difficulty level.
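A randomized mission instance consistent with these specifications could be generated along the following lines; the `POI` fields, difficulty labels, and hazard rate are assumptions made for illustration.

```python
# Hypothetical mission generator matching the specifications described above.
import random
from dataclasses import dataclass

@dataclass
class POI:
    x: float         # meters, within the 2 km x 2 km area
    y: float
    hazardous: bool  # hazardous warehouses emit red smoke
    difficulty: str  # stands in for the building-color difficulty coding

def random_mission(num_pois: int = 8, size_m: float = 2000.0) -> list[POI]:
    return [POI(x=random.uniform(0.0, size_m),
                y=random.uniform(0.0, size_m),
                hazardous=random.random() < 0.5,
                difficulty=random.choice(["easy", "medium", "hard"]))
            for _ in range(num_pois)]

print(random_mission(num_pois=3))
```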
Human and Robot Models
During the mission, human operators may teleoperate robots to warehouses or perform hazard classification on the captured images. These operators are uniquely characterized by their skill levels and cognitive abilities, which directly impact the probability of accurate warehouse classification.
UAVs and UGVs may be assigned to visit warehouses or analyze captured images. UAVs are characterized by high speed and low image quality, whereas UGVs have lower speed but higher image quality. As shown in the figure on the right, in shared-control mode the skill level of the human operator teleoperating the robot also directly impacts its speed.
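These attributes suggest a simple parametric agent model. The sketch below is a hypothetical rendering of the relationships described above (e.g., teleoperated speed scaling with operator skill); the field names, value ranges, and scaling rule are assumptions, not the paper's equations.

```python
# Hypothetical human/robot attribute model (illustrative parameters).
from dataclasses import dataclass

@dataclass
class HumanOperator:
    skill: float      # in [0, 1]; affects teleoperation speed and accuracy
    cognition: float  # in [0, 1]; affects classification accuracy

@dataclass
class Robot:
    kind: str                # "UAV": fast, low image quality; "UGV": the reverse
    autonomous_speed: float  # m/s when navigating autonomously
    image_quality: float     # in [0, 1]; higher helps classification

def shared_control_speed(robot: Robot, operator: HumanOperator) -> float:
    """Assumed shared-control rule: speed scales with the operator's skill."""
    return robot.autonomous_speed * (0.5 + 0.5 * operator.skill)

uav = Robot(kind="UAV", autonomous_speed=15.0, image_quality=0.4)
print(shared_control_speed(uav, HumanOperator(skill=0.8, cognition=0.6)))
```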
Experiment Results
All experiments were conducted using the OpenAI GPT-4o LLM.
Inferencing - Single Objective Optimization Setting
REBEL outperforms, or performs within 10% of, AtRL, the state-of-the-art (SOTA) reinforcement learning (RL) approach.
Inferencing - Multi-Objective Optimization Setting
REBEL's ITA plans accurately reflect user preferences by achieving the highest score for the objective that the user prioritized.
Inferencing - Situational Awareness Setting
REBEL's flexibility enables it to function both as a standalone framework and as a dynamic inference-time extension that further enhances pre-trained RL-based ITA policies.
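One way to picture the extension mode: the pre-trained RL policy proposes a draft allocation, and the LLM revises it against the current context (retrieved rules, experiences, and any last-minute team changes). The callables below are hypothetical placeholders, not REBEL's actual interface.

```python
# Hedged sketch of REBEL as an inference-time extension of a pre-trained RL policy.
# `rl_policy` and `llm_refine` are hypothetical callables, not the paper's API.
def allocate(context: dict, rl_policy, llm_refine) -> dict:
    draft_plan = rl_policy(context)         # initial ITA plan from the RL policy
    return llm_refine(context, draft_plan)  # LLM adjusts for last-minute changes
```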