Adaptive Task Allocation in Multi-Human Multi-Robot Teams under Team Heterogeneity and Dynamic Information Uncertainty
Ziqin Yuan*, Ruiqi Wang*, Taehyeon Kim, Dezhong Zhao, and Byung-Cheol Min
*:equal contribution
Presented at ICRA 2025
Abstract
Task allocation in multi-human multi-robot (MH-MR) teams presents significant challenges due to the inherent heterogeneity of team members, the dynamics of task execution, and the information uncertainty of operational states. Existing approaches often fail to address these challenges simultaneously, resulting in suboptimal performance. To tackle this, we propose ATA-HRL, an adaptive task allocation framework using hierarchical reinforcement learning (HRL), which incorporates initial task allocation (ITA) that leverages team heterogeneity and conditional task reallocation in response to dynamic operational states. Additionally, we introduce an auxiliary state representation learning task to manage information uncertainty and enhance task execution. Through an extensive case study in large-scale environmental monitoring tasks, we demonstrate the benefits of our approach.
Conceptual Illustration
Conceptual illustration of our adaptive task allocation method, named ATA-HRL, in MH-MR teams. Unlike previous one-sided approaches, we consider both inherent heterogeneity and in-process dynamic states of the team and its assigned tasks, hierarchically combining initial task allocation and conditional task reallocation. To handle state information uncertainty, we also introduce an auxiliary state learning task to contextually reconstruct incomplete or noisy state information.
Framework Overview
Illustration of the proposed ATA-HRL framework. The main HRL hierarchy consists of two levels: the first, at time step 0, determines the optimal ITA by considering inherent team heterogeneity; the second, at each subsequent time step 1 to n during operation, decides whether to reallocate tasks and, if so, how to allocate them, considering additional dynamic operational changes. The optional reallocation decision is represented by a switch icon. An auxiliary state learning module is integrated into the second level to address state information uncertainty, enhancing decision-making during reallocation.
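To make the hierarchy concrete, below is a minimal sketch of how the two-level decision loop could be organized. The environment interface and policy objects (`env`, `ita_policy`, `realloc_policy`, `reconstruct_state`) are hypothetical names used for illustration, not the released implementation.

```python
# Minimal sketch of the two-level ATA-HRL decision loop (illustrative only).
def run_episode(env, ita_policy, realloc_policy, reconstruct_state, n_steps):
    # Level 1 (t = 0): initial task allocation from inherent team heterogeneity.
    state = env.reset()
    allocation = ita_policy.act(state)
    env.apply_allocation(allocation)

    # Level 2 (t = 1 .. n): conditional reallocation from dynamic operational states.
    for t in range(1, n_steps + 1):
        obs = env.observe()              # possibly delayed or noisy observations
        belief = reconstruct_state(obs)  # auxiliary state learning module
        reallocate, new_allocation = realloc_policy.act(belief, allocation)
        if reallocate:                   # the "switch" in the framework figure
            allocation = new_allocation
            env.apply_allocation(allocation)
        env.step()
    return env.total_score()
```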
Case Study and Simulation Environment
Our case study focuses on a large-scale environmental surveillance mission in which an MH-MR team is deployed to monitor pollution hazards originating from warehouses and factories. The scenario is set in a 2 km × 2 km simulated urban environment containing various Points of Interest (POIs), which represent potential pollution sources that must be investigated and classified by type and hazard level. This simulation captures the complexity of real-world operations, where MH-MR teams must dynamically adapt to evolving task requirements and team conditions.
Human and Robot Models:
The MH-MR team consists of two types of robots, UAVs and UGVs, with different mobility, autonomy, and sensory abilities. UAVs have higher speed but lower default image quality for ground observations, while UGVs excel in image quality but are slower. The robots’ performance can be enhanced through shared control by human operators, which significantly improves navigation precision and imaging results.
Human operators are modeled as sequential decision-makers with varied skill levels, cognitive abilities, and states such as fatigue and engagement. These states dynamically influence their performance, particularly during high workload periods. The human model accounts for decision-making delays and classification errors based on fatigue and engagement levels, which can change as the mission progresses.
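For illustration, the heterogeneous attributes described above could be captured with simple state containers like the following; the field names and types are assumptions rather than the exact state variables used in the paper.

```python
from dataclasses import dataclass

@dataclass
class RobotState:
    kind: str             # "UAV" or "UGV"
    speed: float          # UAVs are faster, UGVs slower
    image_quality: float  # UGVs have higher default quality for ground observations
    shared_control: bool  # human-in-the-loop control boosts precision and imaging
    location: tuple
    idle: bool

@dataclass
class OperatorState:
    skill_level: float        # static capability
    cognitive_ability: float  # static capability
    fatigue: float            # dynamic; grows under high workload
    engagement: float         # dynamic; affects delays and classification errors
```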
Task Description:
The mission begins when a satellite system detects POIs, which are classified into two types: ground pollution (warehouses) and air pollution (factories). Each POI has a difficulty level—low, medium, or high—that influences the task’s complexity and the resources required for effective monitoring. The MH-MR team is tasked with:
1. Navigating to each POI: Robots can perform this autonomously or under collaborative control, where human operators assist with navigation and provide guidance to enhance imaging accuracy.
2. Capturing Images: The quality of the images depends on the robot’s capabilities and the control mode. UGVs generally produce better quality images for ground pollution, while UAVs are faster and more effective for air pollution.
3. Classifying Hazards: The captured images are analyzed by human operators or onboard AI systems to identify potential hazards. Each correct classification awards points (15, 25, or 35 points for low, medium, and high difficulty levels, respectively), while incorrect classifications deduct the same amounts (a minimal scoring sketch follows this list).
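The scoring rule above can be summarized with a small helper; the function and dictionary names are illustrative, but the point values come directly from the task description.

```python
# Point values per difficulty level, as described in the task setup.
POINTS = {"low": 15, "medium": 25, "high": 35}

def classification_score(difficulty: str, correct: bool) -> int:
    """Return +points for a correct classification, -points for an incorrect one."""
    points = POINTS[difficulty]
    return points if correct else -points
```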
Dynamic and Uncertainty Elements:
To closely mimic real-world scenarios, the environment incorporates dynamic elements and uncertainties:
• Random Task Changes: Task attributes, such as POI complexity and type, may change during the mission due to updates in hazard evaluations. These changes represent uncertainties in task estimation, requiring the team to adapt their strategies dynamically.
• Robot Failures: Robots may experience random failures, simulating equipment malfunctions that remove them from the task pool temporarily or permanently, impacting overall team performance.
• Latency and Noise: Latency is introduced into robot states (e.g., location and idleness) to simulate delays in status updates caused by communication issues or field conditions, and human fatigue estimates are perturbed by Gaussian noise, reflecting the inherent difficulty of accurately measuring cognitive states in real time (see the sketch after this list).
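As a rough illustration of how these uncertainty elements could be injected into the observation stream, consider the sketch below; the delay length and noise scale are placeholder values, not the settings used in the experiments.

```python
import random
from collections import deque

class DelayedChannel:
    """Delivers robot state updates a fixed number of steps late."""
    def __init__(self, delay_steps: int = 2):
        self.buffer = deque(maxlen=delay_steps + 1)

    def push_and_read(self, true_state):
        self.buffer.append(true_state)
        return self.buffer[0]  # oldest buffered state = delayed observation

def noisy_fatigue(true_fatigue: float, sigma: float = 0.05) -> float:
    """Gaussian-perturbed fatigue estimate, clipped to [0, 1]."""
    return min(1.0, max(0.0, random.gauss(true_fatigue, sigma)))
```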
Simulation Environment:
The simulation environment is carefully designed to replicate urban and industrial settings with various POIs strategically distributed across the map. Each POI type (ground or air pollution) is visually distinct—warehouses represent ground pollution, and factories indicate air pollution. The robots must navigate complex terrain and structures, taking into account obstacles that can affect their speed and sensor performance. The environment’s scale challenges the team’s ability to efficiently allocate tasks and manage resources.
Simulation Goals:
The primary goal of this environment is to evaluate the effectiveness of our ATA-HRL framework in managing task allocation under varying conditions of team heterogeneity and operational uncertainty. By continuously adapting task assignments and reallocations, the MH-MR team aims to maximize task performance scores, effectively leveraging the strengths of both humans and robots while mitigating the impact of dynamic state changes and uncertainties.
The case study setting provides a challenging testbed that highlights the necessity of sophisticated task allocation methods capable of dynamic adaptation and resilience to real-world operational constraints. This simulation environment not only tests the immediate performance of ATA-HRL but also its ability to generalize across different scales of MH-MR operations and varying levels of information uncertainty.
Implementation Details
Our ATA-HRL framework was trained and tested under various settings to validate its effectiveness in task allocation. The model uses a 3-layer GRU network with 128 hidden units per layer, designed to handle temporal dependencies and reconstruct state information in the presence of latency and noise. For state representation learning, we implemented a Conditional VAE whose encoder and decoder each consist of 2 fully connected layers with ReLU activation. The KL divergence weight was set to 0.1 to balance reconstruction accuracy against regularization.
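A minimal PyTorch sketch consistent with this description is given below; the input, condition, and latent dimensions are assumptions, while the two-layer ReLU encoder/decoder and the KL weight of 0.1 follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, state_dim=64, cond_dim=32, latent_dim=16, hidden=128):
        super().__init__()
        # Encoder: 2 fully connected layers with ReLU, producing mean and log-variance.
        self.enc = nn.Sequential(
            nn.Linear(state_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder: 2 fully connected layers with ReLU, reconstructing the state.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, cond):
        mu, logvar = self.enc(torch.cat([x, cond], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([z, cond], dim=-1)), mu, logvar

def cvae_loss(recon, target, mu, logvar, kl_weight=0.1):
    recon_loss = F.mse_loss(recon, target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_weight * kl
```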
We used PPO (Proximal Policy Optimization) for reinforcement learning, with specific hyperparameters tuned for our task environment: a learning rate of 0.0003, clip range of 0.2, and a batch size of 36. An entropy coefficient of 0.01 was applied to encourage exploration. We deployed 20 behavioral actors to collect interaction experiences in parallel. Training was conducted over 10,000 episodes, with early stopping triggered if performance plateaued for 200 consecutive episodes. The ITA layer was implemented as described in the original AtRL paper.
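For reference, the PPO settings above can be collected into a single configuration; the key names below are ours and do not correspond to any specific library's API.

```python
# Illustrative grouping of the PPO hyperparameters reported in the text.
PPO_CONFIG = {
    "learning_rate": 3e-4,
    "clip_range": 0.2,
    "batch_size": 36,
    "entropy_coef": 0.01,
    "num_actors": 20,            # parallel behavioral actors collecting experience
    "max_episodes": 10_000,
    "early_stop_patience": 200,  # stop if performance plateaus this many episodes
}
```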
The GRU layers of the ATA-HRL model were trained with a learning rate of 0.001 using the Adam optimizer, while the auxiliary task for state representation was trained concurrently with the main policy, ensuring coherence between state reconstructions and policy adaptations. Our ablation studies confirmed the importance of both the auxiliary task and conditional reallocation in improving overall performance, particularly under conditions of uncertainty.
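A simplified view of this concurrent training, assuming a single combined objective and separate learning rates for the policy and the GRU layers, might look as follows; `ppo_loss`, `aux_loss`, and `aux_weight` are assumed names standing in for the actual objectives.

```python
import torch

def build_optimizer(policy, gru):
    # Separate learning rates: 3e-4 for the policy, 1e-3 (Adam) for the GRU layers.
    return torch.optim.Adam([
        {"params": policy.parameters(), "lr": 3e-4},
        {"params": gru.parameters(), "lr": 1e-3},
    ])

def joint_update(optimizer, ppo_loss, aux_loss, aux_weight=1.0):
    # A single backward pass trains the policy and the auxiliary state
    # reconstruction task concurrently, as described above.
    loss = ppo_loss + aux_weight * aux_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```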
Experimental Results