Initial Task Assignment in Multi-human Multi-Robot Teams: An Attention-Enhanced Hierarchical Reinforcement Learning Approach
Ruiqi Wang, Dezhong Zhao, Arjun Gupte, and Byung-Cheol Min
[Project Website] [Video] [Paper] [Code]
Abstract
Multi-human multi-robot (MH-MR) teams hold tremendous potential for tackling intricate and massive missions by merging the distinct strengths and expertise of individual members. The inherent heterogeneity of these teams necessitates advanced initial task assignment (ITA) methods that align tasks with the intrinsic capabilities of team members from the outset. While existing reinforcement learning approaches show encouraging results, they may fall short in addressing the nuances of long-horizon ITA problems, particularly in settings with large-scale MH-MR teams or multifaceted tasks. To bridge this gap, we propose an attention-enhanced hierarchical reinforcement learning approach that decomposes the complex ITA problem into structured sub-problems, facilitating more efficient allocations. To bolster sub-policy learning, we introduce a hierarchical cross-attribute attention (HCA) mechanism, encouraging each sub-policy within the hierarchy to discern and leverage the specific nuances in the state space that are crucial for its respective decision-making phase. Through an extensive environmental surveillance case study, we demonstrate the benefits of our model and of the HCA mechanism within it.
Problem Formulation
Hierarchical reinforcement learning (HRL) aims to break down a complicated, large RL task into a hierarchy of simpler sub-tasks. Each of these sub-tasks is then addressed by training a sub-policy using conventional RL techniques. The option framework is a seminal structure in HRL. It introduces the concept of temporally extended actions, or options, which can be invoked at different decision epochs by a higher-level policy. Building on the option framework, we present the hierarchical contextual multi-attribute decision-making process (HCMADP) formulated for the initial task assignment problem within an MH-MR team. As depicted below, this involves devising a hierarchy of sub-allocation decisions, termed allocation options, in response to a multi-attribute context that represents the intrinsic heterogeneity of the team and the unique specifications of the tasks.
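As background for the option framework referenced above, the block below restates the standard option definition in conventional notation. The reading of "allocation options" against the HCMADP is an illustrative assumption, since the paper's exact tuple is not reproduced here.

```latex
% Standard option definition (Sutton, Precup, and Singh, 1999): an option o is a
% triple of an initiation set, an intra-option policy, and a termination condition.
o \;=\; \langle \mathcal{I}_o,\ \pi_o,\ \beta_o \rangle,
\qquad
\mathcal{I}_o \subseteq \mathcal{S},
\quad
\pi_o : \mathcal{S} \times \mathcal{A} \rightarrow [0,1],
\quad
\beta_o : \mathcal{S} \rightarrow [0,1]
% Illustrative reading for the ITA setting (an assumption, not the paper's exact
% HCMADP formulation): each allocation option maps the multi-attribute context,
% which encodes human, robot, and task attributes, to one sub-allocation decision,
% and a higher-level policy chooses which option to invoke at each decision epoch.
```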
Framework of AeHRL
An illustration of the proposed AeHRL framework with an example of a two-level option hierarchy. It takes as input a multi-attribute context, which captures the heterogeneity of the MH-MR team and its tasks, and learns a hierarchy of optimal initial allocation options as the output. The hierarchical execution of options is illustrated by the switches over the contacts.
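To make the caption concrete, below is a minimal PyTorch sketch, not the authors' AeHRL/HCA implementation, of a two-level option hierarchy whose sub-policies self-attend over per-attribute context tokens before scoring allocation options. All module names, dimensions, and the way the high-level option conditions the low-level sub-policy are illustrative assumptions.

```python
# Minimal sketch only: module names, dimensions, and the conditioning scheme are
# assumptions, not the authors' AeHRL/HCA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributeAttentionPolicy(nn.Module):
    """One sub-policy: self-attends over per-attribute tokens, then scores its options."""

    def __init__(self, input_dim: int, embed_dim: int, num_options: int, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(input_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, num_options)
        )

    def forward(self, context: torch.Tensor) -> torch.distributions.Categorical:
        # context: (batch, num_attribute_tokens, input_dim)
        tokens = self.proj(context)
        attended, _ = self.attn(tokens, tokens, tokens)  # attention across attribute tokens
        pooled = attended.mean(dim=1)                    # aggregate attended features
        return torch.distributions.Categorical(logits=self.head(pooled))


class TwoLevelOptionHierarchy(nn.Module):
    """High-level sub-policy picks a coarse allocation option; low-level refines it."""

    def __init__(self, attr_dim: int, num_high_options: int, num_low_options: int):
        super().__init__()
        self.num_high_options = num_high_options
        self.high = AttributeAttentionPolicy(attr_dim, 64, num_high_options)
        # The low-level sub-policy also sees the chosen high-level option as extra features.
        self.low = AttributeAttentionPolicy(attr_dim + num_high_options, 64, num_low_options)

    def forward(self, context: torch.Tensor):
        high_dist = self.high(context)
        high_option = high_dist.sample()                                    # (batch,)
        option_code = F.one_hot(high_option, self.num_high_options).float()
        # Broadcast the option code to every attribute token before the low level attends.
        option_feat = option_code.unsqueeze(1).expand(-1, context.size(1), -1)
        low_dist = self.low(torch.cat([context, option_feat], dim=-1))
        return high_option, low_dist.sample()


# Example usage with a random multi-attribute context (2 teams, 6 attribute tokens of size 32).
policy = TwoLevelOptionHierarchy(attr_dim=32, num_high_options=5, num_low_options=10)
high, low = policy(torch.randn(2, 6, 32))
print(high, low)
```

In a full training loop, the log-probabilities of both sampled options would feed a policy-gradient update; only the hierarchical forward pass is sketched here.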
Case Study Scenario and Video
We design a case study to assess the performance of our proposed AeHRL in the scenario of an extensive environmental surveillance task, mirroring real-world military or disaster recovery situations.
Experimental Details
The hyperparameter settings of our model used during the experiments are listed below.
Example Learning Curves of the Best-Performing AeHRL Variant Compared to the Best-Performing Ablation Models, AtHRL and HRL, Across Three Experimental Settings