Initial Task Allocation for Multi-Human Multi-Robot Teams with Attention-based Deep Reinforcement Learning


Ruiqi Wang, Dezhong Zhao, and Byung-Cheol Min

 SMART Lab, Purdue University

[Project Website] [Video] [Paper]

Published at IROS 2023

Abstract

Multi-human multi-robot teams have great potential for complex and large-scale tasks through the collaboration of humans and robots with diverse capabilities and expertise. To efficiently operate such highly heterogeneous teams and maximize team performance in a timely manner, sophisticated initial task allocation strategies that consider individual differences across team members and tasks are required. While existing works have shown promising results in reallocating tasks based on agent state and performance, their neglect of the inherent heterogeneity of the team hinders their effectiveness in realistic scenarios. In this paper, we present a novel formulation of the initial task allocation problem in multi-human multi-robot teams as a contextual multi-attribute decision-making process and propose an attention-based deep reinforcement learning approach. We introduce a cross-attribute attention module to encode the latent and complex dependencies of multiple attributes in the state representation. We conduct a case study in a large-scale threat surveillance scenario and demonstrate the strengths of our model.

Problem Formulation

We aim to explore the problem of initial task allocation (ITA) in MH-MR teams. Specifically, we seek to determine how to optimally assign, at the outset, a job consisting of a set of tasks with varying attributes to a team of humans with diverse capabilities and robots with assorted characteristics. As illustrated in the figure below, we approach this problem by formulating it as a contextual multi-attribute decision-making process (CMADP). The allocation decision is made with respect to a context composed of three main categories of attributes: human factors, robot characteristics, and task attributes, each of which includes various sub-attributes composed of individual variables. A minimal sketch of this representation is given after this paragraph.
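To make the CMADP context concrete, the sketch below shows one way the three attribute categories could be packaged as inputs to an ITA policy. The field names, dimensions, and the synthetic-data helper are illustrative assumptions for this page, not a data layout prescribed by the paper.

```python
# Hypothetical structuring of a CMADP context; field names and shapes are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class CMADPContext:
    """Context observed before execution, used for initial task allocation (ITA)."""
    human_factors: np.ndarray          # shape (n_humans, d_h), e.g. skill level, cognitive load tolerance
    robot_characteristics: np.ndarray  # shape (n_robots, d_r), e.g. speed, sensor type
    task_attributes: np.ndarray        # shape (n_tasks, d_t), e.g. location, complexity

def random_context(n_humans=2, n_robots=4, n_tasks=10, d_h=3, d_r=3, d_t=4):
    """Generate a synthetic context for testing an ITA policy."""
    rng = np.random.default_rng(0)
    return CMADPContext(
        human_factors=rng.random((n_humans, d_h)),
        robot_characteristics=rng.random((n_robots, d_r)),
        task_attributes=rng.random((n_tasks, d_t)),
    )

# An ITA decision maps every task to one human-robot pairing drawn from the team,
# so the action space here is simply the set of such assignments.
context = random_context()
n_pairs = context.human_factors.shape[0] * context.robot_characteristics.shape[0]
assignment = np.random.default_rng(1).integers(0, n_pairs, size=context.task_attributes.shape[0])
print("assignment per task:", assignment)
```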

Framework of AtRL

Illustration of the proposed AtRL framework. The multi-attribute data inputs are first fed into separate recurrent embedding layers to generate three attribute sequences of the same dimension. These sequences are concatenated into a low-level state representation, which is then passed, together with each attribute sequence, through a cross-attribute attention layer. In each cross-attribute attention layer, an attribute sequence is enriched with relevant information from the other two by computing adaptive dependencies between the features of the current attribute and those encoded in the low-level state representation. The enhanced attribute sequences are then passed through a mean pooling layer to produce the high-level state representation of the multi-attribute context, which is fed into a policy network to learn the value function and policy.
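For readers who prefer code, the following PyTorch sketch mirrors the pipeline described in the caption above: recurrent embeddings per attribute, concatenation into a low-level state, cross-attribute attention, mean pooling, and a policy head. The use of GRUs, `nn.MultiheadAttention`, and all layer sizes are assumptions made for illustration; this is not the authors' released implementation.

```python
# A hedged sketch of the AtRL state encoder; architecture details are assumed, not official.
import torch
import torch.nn as nn

class CrossAttributeEncoder(nn.Module):
    def __init__(self, d_human, d_robot, d_task, d_model=64, n_heads=4):
        super().__init__()
        # Recurrent embedding layers: project each attribute sequence to a common dimension.
        self.embed_h = nn.GRU(d_human, d_model, batch_first=True)
        self.embed_r = nn.GRU(d_robot, d_model, batch_first=True)
        self.embed_t = nn.GRU(d_task, d_model, batch_first=True)
        # One cross-attribute attention layer per attribute sequence.
        self.attn_h = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_r = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_t = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, humans, robots, tasks):
        # humans: (B, n_h, d_human), robots: (B, n_r, d_robot), tasks: (B, n_t, d_task)
        h, _ = self.embed_h(humans)
        r, _ = self.embed_r(robots)
        t, _ = self.embed_t(tasks)
        # Low-level state representation: concatenation of the three attribute sequences.
        low = torch.cat([h, r, t], dim=1)
        # Each attribute sequence attends over the low-level state, so its features are
        # adapted with information revealed by the other two attribute types.
        h_enh, _ = self.attn_h(h, low, low)
        r_enh, _ = self.attn_r(r, low, low)
        t_enh, _ = self.attn_t(t, low, low)
        # Mean pooling over the enhanced sequences yields the high-level state representation.
        high = torch.cat([h_enh, r_enh, t_enh], dim=1).mean(dim=1)  # (B, d_model)
        return high

class PolicyHead(nn.Module):
    """Actor-critic head consuming the high-level state (output dimensions assumed)."""
    def __init__(self, d_model=64, n_actions=8):
        super().__init__()
        self.actor = nn.Linear(d_model, n_actions)  # allocation logits
        self.critic = nn.Linear(d_model, 1)         # state value

    def forward(self, high):
        return self.actor(high), self.critic(high)
```

In this sketch the cross-attribute attention is realized with standard multi-head attention, with the attribute sequence as the query and the low-level state as key and value; any RL algorithm with an actor-critic interface (e.g. PPO) could then be trained on top of the encoder.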

Case Study

We conducted a case study to apply and validate our proposed AtRL in a large-scale threat surveillance task scenario. More details can be found in the video below.