Evaluation of Adaptive Systems for Human-Autonomy Teaming (EASyHAT)

Human-Autonomy Teaming (HAT) describes situations where people cooperate with artificially intelligent autonomous agents to perform some function. In a general sense, we can envision heterogeneous teams composed of autonomous participants each using either human or artificial intelligence. These relationships can take on different structures depending on the level of supervision the humans can exert and the level of intelligence and autonomy provided by the non-human agents.

In this third in a series of IJCAI workshops on Human-Autonomy Teaming, we intend to delve further into the problem of evaluation. Software engineering tools and techniques for evaluating the quality of traditionally developed software are well studied, and many approaches are mature. System components developed by learning from data are less easily tested for mission-critical systems. Reserving a portion of the data sample for testing provides a statistical measure of quality relative to the overall sample, but only for a particular type of data and for a particular component of what may be one in a series of learned models in a complex system. The quality of the overall system, as it relates to a complex mission, is less well understood. The desire to allow models to adapt while deployed, as in lifelong learning approaches, increases the complexity further. Humans bring another element of quality assessment to the problem: their interaction with the adaptive artificially intelligent components must also be evaluated in some way, and humans will adapt to the changing behaviors of the autonomy in ways that are currently unpredictable.
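To make the held-out evaluation idea above concrete, the following is a minimal sketch of estimating a single learned component's quality on a reserved test split, along with a simple statistical measure of uncertainty. All names here (`holdout_accuracy`, the toy predictor, the synthetic data) are illustrative assumptions, not part of any workshop artifact.

```python
import math
import random

def holdout_accuracy(data, predict, test_frac=0.2, seed=0):
    """Estimate a component's quality on a reserved (held-out) test split.

    data: list of (features, label) pairs drawn from the overall sample
    predict: function mapping features -> predicted label
    Returns (accuracy, 95% normal-approximation margin of error).
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    test = shuffled[:n_test]  # reserved portion, never used for training

    correct = sum(1 for x, y in test if predict(x) == y)
    acc = correct / n_test
    # 95% normal-approximation interval for a binomial proportion:
    # the "statistical measure of quality relative to the overall sample"
    margin = 1.96 * math.sqrt(acc * (1 - acc) / n_test)
    return acc, margin

# Toy example: a hand-written rule evaluated on synthetic labeled data
data = [((i,), int(i % 2 == 0)) for i in range(1000)]
acc, margin = holdout_accuracy(data, predict=lambda x: int(x[0] % 2 == 0))
```

Note that this estimate is valid only for the data distribution sampled and for this one component; as the paragraph above argues, it says little about overall mission-level quality, and it breaks down entirely once the model continues to adapt after deployment.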

Some of the topics of interest include, but are not limited to, the following:

  • The utilization of formal methods in conjunction with learned statistical models
  • Bias in both laboratory and lifelong learning
  • Preventing damage from adversarial examples
  • Human adaptation to changing artificial intelligence
  • Complex systems using multiple learned models
  • Effective use of simulation and real-world testing
  • Effects of computational time limits in the human-autonomy team
  • Effects of communication between teammates on evaluation
  • Evaluating transfer and incremental learning
  • Effects of trust and transparency