Interactive Policy Shaping for Human-Robot Collaboration with Transparent Matrix Overlays

Jake Brawer, Debasmita Ghose, Kate Candon, Meiying Qin, Alessandro Roncone, Marynel Vázquez, Brian Scassellati

ACM/IEEE International Conference on Human-Robot Interaction 2023, Stockholm, SE

Abstract

One important aspect of effective human-robot collaborations is the ability for robots to adapt quickly to the needs of humans. While techniques like deep reinforcement learning have demonstrated success as sophisticated tools for learning robot policies, the fluency of human-robot collaborations is often limited by these policies' inability to integrate changes to a user's preferences for the task. To address these shortcomings, we propose a novel approach that can modify learned policies at execution time via symbolic if-this-then-that rules corresponding to a modular and superimposable set of low-level constraints on the robot's policy. These rules, which we call Transparent Matrix Overlays, function not only as succinct and explainable descriptions of the robot’s current strategy but also as an interface by which a human collaborator can easily alter a robot's policy via verbal commands. We demonstrate the efficacy of this approach on a series of proof-of-concept cooking tasks performed in simulation and on a physical robot.

Transparent Matrix Overlays

Transparent Matrix Overlays enable a user to issue high-level directives to a robot, potentially resulting in significant changes to the robot's policy at execution time.

1. Prohibitory Overlays:

Prohibitory overlays (lines 9-10 of Alg.1) implement rules that prohibit actions in states that satisfy the conditions imposed by the overlay. These overlays down-weight action probabilities as a function of rule satisfaction, so that actions satisfying the rule are suppressed.
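
As a rough illustration of the down-weighting idea, the sketch below scales each action's probability by how strongly it matches a rule and then renormalizes. It is not the paper's Alg.1: the function name `apply_prohibitory`, the `strength` parameter, the action names, and the multiplicative form are all assumptions made for this example.

```python
def apply_prohibitory(probs, satisfaction, strength=1.0):
    """Down-weight actions in proportion to how strongly they match a rule.

    probs        -- dict: action name -> probability from the base policy
    satisfaction -- dict: action name -> rule satisfaction in [0, 1]
    strength     -- hypothetical knob; 1.0 zeroes out a fully matching action
    """
    weighted = {
        action: p * (1.0 - strength * satisfaction.get(action, 0.0))
        for action, p in probs.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        # Every action was prohibited; fall back to the base policy unchanged.
        return dict(probs)
    # Renormalize so the result is again a probability distribution.
    return {action: w / total for action, w in weighted.items()}


# Hypothetical base policy over three actions (names are illustrative).
base = {"pour_milk": 0.5, "stir": 0.3, "get_bowl": 0.2}
# Rule: "do not pour the milk" is fully satisfied by the pour_milk action.
rule = {"pour_milk": 1.0}
shaped = apply_prohibitory(base, rule)
```

In this toy case the prohibited action ends up with zero probability and the remaining actions keep their relative ordering.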

2. Transfer Overlays:

Transfer overlays (lines 11-13 of Alg.1) transfer probability density from a specified source action to a specified target action. This is useful when the robot has learned to perform a task in a particular way, but some equivalent alternative is desired. For example, the robot could shift from physically performing a given task to guiding the user through the task via verbal instruction.
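
The transfer idea can be sketched in a few lines: probability is removed from a source action and added to a target action, leaving every other action untouched. The function name `apply_transfer`, the `fraction` parameter, and the action names are assumptions for illustration only, not the authors' implementation.

```python
def apply_transfer(probs, source, target, fraction=1.0):
    """Move a fraction of the source action's probability to the target action.

    probs    -- dict: action name -> probability from the base policy
    source   -- action to take probability from
    target   -- action to give that probability to
    fraction -- hypothetical knob in [0, 1]; 1.0 moves all of it
    """
    shaped = dict(probs)
    moved = shaped.get(source, 0.0) * fraction
    shaped[source] = shaped.get(source, 0.0) - moved
    shaped[target] = shaped.get(target, 0.0) + moved
    return shaped


# Hypothetical policy: the robot prefers to stir by itself ("do_stir"),
# but the user wants verbal guidance instead ("say_stir").
base = {"do_stir": 0.6, "say_stir": 0.1, "get_bowl": 0.3}
shaped = apply_transfer(base, source="do_stir", target="say_stir")
```

Because probability is only moved between two actions, the total stays at 1 and no renormalization step is needed in this sketch.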

3. Permissive Overlays:

Permissive overlays (lines 14-15 of Alg.1) implement rules that permit actions in states that satisfy the conditions of the overlay. These overlays up-weight actions that satisfy the rule, making the agent more likely to select them.
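
A minimal sketch of up-weighting, mirroring the prohibitory case: matching actions are multiplied by a factor greater than one and the distribution is renormalized. The function name `apply_permissive`, the `boost` parameter, the action names, and the specific weighting formula are assumptions for this example rather than the paper's definition.

```python
def apply_permissive(probs, satisfaction, boost=4.0):
    """Up-weight actions in proportion to how strongly they match a rule.

    probs        -- dict: action name -> probability from the base policy
    satisfaction -- dict: action name -> rule satisfaction in [0, 1]
    boost        -- hypothetical knob; larger values favor matching actions more
    """
    weighted = {
        action: p * (1.0 + boost * satisfaction.get(action, 0.0))
        for action, p in probs.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        # Degenerate input (all probabilities zero); return it unchanged.
        return dict(probs)
    # Renormalize so the result is again a probability distribution.
    return {action: w / total for action, w in weighted.items()}


# Hypothetical policy that rarely fetches oats; the user asks for oatmeal.
base = {"get_oats": 0.1, "get_cereal": 0.6, "get_bowl": 0.3}
rule = {"get_oats": 1.0}
shaped = apply_permissive(base, rule)
```

Actions that do not match the rule lose probability only through renormalization, so their ratios to one another are preserved.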

Task and Experimental Setup

In both the physical and the simulated collaboration, the goal is for a person and a robot to prepare a breakfast meal consisting of a main dish (one of six variants of oatmeal or a bowl of cereal) and a side dish (one of five microwaveable food items) per the person's meal preferences.

We developed a simple language model that mapped templated commands to particular overlay rules in Prolog.
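
To make the idea of templated commands concrete, the sketch below matches an utterance against a small set of regular-expression templates and returns the overlay type with its filled slots. The templates, slot names, and output format are hypothetical; the original system emits Prolog rules, which this example replaces with a plain Python tuple.

```python
import re

# Hypothetical templates: each pairs a pattern with an overlay type.
# Order matters, since the first matching template wins.
TEMPLATES = [
    (re.compile(r"(?:do not|don't) (?P<action>\w+) the (?P<object>\w+)"), "prohibit"),
    (re.compile(r"(?P<target>\w+) instead of (?P<source>\w+)"), "transfer"),
    (re.compile(r"(?:please )?(?P<action>\w+) the (?P<object>\w+)"), "permit"),
]


def parse_command(text):
    """Map a templated command to (overlay_type, slots), or None if no match."""
    # Normalize case and strip surrounding whitespace and final punctuation.
    normalized = text.strip().lower().rstrip(".!")
    for pattern, overlay_type in TEMPLATES:
        match = pattern.fullmatch(normalized)
        if match:
            return overlay_type, match.groupdict()
    return None


prohibit = parse_command("Don't pour the milk.")
transfer = parse_command("instruct instead of stir")
permit = parse_command("Please microwave the pastry")
unknown = parse_command("what time is it")
```

Commands that fit no template return `None`, which leaves the caller free to ask the user to rephrase.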

We used color masking techniques to determine the positions of different ingredients in the workspace, so they could be manipulated as required.
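
Color masking can be sketched without any imaging library: keep the pixels whose channels fall inside a color range, then take the centroid of the kept pixels as the object's position. The image representation (nested lists of RGB tuples), the thresholds, and the function names below are assumptions for illustration; a real pipeline would typically run on camera frames with an image-processing library.

```python
def color_mask(image, lower, upper):
    """Return a boolean mask: True where every channel lies in [lower, upper].

    image        -- list of rows, each row a list of (R, G, B) tuples
    lower, upper -- per-channel inclusive bounds, e.g. (200, 0, 0)
    """
    return [
        [all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)) for pixel in row]
        for row in image
    ]


def mask_centroid(mask):
    """Return the (row, col) centroid of the True pixels, or None if there are none."""
    hits = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
    if not hits:
        return None
    n = len(hits)
    return (sum(r for r, _ in hits) / n, sum(c for _, c in hits) / n)


# Toy 4x4 image: black background with a 2x2 red patch in the middle.
BLACK, RED = (0, 0, 0), (255, 0, 0)
image = [[BLACK] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        image[r][c] = RED

# Hypothetical "red" range; real thresholds depend on lighting and camera.
mask = color_mask(image, lower=(200, 0, 0), upper=(255, 60, 60))
position = mask_centroid(mask)
```

The centroid is in pixel coordinates; mapping it to workspace coordinates would require a camera calibration step that is not shown here.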

Results with a Physical Robot Utilizing Overlays

For case studies a), b), and c), the first column depicts the action sequence predicted by the base model. The second column shows the action sequence of the policy as modified by the overlays and corrective actions, and the third column depicts each overlay's activation intervals.