Thomas Tian, Masayoshi Tomizuka, Anca D. Dragan, and Andrea Bajcsy
University of California, Berkeley
Humans have internal models of robots (like their physical capabilities), the world (like what will happen next), and their tasks (like a preferred goal). However, human internal models are not always perfect: for example, it is easy to underestimate a robot’s inertia. Nevertheless, these models change and improve over time as humans gather more experience. Interestingly, robot actions influence what this experience is, and therefore influence how people’s internal models change. In this work we take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality.
A key challenge in doing so is modeling how humans learn. We do this by modeling human learning as a nonlinear dynamical system that evolves as a function of new observations that the robot can influence, and by inferring the approximate human learning dynamics from demonstrations that naturally exhibit human learning. Although the most general model learning problem remains computationally intractable, we introduce a tractable approximation that is readily solvable via gradient-based optimization and is compatible with neural network representations of the human learning dynamics. Leveraging our approximate dynamics model of human learning, we formalize robot influence over the human's internal model as a Markov Decision Process (MDP) where the human's internal model is part of the state and the human's learning dynamics are part of the transition function. The solution yields robot actions that change the human's internal model by changing the human's observations in a way that rewards the robot.
Algorithm
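To make the learning-dynamics-inference step concrete, below is a minimal sketch under simplifying assumptions: the human's internal model is a low-dimensional vector phi, the human's action is a simple differentiable function of (phi, state), and phi evolves through a neural network f_theta that is fit so that unrolled action predictions match the human actions recorded in demonstrations. The variable names, dimensions, and the stand-in action model are illustrative assumptions rather than the exact implementation; the fitted network is what would then enter the transition function of the influence MDP.

```python
# Minimal sketch (assumptions throughout) of fitting an approximate human
# learning-dynamics model from demonstrations via gradient-based optimization.
import torch
import torch.nn as nn

PHI_DIM, OBS_DIM, STATE_DIM, ACT_DIM = 4, 6, 3, 2  # illustrative sizes

class HumanLearningDynamics(nn.Module):
    """f_theta: maps (current internal model, new observation) -> next internal model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PHI_DIM + OBS_DIM, 64), nn.Tanh(),
            nn.Linear(64, PHI_DIM),
        )

    def forward(self, phi, obs):
        # Residual update: the internal model changes a little with each observation.
        return phi + self.net(torch.cat([phi, obs], dim=-1))

def predicted_human_action(phi, state):
    # Stand-in for "the human acts (near-)optimally under their internal model phi".
    # In practice this would be a planner or policy conditioned on phi.
    return torch.tanh(phi[..., :ACT_DIM] + 0.1 * state[..., :ACT_DIM])

def fit_learning_dynamics(demos, epochs=200, lr=1e-3):
    """demos: list of dicts with per-timestep 'obs', 'state', 'action' tensors."""
    f = HumanLearningDynamics()
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(epochs):
        loss = 0.0
        for demo in demos:
            phi = torch.zeros(PHI_DIM)  # initial (possibly wrong) internal model
            for obs, s, a in zip(demo["obs"], demo["state"], demo["action"]):
                # Penalize mismatch between the action predicted under phi and the
                # action the human actually took at this point in the demonstration.
                loss = loss + ((predicted_human_action(phi, s) - a) ** 2).mean()
                phi = f(phi, obs)  # the internal model evolves with new observations
        opt.zero_grad(); loss.backward(); opt.step()
    return f
```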
User Study
In our user study, we investigate whether we can infer the dynamics of real human learning and enable robots to influence real users. We focus on scenarios where the robot's physical dynamics differ from what the human is used to. In this study, the human controls the end-effector of a 7-DoF robot arm via hand gestures (see Figure 1). The robot's dynamics are corrupted so that, as the human interacts with the robot arm, they naturally learn about its dynamics. We investigate whether a robot can actively teach a human the physical dynamics and improve their teleoperation performance faster than if the human does the task on their own. In other words, we aim to understand whether a robot can align the human's internal model with the robot's.
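For concreteness, one way to realize such a dynamics mismatch is sketched below: a fixed rotational and scaling bias is applied to the mapping from the human's 2-D gesture command to end-effector motion. The specific bias angle, gain, and time step are illustrative assumptions rather than the exact perturbation used in the study.

```python
# Illustrative sketch of a "corrupted" teleoperation mapping: the end-effector
# does not move the way the human's gesture suggests, so the human must learn
# the bias through interaction. Parameters below are assumptions for the example.
import numpy as np

def biased_ee_dynamics(ee_pos, gesture_cmd, angle_deg=30.0, gain=0.6, dt=0.1):
    """Map a 2-D gesture command to end-effector motion with a rotational bias."""
    theta = np.deg2rad(angle_deg)
    bias = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]]) * gain
    return ee_pos + dt * bias @ np.asarray(gesture_cmd, dtype=float)

# Example: a purely rightward gesture produces motion rotated away from "right",
# which the user only discovers (and compensates for) through interaction.
print(biased_ee_dynamics(np.zeros(2), [1.0, 0.0]))
```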
When participants passively learn on their own, their trajectories are consistently suboptimal, weaving around the optimal path.
In contrast, in the active teaching condition, the initial portion of the trajectory exhibits the robot's teaching behavior: the robot intentionally exaggerates the dynamics bias to change the human's internal model faster.
The figure on the right shows how the distance between the human's actions and the optimal actions varies over time under each robot strategy. It indicates a significant improvement in the human's action optimality when the robot actively teaches them compared to when the human learns passively on their own.
We want to test two aspects of our approach: our ability to infer the dynamics of human learning, and the effectiveness of our robot influencing algorithm. To fully validate both, we need access to the ground-truth human learning dynamics. For this reason, we perform a series of simulation experiments with two types of simulated human learners: gradient-based learners and threshold learners. We explore two shared autonomy contexts: a robot teaching a human about physics-based robot dynamics, and a robot that implicitly influences human objectives, like their goal or motion preferences.
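A minimal sketch of the two simulated learner types is shown below. The quadratic stand-in error, learning rate, and threshold are assumptions for illustration, but they capture the qualitative difference: the gradient-based learner always nudges its internal model toward the latest observation, while the threshold learner ignores updates whose gradient is too small.

```python
# Sketch (assumed forms, not the exact simulated humans) of the two learner types.
# Both maintain an internal-model estimate phi and update it from an observation.
import numpy as np

def grad(phi, obs):
    # Gradient of a stand-in quadratic error between the model and the observation.
    return phi - obs

def gradient_learner_step(phi, obs, lr=0.3):
    """Gradient-based learner: always takes a gradient step on the error."""
    return phi - lr * grad(phi, obs)

def threshold_learner_step(phi, obs, lr=0.3, tau=0.5):
    """Threshold learner: only updates when the gradient magnitude exceeds a cutoff."""
    g = grad(phi, obs)
    return phi - lr * g if np.linalg.norm(g) > tau else phi
```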
Teaching Physical Dynamics
Our method performs comparably to the Oracle model and aligns the human's internal model of the robot's dynamics with the true dynamics significantly faster than Passive Learn or Random.
Interestingly, in all but one setting the robot automatically stops teaching once the human's internal model is sufficiently correct.
The one exception is in the Robot Arm Teleoperation environment with the threshold human. Since this human doesn’t learn when the gradient is too small, the robot must continue to exert effort to maximize its reward.
Implicitly Influencing Human Objectives
Because the Learning Assist robot knows that the human's internal model (goal or preference) can change, it automatically exerts more effort early on to align the human's internal model with its own, resulting in less long-term assistance and lower task cost.
The Static Assist robot, in contrast, is not aware that the human can change their mind and thus does not exert enough effort to influence the human's internal model.
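The intuition can be illustrated with a toy planning comparison; this is a sketch with assumed costs, horizon, and human learning rate, not the actual planner. A planner that rolls the human's internal model forward under a learning rule sees the long-horizon payoff of early influence and front-loads its effort, while a planner that treats the internal model as fixed under-invests and accumulates a higher task cost.

```python
# Toy illustration (assumptions throughout) of why modeling human learning leads
# the robot to exert more effort early: "Learning Assist" accounts for the human
# moving toward what it demonstrates, "Static Assist" assumes phi never changes.
import numpy as np

PHI_TRUE = 1.0   # the internal-model value the robot wants the human to hold
LR = 0.4         # assumed human learning rate toward what they observe

def step_cost(phi, effort):
    # Task cost grows with the human's model error; robot effort is also penalized.
    return (phi - PHI_TRUE) ** 2 + 0.1 * effort ** 2

def rollout(plan_with_learning, horizon=10):
    phi, total = 0.0, 0.0
    for _ in range(horizon):
        if plan_with_learning:
            # Invest influence while the model error (and future payoff) is large.
            effort = abs(PHI_TRUE - phi)
        else:
            # The static planner sees no long-term benefit, so it under-invests.
            effort = 0.2
        total += step_cost(phi, effort)
        phi += LR * effort * (PHI_TRUE - phi)  # human moves toward what they observe
    return total

print("Learning Assist cumulative cost:", rollout(True))
print("Static Assist cumulative cost:  ", rollout(False))
```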