Imitation Learning with Human Eye Gaze via Multi-Objective Prediction

Abstract: Approaches for teaching learning agents via human demonstrations have been widely studied and successfully applied to multiple domains. However, the majority of imitation learning work utilizes only behavioral information from the demonstrator, i.e., which actions were taken, and ignores other useful information. In particular, eye gaze information can give valuable insight into where the demonstrator is allocating visual attention, and holds the potential to improve agent performance and generalization. In this work, we propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture that learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context. We apply GRIL to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a photorealistic simulated environment. We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data. The code can be found at https://github.com/ravikt/gril
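
The idea of learning concurrently from demonstrated actions and eye gaze can be illustrated with a combined loss over two prediction heads. The following is only a minimal sketch and not the released implementation: it assumes gaze is represented as normalized 2D image coordinates, that both objectives use mean squared error, and that they are balanced by a hypothetical weight `gaze_weight`. The function name and argument names are illustrative as well.

```python
import numpy as np


def multi_objective_loss(pred_action, true_action, pred_gaze, true_gaze,
                         gaze_weight=1.0):
    """Combine an action-imitation loss with a gaze-prediction loss.

    All names here are hypothetical. Assumptions: actions are continuous
    control vectors, gaze is an (x, y) point in normalized image
    coordinates, and both terms are mean squared errors.
    """
    pred_action = np.asarray(pred_action, dtype=float)
    true_action = np.asarray(true_action, dtype=float)
    pred_gaze = np.asarray(pred_gaze, dtype=float)
    true_gaze = np.asarray(true_gaze, dtype=float)

    # Behavioral objective: match the demonstrator's actions.
    action_loss = np.mean((pred_action - true_action) ** 2)
    # Auxiliary objective: match where the demonstrator was looking.
    gaze_loss = np.mean((pred_gaze - true_gaze) ** 2)

    # The gaze term acts as a regularizer on the shared visual features.
    total = action_loss + gaze_weight * gaze_loss
    return total, action_loss, gaze_loss


# Toy batch of two samples with 4-dimensional actions.
total, a_loss, g_loss = multi_objective_loss(
    pred_action=np.zeros((2, 4)),
    true_action=np.ones((2, 4)),
    pred_gaze=[[0.5, 0.5], [0.5, 0.5]],
    true_gaze=[[0.5, 0.5], [0.5, 1.0]],
    gaze_weight=2.0,
)
print(total, a_loss, g_loss)
```

In a sketch like this, setting `gaze_weight` to zero reduces the objective to plain behavioral cloning, which is one way to see the gaze term as a regularizer rather than a separate task.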

Flight Experiments

static_videos.mp4

This is a sample rollout of GRIL and the baseline models from a test starting location in the stationary truck task.

Trajectory Comparisons

These are evaluation rollout trajectories for GRIL and the baseline methods in the stationary task. For one of the evaluation starting locations, we depict five evaluation rollouts for each method. The task consists of navigating from the starting location (purple dot) to the target vehicle (yellow dot).


Target Following

moving_videos.mp4

During these evaluation rollouts, GRIL, BC, and BC-CGL generalized to the new moving-target task despite never having been trained on it, often successfully searching for and following the moving target.