A Gaze Model Improves Autonomous Driving

End-to-end behavioral cloning trained on human demonstrations is now a popular approach to vision-based autonomous driving: a deep neural network maps driver-view images directly to steering commands. However, these images contain a great deal of task-irrelevant information. Humans, by contrast, attend to behaviorally relevant information, using saccades to direct the fovea to important areas of the scene. We demonstrate that behavioral cloning also benefits from such active attention. We trained a generative deep neural network that accurately predicts human gaze maps during driving, in both familiar and unseen environments. We then incorporated the predicted gaze maps into end-to-end driving networks for two behaviors: following and overtaking. Incorporating this attention information significantly improves generalization to unseen environments. We hypothesize that attention enables the network to focus on task-critical objects, which vary little between environments, and to ignore irrelevant elements in the background, which vary greatly.
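For illustration only, the sketch below shows one plausible way a predicted gaze map could be fused with a behavioral-cloning network: the gaze map is used as a spatial mask over convolutional features before regressing the steering command. All layer sizes, module names, and the fusion-by-weighting scheme are assumptions for this sketch, not the paper's exact architecture.

```python
# Illustrative sketch: fusing a predicted gaze map with image features
# in a behavioral-cloning driving network. Shapes and layers are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GazeWeightedDrivingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Small convolutional encoder for the driver-view image (assumed sizes).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Regression head mapping pooled features to a steering command.
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, image, gaze_map):
        # image:    (B, 3, H, W) driver-view frame
        # gaze_map: (B, 1, H, W) predicted gaze probability map in [0, 1]
        feats = self.encoder(image)                       # (B, 64, h, w)
        # Resize the gaze map to the feature resolution and apply it as a
        # spatial attention mask, down-weighting task-irrelevant regions.
        mask = F.interpolate(gaze_map, size=feats.shape[-2:],
                             mode="bilinear", align_corners=False)
        attended = feats * mask
        pooled = attended.mean(dim=(2, 3))                # global average pool
        return self.head(pooled)                          # (B, 1) steering command


if __name__ == "__main__":
    # Example usage with random inputs; in practice the gaze map would come
    # from a separately trained gaze-prediction network.
    net = GazeWeightedDrivingNet()
    img = torch.rand(2, 3, 128, 256)
    gaze = torch.rand(2, 1, 128, 256)
    print(net(img, gaze).shape)  # torch.Size([2, 1])
```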