ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

Mayank Bansal, Alex Krizhevsky, Abhijit Ogale

Blog Post: https://medium.com/waymo/learning-to-drive-beyond-pure-imitation-465499f8bcb2

RSS 2019 Paper: http://www.roboticsproceedings.org/rss15/p31.html

Talk at Google I/O: https://www.youtube.com/watch?v=mxqdVO462HU

A. System Architecture

The figure below shows a system-level overview of how the neural net is used within the self-driving system. At each time step, the updated state of our agent and the environment is obtained via a perception system that processes sensory output from the real world or from a simulation environment, as the case may be. The intended route is obtained from the router, and is updated dynamically depending on whether our agent was able to execute past intents. The environment information is rendered into the input images and fed to the RNN, which outputs a future trajectory. This trajectory is passed to a controls optimizer that produces the low-level control signals that drive the vehicle (in the real world or in simulation).
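The loop above can be summarized with a short sketch. This is our own illustration, not Waymo's code: all of the interfaces (perception, router, render_inputs, model, controller, vehicle) are hypothetical stand-ins for the components named in the figure.

```python
# Minimal sketch of one iteration of the closed-loop driving cycle.
# Every class and function name here is a hypothetical placeholder.

def drive_one_step(perception, router, model, controller, vehicle):
    # 1. Perceive: updated agent and environment state
    #    (from real-world sensors or a simulation environment).
    state = perception.process(vehicle.sensor_data())

    # 2. Route: intended route, updated dynamically depending on
    #    whether past intents were executed.
    route = router.update(state)

    # 3. Render: mid-level top-down input images for the network.
    images = render_inputs(state, route)

    # 4. Predict: the RNN outputs a future trajectory (waypoints).
    trajectory = model.predict(images)

    # 5. Control: the optimizer converts the trajectory into
    #    low-level control signals and applies them to the vehicle.
    controls = controller.optimize(trajectory, state)
    vehicle.apply(controls)
```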


The results on this page depict the ChauffeurNet agent driving in a closed-loop control environment. The teal path depicts the input route, the yellow boxes with faded trails are the positions of the dynamic objects in the scene over the past 1 second, the green box is the agent, the blue dots are the agent's past positions, and the green dots are the predicted future positions which are used by the controller to drive the agent forward.

B. Input Ablation Test Results

With Stop Signs Rendered

No Stop Signs Rendered

With Perception Boxes Rendered

No Perception Boxes Rendered

C. Model Ablation Test Results

Nudging around a Parked Car

M0 = Imitation with Past Dropout

M1 = M0 + Traj Perturbation

M2 = M1 + Environment Losses

M4 = M2 + Imitation Dropout

Recovering from a Trajectory Perturbation

M0 = Imitation with Past Dropout

M1 = M0 + Traj Perturbation

M2 = M1 + Environment Losses

M4 = M2 + Imitation Dropout

Slowing down for a Slow Car

M0 = Imitation with Past Dropout

M1 = M0 + Traj Perturbation

M2 = M1 + Environment Losses

M4 = M2 + Imitation Dropout

D. Real World Driving with model M4

Lane Curve Following

Stop Sign & Turn

Stop Sign

Stop Sign & Turn

E. Closed-loop Driving with model M4 on Logged Data in Simulation

Stop Signs and Narrow Streets with Parked Vehicles

Traffic Lights

In the example above, the ChauffeurNet agent stops for a traffic light transitioning from yellow to red (note the change in intensity of the traffic-light rendering, shown as curves along the lane centers) instead of blindly following the vehicles ahead.

Stop-and-Go behind other vehicles

F. Trajectory Prediction for other dynamic objects on Logged Data

The examples below demonstrate predictions from PerceptionRNN on logged data. Recall that PerceptionRNN predicts the future motion of other dynamic objects. The red trails indicate the past trajectories of the dynamic objects in the scene. The green trails indicate the predicted trajectories, 2 seconds into the future, for each object.

G. Prediction and Loss Visualization

Visualization of predictions and loss functions on an example input. The first row is at the input resolution, while the second row shows a zoomed-in view around the current agent location.

Flattened Inputs

Target Road Mask

Predicted Road Mask Logits

Predicted Vehicle Logits

Agent Pose Logits

Collision Loss

On Road Loss

Geometry Loss

H. Sampling Speed Profiles

log P_1(x, y)

log P_5(x, y)

The waypoint prediction from the model at timestep k is represented by the probability distribution P_k(x,y) over the spatial domain in the top-down coordinate system. In this paper, we pick the mode of this distribution p_k to update the memory of the AgentRNN.

More generally, we can also sample from this distribution, which allows us to predict trajectories with different speed profiles. The figure on the left illustrates the predictions P_1(x,y) and P_5(x,y) at the first and fifth iterations respectively, for a training example where the past motion history has been dropped out. Correspondingly, P_1(x,y) has high uncertainty along the longitudinal direction and allows us to pick from a range of speed samples. Once we pick a specific sample, the ensuing waypoints become constrained in the speeds they can take on, which shows up as a more concentrated distribution at P_5(x,y).
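As a concrete illustration, the sketch below shows the two prediction modes described here: taking the mode of the spatial distribution (as used to update the AgentRNN memory) versus sampling from it to obtain different speed profiles. This is our own sketch; the heatmap representation and function names are assumptions, not the paper's released code.

```python
import numpy as np

def waypoint_from_heatmap(logits, sample=False, rng=None):
    """Pick the next waypoint from spatial logits for P_k(x, y).

    logits: (H, W) array of unnormalized log-probabilities over the
    top-down grid (hypothetical representation of the model output).
    Returns integer (x, y) grid coordinates.
    """
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if sample:
        # Sampling explores different speed profiles.
        if rng is None:
            rng = np.random.default_rng()
        idx = rng.choice(probs.size, p=probs.ravel())
    else:
        # Mode of the distribution: the deterministic choice used
        # to update the AgentRNN memory in the paper.
        idx = probs.argmax()
    y, x = np.unravel_index(idx, probs.shape)
    return x, y
```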

The use of a probability distribution over the next waypoint also presents the interesting possibility of constraining the model predictions at inference time to respect hard constraints. For example, such constrained sampling may provide a way to ensure that any trajectories we generate strictly obey legal restrictions such as speed limits. One could also constrain sampling of trajectories to a designated region, such as a region around a given reference trajectory.
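One simple way to realize such constrained sampling, sketched below under our own assumptions (the mask construction is illustrative, not a method from the paper), is to zero out probability mass outside the feasible region before renormalizing and sampling:

```python
import numpy as np

def constrained_sample(logits, feasible_mask, rng=None):
    """Sample a waypoint only from grid cells marked feasible.

    feasible_mask: (H, W) boolean array, e.g. cells whose distance
    from the previous waypoint respects a speed limit, or cells
    inside a corridor around a reference trajectory.
    """
    probs = np.exp(logits - logits.max())
    probs = np.where(feasible_mask, probs, 0.0)
    probs /= probs.sum()  # renormalize over the feasible region
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.choice(probs.size, p=probs.ravel())
    y, x = np.unravel_index(idx, probs.shape)
    return x, y
```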

I. Open Loop Evaluation on Perturbation Data

Trajectory Perturbation

[Original] A logged training example where the agent is driving along the center of the lane.

[Perturbed] The perturbed example created by perturbing the current agent location (red point) in the original example away from the lane center and then fitting a new smooth trajectory that brings the agent back to the original target location along the lane center.
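The sketch below illustrates one way such a perturbed example could be constructed. The quintic ease-out blend is our own choice for fitting a smooth trajectory that returns to the original path; the paper does not specify this exact parameterization.

```python
import numpy as np

def perturb_trajectory(waypoints, lateral_offset, horizon):
    """Create a perturbed example from a logged trajectory.

    Shifts the current agent location laterally by `lateral_offset`
    meters, then blends smoothly back to the original path over
    `horizon` waypoints (zero first and second derivatives at both
    ends), so the trajectory rejoins the original target location.
    """
    waypoints = np.asarray(waypoints, dtype=float)  # (N, 2) x/y points
    # Unit normals to the path, used to apply the lateral shift.
    tangents = np.gradient(waypoints, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)

    # Quintic blend: 1 at the current (perturbed) pose, easing to 0
    # at the rejoin point.
    s = np.clip(np.arange(len(waypoints)) / horizon, 0.0, 1.0)
    blend = 1.0 - (10 * s**3 - 15 * s**4 + 6 * s**5)

    return waypoints + lateral_offset * blend[:, None] * normals
```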

We also compare the performance of models M0 and M1 on our perturbed evaluation data w.r.t. the L2 distance metric, shown in the figure on the left. Note that the model trained without perturbed data (M0) has larger errors due to its inability to bring the agent back from the perturbation onto its original trajectory.
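For reference, the L2 distance metric here measures the distance between predicted and ground-truth waypoints; a minimal version, under our assumption of equal-length waypoint sequences, is:

```python
import numpy as np

def mean_l2_error(predicted, ground_truth):
    """Mean L2 distance between predicted and ground-truth waypoints.

    Both arguments are (N, 2) arrays of top-down (x, y) positions.
    """
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.linalg.norm(predicted - ground_truth, axis=1).mean())
```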

The figure below shows the trajectories predicted by these models on a few representative examples, demonstrating that the perturbed data is critical for avoiding the veering-off tendency of the model trained without it.

Ground-truth

Model M0 Prediction

Model M1 Prediction

Comparison of the ground-truth trajectory in the first column with the predicted trajectories from models M0 and M1 in the second and third columns respectively, on two perturbed examples. The red point is the reference pose (u_0, v_0), the white points are past poses, and the green points are future poses.