Qualitative analysis

We recommend watching the evaluation videos.

In our experiments we tested three different input modes: bird's-eye view, visual input, and LiDAR with a front camera (along with some ablations). We also tested the influence of alternative reward schemes: a sparse reward and no failure penalty.
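
For concreteness, the sketch below enumerates the resulting experiment grid as plain configuration dictionaries. The identifiers are ours and purely illustrative; the actual experiments (including the ablations) need not cover the full Cartesian product.

```python
# Illustrative only: the identifiers and the full-grid sweep are assumptions,
# not the exact configuration used in our experiments.
from itertools import product

INPUT_MODES = ("birds_eye_view", "visual", "lidar_front_camera")
REWARD_SCHEMES = ("dense_baseline", "sparse", "no_failure_penalty")

def experiment_grid():
    """Enumerate (input mode, reward scheme) combinations as config dicts."""
    return [
        {"input_mode": mode, "reward_scheme": scheme}
        for mode, scheme in product(INPUT_MODES, REWARD_SCHEMES)
    ]

for cfg in experiment_grid():
    print(cfg)
```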

NGSIM (freeways)

We reviewed a number of cases (both successful and unsuccessful). Below we list some observations, which could potentially lead to further research. It should perhaps be emphasized that RL-trained policies exhibit different behavior than the original (human) drivers. On the one hand, this can be viewed as an advantage (RL policies are 'faster'), but at the same time it raises concerns about whether such policies could be safely and smoothly blended with human drivers and how they would behave in a multi-agent setting. Our policies serve mostly as baselines, and these conclusions are hypotheses intended to stimulate further research.

Other observations:

  • Our agent tends to slow down significantly after a successful lane change; a possible explanation is insufficient reward stimulation for maintaining velocity when the episode is expected to end soon (see the sketch after this list).

  • We observed failures when the agent tried to squeeze into a small gap between two cars, though sometimes we also observed more correct behavior: the agent overtaking the two cars before attempting a lane change maneuver.

  • There are cases of 'seemingly avoidable failures'; we observe that sometimes the agent simply ignores the existence of other participants. We speculate that more training and hyperparameter tuning would improve the situation.

  • In a few videos, one can observe 'human-like' behavior: the agent initiates a maneuver but abandons it shortly after, having 'realized' that there is not enough space. In some cases, it makes a second attempt after a while.

  • Although "No centerline" does not perform significantly worse than the baseline experiment, the number of episodes that fail because of 'wobbling' is noticeably higher.

  • As expected, "Framestack" is characterized by smoother steering and acceleration.
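
The post-lane-change slowdown in the first bullet can be made concrete with a back-of-the-envelope calculation. The sketch below is purely hypothetical: the reward constants and discount factor are assumptions, not the rewards used in our experiments. It shows that under a small dense speed bonus, the discounted return still obtainable from keeping velocity shrinks to almost nothing near the episode end, while the terminal success bonus dominates.

```python
# Hypothetical illustration of the slowdown hypothesis from the first bullet.
# The constants below are assumptions, not the rewards used in this work.

SPEED_REWARD_PER_STEP = 0.01  # small dense bonus for maintaining velocity
SUCCESS_BONUS = 1.0           # terminal reward for finishing the maneuver

def remaining_speed_return(steps_left: int, gamma: float = 0.99) -> float:
    """Discounted return still available from the speed term alone."""
    return sum(SPEED_REWARD_PER_STEP * gamma ** t for t in range(steps_left))

# Early in an episode the speed term is worth pursuing (~0.99 for 500 steps),
# but with 10 steps left it is worth only ~0.1 -- an order of magnitude less
# than SUCCESS_BONUS, so there is little pressure to keep up the velocity.
print(remaining_speed_return(500), remaining_speed_return(10))
```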

openDD (roundabouts)

The agent's driving style appears more 'assertive' than that of the reference human driver from the dataset. It sometimes enters a roundabout without yielding the right of way to a car already on the roundabout ring. In some cases, the agent enters a different gap than the original human reference driver did, and sometimes this gap closes, 'crashing' the agent. This behavior is undesirable because the other participants are non-reactive. Fortunately, we observed that it is rather rare.

Besides that, the agent almost always accelerates a few meters before the roundabout exit, trying to minimize episode duration (and the risk of potential failure).
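
A similar back-of-the-envelope argument fits this pre-exit acceleration. The sketch below again uses made-up constants (a hypothetical per-step penalty and per-step failure probability, not values from our setup) to show that shortening the remaining episode reduces both the accumulated time cost and the exposure to failure:

```python
# Hedged sketch: why minimizing episode duration can pay off near the exit.
# STEP_PENALTY and P_FAIL_PER_STEP are assumptions, not this work's values.

STEP_PENALTY = -0.005     # hypothetical per-step (time) cost
P_FAIL_PER_STEP = 0.002   # hypothetical chance of a crash on any given step

def expected_tail_return(steps_to_exit: int, success_bonus: float = 1.0) -> float:
    """Expected undiscounted return of the remaining episode
    (approximating step costs as always paid)."""
    p_survive = (1.0 - P_FAIL_PER_STEP) ** steps_to_exit
    return steps_to_exit * STEP_PENALTY + p_survive * success_bonus

# Halving the time to the exit improves the expected tail return
# (roughly 0.32 -> 0.66 here), consistent with the observed acceleration.
print(expected_tail_return(100), expected_tail_return(50))
```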