This is the accompanying website for the paper "Simulation-based reinforcement learning for real-world autonomous driving" by Błażej Osiński, Adam Jakubowski, Piotr Miłoś, Paweł Zięcina, Christopher Galias, Silviu Homoceanu, and Henryk Michalewski
The preprint of the article can be accessed here: https://arxiv.org/abs/1911.12905
It is to appear at ICRA 2020 and was presented at the NeurIPS 2019 Workshop on Machine Learning for Autonomous Driving
We have recreated a real-world urban space as two new CARLA maps that approximately reflect the testing grounds for our real-world deployments.
Below we present a preview of the custom-made level. The car was driven by a human to showcase the level.
During training we periodically save videos of trajectories along with some additional information. In this section we share multiple videos showcasing different weather conditions and CARLA levels. The videos are unedited and taken from training jobs as-is.
In each of the videos below, the panes represent the following:
In this experiment we provided the policy with its last action as an additional input. Due to the inertia of the environment, the policy learned to use the last action to switch between two modes, controlling the car in a pulse-width-modulation-like manner.
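Below is a minimal sketch of how the last action can be fed back to the policy, written as a gym observation wrapper. The wrapper and its dictionary observation format are illustrative, not our actual training code:

```python
import gym
import numpy as np

class LastActionWrapper(gym.Wrapper):
    """Append the previous action to every observation (illustrative sketch).

    Assumes a Box (continuous) action space; a full implementation would
    also update self.observation_space accordingly.
    """

    def __init__(self, env):
        super().__init__(env)
        self._last_action = np.zeros(env.action_space.shape, dtype=np.float32)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._last_action = np.zeros_like(self._last_action)
        return {"image": obs, "last_action": self._last_action}

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._last_action = np.asarray(action, dtype=np.float32)
        return {"image": obs, "last_action": self._last_action}, reward, done, info
```

With this extra input, the policy can, for example, alternate between full and zero throttle on consecutive steps, which produces the pulse-width-modulation-like behaviour visible in the video.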
Before we built our custom CARLA levels that mimic the real-world environment, we trained policies only on two CARLA built-in levels. Those two built-in levels contain only double-line road markings. As seen in the saliency maps, a policy that has only seen double-line road markings is not sensitive to single-line road markings.
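For reference, the sketch below shows a simple gradient-based way to obtain such a saliency map in PyTorch. `policy_net` is a hypothetical module mapping an image to a steering command; the maps shown on this page may have been produced with a different saliency method:

```python
import torch

def gradient_saliency(policy_net, image):
    """Absolute gradient of the steering output w.r.t. the input pixels.

    image: a (C, H, W) float tensor; returns an (H, W) saliency map.
    """
    image = image.clone().detach().requires_grad_(True)
    steering = policy_net(image.unsqueeze(0)).squeeze()
    steering.backward()
    return image.grad.abs().max(dim=0).values  # max over colour channels
```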
Our reward function includes a term that penalizes the agent for not sticking to the center of a lane. In our initial implementation, the distance used for calculating this penalty was computed using all three spatial coordinates: X, Y, and Z. Due to technical reasons, our list of lane-center positions was actually placed above the road along the Z axis. This resulted in a policy that drives with its two right-side wheels on a high curb: the car's elevation is increased, so its distance to the center-line points above the ground is decreased. The fix was to calculate the penalty using only the X and Y coordinates.
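The sketch below contrasts the buggy 3D distance with the fixed planar one; the function name and signature are illustrative:

```python
import numpy as np

def distance_to_center_line(car_pos, center_points, use_z=False):
    """Distance from the car to the nearest lane-center point.

    use_z=True reproduces the bug: because the center points sit above
    the road, the policy can shrink the penalty by gaining elevation,
    e.g. by climbing a curb. use_z=False is the fix: only the planar
    X/Y distance is taken into account.
    """
    dims = 3 if use_z else 2
    deltas = np.asarray(center_points)[:, :dims] - np.asarray(car_pos)[:dims]
    return np.linalg.norm(deltas, axis=1).min()
```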
Summary of experiments with baselines across nine scenarios. The columns to the right show the mean and max of autonomy (the percentage of distance driven autonomously). Models are sorted according to their mean performance.
Summary of experiments with baselines across nine scenarios. Each subfigure shows performance for a given deployment scenario.
Average deviation of models from expert trajectories. Measurements based on GPS.
We have analysed 25 outliers with results significantly below average. In this group we identified 3 cases of human error - a wrong chauffeur command was given to the autonomous system (e.g. "turn right" instead of "lane follow"). Another recurring mistake concerned attempts to drive on a sidewalk - these attempts occurred mostly in the two overpass scenarios and in the scenario factory_city-sud_strasse_u_turn. All attempts to drive on a sidewalk were stopped by the driver. We plan to precisely identify the reason for "sidewalk driving" in the next stage of this project.
The DISCRETE-REG and CONTINUOUS-PLAIN models drive competently in most tested situations. They showed less confident behaviour when confronted with a junction with multiple exits. In such situations they usually chose the correct driving direction, but the magnitude of turns quite often required a correction.
DISCRETE-PLAIN and other discrete models tended to wobble. The wobbling was relatively soft, meaning that the models gently drifted from the extreme left of the lane to the extreme right and back. For safety reasons we had to correct this behaviour.
To compute this metric we again process the human reference drive frame by frame and compare the human action with the output of the evaluated model. We classify the requested steering wheel angle into one of three buckets: left, straight, or right, according to whether it is less than -0.02 radians, between -0.02 and 0.02 radians, or greater than 0.02 radians.
For each bucket, we compute an F1 score between the human reference actions and the model outputs. The average of these three values is the final average F1 score.
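A minimal sketch of this computation; the helper names and the use of scikit-learn are illustrative:

```python
import numpy as np
from sklearn.metrics import f1_score

def bucketize(angles_rad, threshold=0.02):
    """Map steering wheel angles to buckets: 0 = left, 1 = straight, 2 = right."""
    angles = np.asarray(angles_rad)
    return np.where(angles < -threshold, 0, np.where(angles > threshold, 2, 1))

def average_f1(human_angles, model_angles):
    """Macro F1: per-bucket F1 scores averaged over the three buckets."""
    return f1_score(bucketize(human_angles), bucketize(model_angles),
                    labels=[0, 1, 2], average="macro")
```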
As one can see in the accompanying figure, this metric also seems to correlate with the models' real-world performance.