Since the rise of Deep Learning methods in the automotive field, multiple initiatives have been collecting datasets in order to train neural networks on different levels of autonomous driving. This requires collecting relevant data and precisely annotating objects, which should represent uniformly distributed features for each specific use case. In this paper, we analyze several large-scale autonomous driving datasets with 2D and 3D annotations with regard to their statistics of appearance and their suitability for training robust object detection neural networks. We discovered that despite the huge effort spent on driving hundreds of hours in different regions of the world, hardly any attention is paid to analyzing the quality of the collected data from an operational domain perspective. The analysis of safety-relevant aspects of autonomous driving functions, in particular trajectory planning in relation to the time-to-collision metric, showed that most datasets lack annotated objects at larger distances and that the distributions of bounding boxes and object positions are unbalanced. We therefore propose a set of rules that help find objects or scenes with inconsistent annotation styles. Lastly, we question the relevance of mean Average Precision (mAP) without relation to object size or distance.
In these graphs we show the histogram of distances of cars in the individual datasets and the thresholds for a safe distance when braking. We can see that the datasets usually do not have a high representation of distant cars, which matters in scenarios where the ego-car is moving fast (e.g., on a highway).
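Such a safe-distance threshold can be derived from the standard constant-deceleration stopping-distance model (reaction distance plus braking distance). The sketch below illustrates this; the deceleration and reaction-time values are illustrative assumptions, not figures taken from the datasets or the analysis above.

```python
def stopping_distance(v_kmh, decel=8.0, reaction_time=1.0):
    """Stopping distance in metres for a given ego-vehicle speed.

    v_kmh:         ego-vehicle speed in km/h.
    decel:         assumed constant deceleration in m/s^2 (8 m/s^2 is a
                   typical dry-road emergency-braking value; an assumption,
                   not a value from the paper).
    reaction_time: assumed driver/system reaction time in seconds.
    """
    v = v_kmh / 3.6  # convert km/h to m/s
    reaction_dist = v * reaction_time       # distance covered before braking
    braking_dist = v ** 2 / (2.0 * decel)   # kinematic braking distance
    return reaction_dist + braking_dist
```

At 72 km/h this gives 45 m, while at highway speeds around 130 km/h the threshold grows past 115 m, which is why the scarcity of annotated cars at large distances is a concern for fast-driving scenarios.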