To develop reliable software components for autonomous driving, we have curated a novel dataset targeted primarily at performance evaluation. The dataset has two subsets covering two different use cases: tram railway segmentation and traffic sign detection/identification.
The raw data were collected privately or provided by Plzeňské Městské dopravní podniky (PMDP).
Railway localization is important for predicting collisions with dynamic scene objects, e.g., vehicles and pedestrians. To test obstacle detection, we first have to know the future trajectory of the railway vehicle. With the help of PMDP, we have collected sets of videos from multiple trams, recorded under various weather conditions, e.g., sun, clouds, fog, snow, and heavy rain.
So far, we have annotated around 10,000 images with binary masks that represent the area between the tram rails (see Figure 1). The images are sub-sampled and filtered based on pixel similarity: around 90% of the frames are left unannotated because they were captured while the tram was stopped or when the scene changed only minimally.
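The pixel-similarity filtering described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the dataset: the function name `filter_static_frames` and the difference threshold are assumptions.

```python
import numpy as np

def filter_static_frames(frames, threshold=0.02):
    """Keep a frame only if it differs enough from the last kept frame.

    `threshold` is a minimum mean absolute pixel difference, with pixel
    values normalised to [0, 1]. The value 0.02 is an illustrative
    assumption, not the setting used for the dataset.
    """
    kept = []
    last = None
    for frame in frames:
        f = frame.astype(np.float32) / 255.0
        # Always keep the first frame; afterwards, drop near-duplicates
        # (tram stopped, minimal scene change).
        if last is None or np.abs(f - last).mean() > threshold:
            kept.append(frame)
            last = f
    return kept
```

Comparing against the last *kept* frame (rather than the immediately preceding one) prevents a slow, gradual scene change from being dropped frame by frame.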
The images originate from three existing tram routes (1, 2, and 4) in the city of Pilsen. 95% of the images contain a single possible path; the remaining 5% show two paths, i.e., branching railroads. All annotated images were recorded during daytime: 65% in sunny weather and 35% in rainy weather.
Figure 1: Selected samples from the dataset illustrating different roads, railway targets, and weather conditions. Red areas are the ground-truth targets created by annotators.
Existing datasets for traffic sign detection allow reliable evaluation of standard methods for traffic sign classification and detection. However, those datasets (GTSRB, STSD, and Tsinghua-Tencent) usually do not directly reflect what autonomous driving needs, i.e., detecting traffic signs as individual objects in time-dependent sequences of images.
To allow such evaluation, we had to create a novel dataset. So far, we have annotated around 50 video segments originating from a one-hour drive recorded on a bright sunny day. The route was tailored to cover the city centre, the countryside, and both high and low traffic. Around 15 minutes (7,200 images) contain no traffic signs, to allow proper false-positive-rate evaluation. Since the main goal is to predict each individual traffic sign exactly once, we include a GPS location for each traffic sign. This allows testing sensitivity and specificity at a higher level than usual.
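Sign-level scoring with GPS locations could look like the sketch below, where repeated detections of the same physical sign collapse into a single true positive. The function names, the haversine matching, and the 15 m radius are illustrative assumptions, not the dataset's official evaluation protocol.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS-84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def sign_level_scores(gt_signs, detections, radius_m=15.0):
    """Score detections against ground-truth signs by GPS proximity.

    gt_signs, detections: lists of (lat, lon) tuples.
    A ground-truth sign counts as one true positive if at least one
    detection lies within `radius_m`; duplicate detections of the same
    sign are collapsed. Detections matching no sign are false positives.
    """
    matched = set()
    fp = 0
    for d_lat, d_lon in detections:
        hit = None
        for i, (g_lat, g_lon) in enumerate(gt_signs):
            if haversine_m(d_lat, d_lon, g_lat, g_lon) <= radius_m:
                hit = i
                break
        if hit is None:
            fp += 1
        else:
            matched.add(hit)
    tp = len(matched)
    fn = len(gt_signs) - tp
    return tp, fp, fn
```

With per-sign counts like these, sensitivity (tp / (tp + fn)) and specificity are measured over physical signs rather than over individual frames, which is the higher-level evaluation the GPS annotations enable.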
To allow testing robustness to different weather conditions, we will include videos recorded under various weather conditions in Q2 2023.
Image synthesis has become an important field of study with various use cases, such as data augmentation, data manipulation, adversarial training, text-to-image synthesis, and image-to-image translation. Dataset extension (e.g., text-to-image translation, where different weather and lighting conditions are extracted and applied to the original dataset) can be especially useful for the automotive industry. The latest text-to-image synthesis models, such as Stable Diffusion, have shown exceptional results in terms of image fidelity. In combination with a large language model, we are now able to generate images conditioned on specific prompts. This opens the door to a domain-specific data-extension methodology, in which synthesizing missing corner cases or user-definable scenarios should become a standard approach.
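Prompt-conditioned generation of corner cases could be organised as in the sketch below. The helper `build_prompt`, the prompt template, and the model identifier in the comment are all hypothetical illustrations, not part of any described pipeline.

```python
def build_prompt(scene, weather, time_of_day="daytime"):
    """Compose a text prompt for a traffic-scene corner case.

    Hypothetical helper: the template wording is an assumption chosen
    to illustrate scenario-parameterised prompt construction.
    """
    return (f"dashcam photo of {scene}, {weather}, {time_of_day}, "
            f"photorealistic, European city street")

# With prompts like these, a text-to-image model such as Stable Diffusion
# (e.g. via the Hugging Face `diffusers` library) could synthesise the
# missing scenarios -- sketched here as comments, since it requires
# downloading model weights:
#
#   from diffusers import StableDiffusionPipeline
#   pipe = StableDiffusionPipeline.from_pretrained(
#       "runwayml/stable-diffusion-v1-5")  # assumed model id
#   image = pipe(build_prompt("a tram crossing", "dense fog")).images[0]
```

Parameterising the prompt over scene, weather, and time of day makes it straightforward to enumerate user-definable scenarios systematically rather than writing each prompt by hand.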