Crash To Not Crash: Learn to Identify Dangerous Vehicles using a Simulator

School of Electrical Engineering, KAIST

* These authors contributed equally.



Developing a computer vision-based algorithm for identifying dangerous vehicles requires a large amount of labeled accident data, which is difficult to collect in the real world. To tackle this challenge, we first develop a synthetic data generator built on top of a driving simulator. We then observe that the synthetic labels generated from simulation results are very noisy, which results in poor classification performance. To improve the quality of synthetic labels, we propose a new label adaptation technique that first extracts the internal states of vehicles from the underlying driving simulator and then refines labels by predicting the future paths of vehicles with a well-studied motion model. Through experiments on real data, we show that our dangerous vehicle classifier can reduce the missed detection rate by at least 18.5% compared with classifiers trained on real data when the time-to-collision is between 1.6 s and 1.8 s.
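The headline result is stated in terms of the missed detection rate within a time-to-collision (TTC) window. A minimal sketch of that metric, assuming it means the fraction of truly dangerous vehicles the classifier fails to flag (1 minus recall) among positives whose TTC falls in the window. The `Detection` record, its field names, and the half-open window convention are illustrative assumptions, not taken from the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Detection:
    """One ground-truth dangerous vehicle (hypothetical record) and the classifier's output."""
    ttc: float       # time-to-collision in seconds
    predicted: bool  # True if the classifier labeled this vehicle as dangerous


def missed_detection_rate(samples: Iterable[Detection], lo: float, hi: float) -> float:
    """Fraction of positives with lo <= TTC < hi that the classifier missed.

    The half-open window [lo, hi) is an assumed convention.
    """
    in_window = [s for s in samples if lo <= s.ttc < hi]
    if not in_window:
        raise ValueError("no positive samples in this TTC window")
    misses = sum(1 for s in in_window if not s.predicted)
    return misses / len(in_window)
```

For example, with positives at TTC 1.65 s (missed), 1.70 s (detected), and 1.75 s (detected), the rate in the 1.6 s to 1.8 s window is 1/3; a positive at 1.9 s is excluded from that window.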

Demonstration Video


Overview of GTACrash & YouTubeCrash

Our synthetic dataset, GTACrash, is collected from the video game Grand Theft Auto V (GTA V). It consists of 3661 non-accident scenes and 7720 accident scenes, each made of 20 image frames. The total number of positive samples (dangerous vehicles) is 128437, and the total number of negative samples is 623173. We provide sample code for reading each image and its label to visualize each frame. Please note that this dataset is for research and educational use only.
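The counts above imply a strong class imbalance. A short worked sketch of the arithmetic, taking the reported numbers as given and assuming every scene has exactly 20 frames. The inverse-frequency class weights are a common remedy for imbalance, shown here only for illustration; the page does not say which loss weighting, if any, was used.

```python
from typing import Dict, Tuple

# Counts as reported for GTACrash.
NUM_ACCIDENT_SCENES = 7720
NUM_NON_ACCIDENT_SCENES = 3661
FRAMES_PER_SCENE = 20
NUM_POSITIVE = 128437  # dangerous-vehicle samples
NUM_NEGATIVE = 623173  # non-dangerous-vehicle samples


def dataset_stats() -> Dict[str, float]:
    """Derived totals, assuming every scene has exactly FRAMES_PER_SCENE frames."""
    scenes = NUM_ACCIDENT_SCENES + NUM_NON_ACCIDENT_SCENES
    frames = scenes * FRAMES_PER_SCENE
    samples = NUM_POSITIVE + NUM_NEGATIVE
    return {
        "scenes": scenes,
        "frames": frames,
        "samples": samples,
        "positive_fraction": NUM_POSITIVE / samples,
        "samples_per_frame": samples / frames,
    }


def inverse_frequency_weights(num_pos: int, num_neg: int) -> Tuple[float, float]:
    """Weights n / (2 * n_c) so both classes contribute equally to a weighted loss."""
    total = num_pos + num_neg
    return total / (2 * num_pos), total / (2 * num_neg)
```

This gives 11,381 scenes and 227,620 frames, and positives make up roughly 17% of the 751,610 vehicle samples, so an unweighted classifier would see about five negatives for every positive.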

Our test dataset, YouTubeCrash, is collected from dashcam videos uploaded to the YouTube channel Car Crashes Time. It consists of 122 video clips, each divided into a pair of scenes, one accident and one non-accident; each scene has 20 image frames. Please note that this dataset is for research and educational use only.
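The page does not say how a clip is cut into its two scenes. A minimal sketch under the assumption that each clip comes with an annotated collision frame: the accident scene is taken as the 20 frames ending at the collision, and the non-accident scene as the 20 frames before that, optionally separated by a gap. The `split_clip` function, the `collision_idx` annotation, and the `gap` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
SCENE_LEN = 20  # frames per scene, as stated for YouTubeCrash


def split_clip(
    frames: Sequence[Frame], collision_idx: int, gap: int = 0
) -> Tuple[List[Frame], List[Frame]]:
    """Split one clip into (accident_scene, non_accident_scene) of SCENE_LEN frames each.

    Hypothetical rule: the accident scene ends at the collision frame (inclusive);
    the non-accident scene ends `gap` frames before the accident scene starts.
    """
    acc_end = collision_idx + 1
    acc_start = acc_end - SCENE_LEN
    non_end = acc_start - gap
    non_start = non_end - SCENE_LEN
    if non_start < 0 or acc_end > len(frames):
        raise ValueError("clip too short for two non-overlapping scenes")
    return list(frames[acc_start:acc_end]), list(frames[non_start:non_end])
```

With 122 clips, this yields 244 scenes of 20 frames each, i.e. 4,880 frames in total.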

Label File Structure of GTACrash Dataset


Sample code for utilizing this dataset is shown here.

Release Log

11/06/2018 - Initial data release


Please cite our work if you use the code or data from this site.

author = {Hoon Kim and Kangwook Lee and Gyeongjo Hwang and Changho Suh},
title = {Crash {T}o {N}ot {C}rash: {L}earn to Identify Dangerous Vehicles using a Simulator},
booktitle = {},
year = {2019},
volume = {},
pages = {}

Dataset Sample Videos

1. Accident Scenes in GTACrash

2. Non-accident Scenes in GTACrash

3. Accident Scenes in YouTubeCrash

4. Non-accident Scenes in YouTubeCrash

Domain Adaptations

1. Feature Domain Adaptation: Real World → Synthetic Domain (GTA V)

2. Label Domain Adaptation: Using the CTRA (Constant Turn Rate and Acceleration) Model
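The CTRA model assumes each vehicle keeps its current yaw rate ω and longitudinal acceleration a over the prediction horizon. Integrating dx/dt = v(t) cos θ(t) and dy/dt = v(t) sin θ(t) with v(t) = v + a·t and θ(t) = θ + ω·t gives a closed-form future position, which the sketch below implements. It assumes the vehicle states have already been extracted from the simulator. The `VehicleState` fields, the 2 s horizon, the 0.1 s step, the 2 m collision radius, and the `is_dangerous` rule are illustrative assumptions, not the paper's exact label-refinement procedure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VehicleState:
    x: float      # position [m]
    y: float      # position [m]
    theta: float  # heading [rad]
    v: float      # speed [m/s]
    omega: float  # yaw (turn) rate [rad/s]
    a: float      # longitudinal acceleration [m/s^2]


def ctra_step(s: VehicleState, dt: float) -> VehicleState:
    """Closed-form CTRA prediction dt seconds ahead (speed is not clamped at zero)."""
    theta1 = s.theta + s.omega * dt
    v1 = s.v + s.a * dt
    if abs(s.omega) < 1e-6:
        # Near-zero turn rate: straight-line motion with constant acceleration.
        dist = s.v * dt + 0.5 * s.a * dt * dt
        x1 = s.x + dist * math.cos(s.theta)
        y1 = s.y + dist * math.sin(s.theta)
    else:
        w = s.omega
        x1 = s.x + (v1 * w * math.sin(theta1) + s.a * math.cos(theta1)
                    - s.v * w * math.sin(s.theta) - s.a * math.cos(s.theta)) / (w * w)
        y1 = s.y + (-v1 * w * math.cos(theta1) + s.a * math.sin(theta1)
                    + s.v * w * math.cos(s.theta) - s.a * math.sin(s.theta)) / (w * w)
    return VehicleState(x1, y1, theta1, v1, s.omega, s.a)


def predict_path(s: VehicleState, horizon: float, dt: float) -> List[Tuple[float, float, float]]:
    """(t, x, y) samples up to `horizon`, each computed directly from the initial state."""
    steps = int(round(horizon / dt))
    path = []
    for k in range(1, steps + 1):
        s_k = ctra_step(s, k * dt)
        path.append((k * dt, s_k.x, s_k.y))
    return path


def is_dangerous(other: VehicleState, ego: VehicleState, horizon: float = 2.0,
                 dt: float = 0.1, radius: float = 2.0) -> bool:
    """Hypothetical rule: dangerous if both predicted positions come within
    `radius` of each other at the same time step inside the horizon."""
    for (_, xo, yo), (_, xe, ye) in zip(predict_path(other, horizon, dt),
                                        predict_path(ego, horizon, dt)):
        if math.hypot(xo - xe, yo - ye) < radius:
            return True
    return False
```

As a sanity check, a vehicle at 10 m/s with ω = 0.5 rad/s and a = 0 completes a half circle of radius 20 m in π/0.5 s and ends 40 m to its left. Computing each sample from the initial state, rather than chaining small steps, avoids accumulating integration error because the motion is exact under the CTRA assumptions.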