Crash To Not Crash: Learn to Identify Dangerous Vehicles using a Simulator

School of Electrical Engineering, KAIST

* authors contributed equally


Developing a computer vision-based algorithm for identifying dangerous vehicles requires a large amount of labeled accident data, which is difficult to collect in the real world. To tackle this challenge, we first develop a synthetic data generator built on top of a driving simulator. We then observe that the synthetic labels generated from simulation results are very noisy, resulting in poor classification performance. To improve the quality of the synthetic labels, we propose a new label adaptation technique that first extracts the internal states of vehicles from the underlying driving simulator, and then refines the labels by predicting the future paths of vehicles with a well-studied motion model. Via real-data experiments, we show that our dangerous vehicle classifier can reduce the missed detection rate by at least 18.5% compared with classifiers trained on real data when time-to-collision is between 1.6 s and 1.8 s.

Demonstration Video


Overview of GTACrash & YouTubeCrash

Our synthetic dataset GTACrash is collected from the video game Grand Theft Auto V (GTA V). It consists of 3661 non-accident scenes and 7720 accident scenes, each made up of 20 image frames. The total number of positive samples (dangerous vehicles) is 128437 and the total number of negative samples is 623173. We provide sample code that reads the images and labels and visualizes each frame. Please note that this dataset is for research and educational use only.
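As a quick sanity check on the figures above, the short sketch below derives the total number of scenes and frames and the share of positive samples. Only the five constants come from this page; the function name and the returned keys are illustrative and not part of the released code.

```python
# Figures quoted in the GTACrash overview above.
NON_ACCIDENT_SCENES = 3661
ACCIDENT_SCENES = 7720
FRAMES_PER_SCENE = 20
POSITIVE_SAMPLES = 128437  # dangerous vehicles
NEGATIVE_SAMPLES = 623173


def gtacrash_summary():
    """Derive aggregate statistics from the quoted dataset figures.

    The function name and dictionary keys are hypothetical helpers.
    """
    total_scenes = NON_ACCIDENT_SCENES + ACCIDENT_SCENES
    total_samples = POSITIVE_SAMPLES + NEGATIVE_SAMPLES
    return {
        "total_scenes": total_scenes,
        "total_frames": total_scenes * FRAMES_PER_SCENE,
        "total_samples": total_samples,
        "positive_fraction": POSITIVE_SAMPLES / total_samples,
    }


if __name__ == "__main__":
    for key, value in gtacrash_summary().items():
        print(f"{key}: {value}")
```

Roughly one sample in six is positive, so a training pipeline may want to account for this class imbalance, for example through sampling or loss weighting.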

Our test dataset YouTubeCrash is collected from dashcam videos uploaded to a YouTube channel called Car Crashes Time. It consists of 122 video clips, and each clip is divided into a pair of scenes, one accident and one non-accident; each scene has 20 image frames. Please note that this dataset is for research and educational use only.

Label File Structure of GTACrash Dataset

For the YouTubeCrash dataset, the structure of the label files is the same as that of GTACrash. However, the attribute "syntheticLabel" is renamed to "label". In addition, the attributes "position", "forwardV", "speed", "acceleration", "angularVelocity", "objectSize", and "adaptedLabel" are all filled with the value "NA".
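The differences between the two label formats can be smoothed over with a small helper, sketched below. It assumes that each vehicle annotation has already been loaded into a plain Python dict; the on-disk format is not described here, so `normalize_record` and the unified "label" key are hypothetical choices and not part of the released code. Only the attribute names and the "NA" placeholder are taken from the description above.

```python
from typing import Any, Dict

# Attributes that only the simulator can provide; YouTubeCrash stores "NA" here.
SIMULATOR_ONLY_KEYS = (
    "position",
    "forwardV",
    "speed",
    "acceleration",
    "angularVelocity",
    "objectSize",
    "adaptedLabel",
)


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one annotation dict that looks the same for both datasets.

    Hypothetical helper: assumes the annotation is a flat dict of plain
    Python values keyed by the attribute names quoted in the text.
    """
    out = dict(record)
    # GTACrash uses "syntheticLabel"; YouTubeCrash uses "label".
    if "syntheticLabel" in out:
        out["label"] = out.pop("syntheticLabel")
    # Replace the "NA" placeholder with None so that downstream code can
    # test for missing values explicitly.
    for key in SIMULATOR_ONLY_KEYS:
        if out.get(key) == "NA":
            out[key] = None
    return out
```

With this helper, code that reads `record["label"]` works on either dataset, and a `None` in a simulator-only field signals that the record comes from real footage.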


The code for reproducing the experiments and the pretrained model are available here:

A sample Jupyter notebook for using the dataset is available here:

Release Log

11/06/2018 - Initial data release

13/03/2019 - Bugs in the GTACrash dataset are fixed

15/03/2019 - Release of training/test code and pretrained model


Please cite our work if you use the code or data from this site.

author = {Hoon Kim and Kangwook Lee and Gyeongjo Hwang and Changho Suh},
title = {Crash {T}o {N}ot {C}rash: {L}earn to Identify Dangerous Vehicles using a Simulator},
booktitle = {},
year = {2019},
volume = {},
pages = {}

Dataset Sample Videos

1. Accident Scenes in GTACrash

2. Non-accident Scenes in GTACrash

3. Accident Scenes in YouTubeCrash

4. Non-accident Scenes in YouTubeCrash

Domain Adaptations

1. Feature domain adaptation: Real World → Synthetic Domain (GTA V)

2. Label Domain Adaptation: Using CTRA model
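To make the label adaptation step more concrete, here is a minimal sketch of path prediction, assuming that "CTRA" refers to the standard constant turn rate and acceleration motion model. The state layout, the function names, and the step size are illustrative assumptions. This is not the authors' implementation, and it does not show how the predicted paths are turned into refined labels.

```python
import math
from typing import List, NamedTuple, Tuple


class VehicleState(NamedTuple):
    """Hypothetical state layout for a CTRA rollout."""
    x: float         # position (m)
    y: float         # position (m)
    heading: float   # yaw angle (rad)
    speed: float     # longitudinal speed (m/s)
    accel: float     # longitudinal acceleration (m/s^2), held constant
    yaw_rate: float  # turn rate (rad/s), held constant


def ctra_step(s: VehicleState, dt: float, eps: float = 1e-4) -> VehicleState:
    """Advance the state by dt using the closed-form CTRA solution."""
    new_heading = s.heading + s.yaw_rate * dt
    new_speed = s.speed + s.accel * dt
    if abs(s.yaw_rate) < eps:
        # Near-zero turn rate: the motion degenerates to a straight line.
        dist = s.speed * dt + 0.5 * s.accel * dt * dt
        x = s.x + dist * math.cos(s.heading)
        y = s.y + dist * math.sin(s.heading)
    else:
        w = s.yaw_rate
        x = s.x + (new_speed * w * math.sin(new_heading)
                   + s.accel * math.cos(new_heading)
                   - s.speed * w * math.sin(s.heading)
                   - s.accel * math.cos(s.heading)) / (w * w)
        y = s.y + (-new_speed * w * math.cos(new_heading)
                   + s.accel * math.sin(new_heading)
                   + s.speed * w * math.cos(s.heading)
                   - s.accel * math.sin(s.heading)) / (w * w)
    return VehicleState(x, y, new_heading, new_speed, s.accel, s.yaw_rate)


def predict_path(state: VehicleState, horizon: float,
                 dt: float) -> List[Tuple[float, float]]:
    """Roll the model forward and return the predicted (x, y) positions."""
    path = []
    for _ in range(int(round(horizon / dt))):
        state = ctra_step(state, dt)
        path.append((state.x, state.y))
    return path
```

For example, `predict_path(VehicleState(0.0, 0.0, 0.0, 10.0, 0.0, 0.0), horizon=1.0, dt=0.5)` describes a vehicle driving straight at 10 m/s. Note that this plain form of the model does not stop a braking vehicle at zero speed, so a practical rollout would likely need to handle that case.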