Perform an in-depth analysis of the traffic violation and incident data for Montgomery County to gain an overview of how the contributing factors relate to crash outcomes;
Use supplementary data to explore weather and other factors that may lie behind the pattern of vehicle collisions;
Use machine learning to predict the outcomes of future traffic violations and vehicle collisions.
1. What will the data look like on a map? [1]
2. Are there any specific dangerous spots for pedestrians or drivers?
3. Which weather conditions and other factors may lie behind the pattern of vehicle collisions?
4. For fatal cases, does their distribution over location and/or time show a significantly different pattern from that of all cases? [1]
5. What are the features of traffic accidents, and how do they affect accident severity?
Road safety is a critical issue that is relevant to everybody's daily life. Road traffic injuries are a leading cause of preventable death, especially for young people. [2]
In 2018, 501 people lost their lives in road traffic accidents in Maryland. Many more suffered non-fatal injuries and/or considerable economic losses.
Montgomery County registers numerous traffic violations and vehicle incidents every day and releases this data to the public as part of its open data initiative. In recent years, machine learning methods have been widely used in traffic prediction problems because of their ability to process multi-dimensional data, flexibility in implementation, versatility, and strong predictive capabilities. [3]
I have always been interested in analyzing real-world data for local insight, to see what we can learn from it to help prevent or avoid collisions in the future.
image credit: https://data.montgomerycountymd.gov/
1. Crash Reporting - Incidents Data: This dataset provides general information about each collision and details of all traffic collisions occurring on county and local roadways within Montgomery County, as collected via the Automated Crash Reporting System (ACRS) of the Maryland State Police. There are around 67.4K rows and 44 columns. Each row is a "Collision".
2. Crash Reporting - Drivers Data: This dataset provides information on motor vehicle operators (drivers) involved in traffic collisions occurring on county and local roadways. There are around 115K rows and 43 columns. Each row is a "Driver".
3. Crash Reporting - Non-Motorists Data: This dataset provides information on non-motorists (pedestrians and cyclists) involved in traffic collisions occurring on county and local roadways. There are around 3711 rows and 32 columns. Each row is a "Non-Motorist".
4. Traffic violation data: This dataset contains traffic violation information from all electronic traffic violations issued in the County.
5. Montgomery County weather data: This page provides local weather extremes and records, holiday weather, COOP data, and area climate summaries.
6. Montgomery County demographics data: The data contains various information such as population, race, and car ownership. It will be paired with the other datasets.
The incidents and drivers datasets contain crash date/time, weather and light conditions, collision type, injury severity, and location information.
Latitude and Longitude (Lat, Long) are geographic coordinates that can be used to map hotspot locations in future work.
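One simple way to look for hotspots from the coordinates is to bin the points into a coarse grid and count crashes per cell. The sketch below is only an illustration under stated assumptions: the function name `hotspot_cells`, the 0.01 degree cell size, and the sample coordinates are all hypothetical and do not come from the datasets.

```python
import math
from collections import Counter

def hotspot_cells(points, cell_deg=0.01, top_n=3):
    """Bin (lat, lon) pairs into a square grid and return the busiest cells."""
    counts = Counter()
    for lat, lon in points:
        # math.floor keeps negative longitudes in consistent cells.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts.most_common(top_n)

# Hypothetical coordinates, for illustration only.
sample_points = [
    (39.0841, -77.1528),
    (39.0848, -77.1523),
    (39.0843, -77.1521),
    (38.9907, -77.0261),
    (39.1434, -77.2014),
]
top_cells = hotspot_cells(sample_points)
print(top_cells[0])  # the busiest grid cell and its crash count
```

A fixed grid is crude (a cluster can straddle a cell boundary), but it is cheap and gives a first list of candidate locations to inspect on a map.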
The datasets start from 1/1/2015. I plan to use the most recent three years of data for analysis.
There are three crash types across the three datasets: "Fatal Crash", "Property Damage Crash" and "Injury Crash". These values may be recoded as 'F' (fatal), 'P' (property damage) and 'I' (injury). The drivers dataset has an "Injury Severity" column with the values "No Apparent Injury", "Suspected Minor Injury", "Suspected Serious Injury" and "fatal".
The values in the Driver Substance Abuse column indicate whether alcohol or drugs contributed to the crash; this column needs further cleaning.
The crash date/time column combines the date and time values in one column; these need to be split into two columns.
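The recoding and the date/time split described above could be sketched as follows, using only the standard library. The column names `"ACRS Report Type"` and `"Crash Date/Time"`, the output field names, and the timestamp layout are assumptions made for this example; they should be checked against the actual export before use.

```python
from datetime import datetime

# Short codes proposed in the text for the three crash types.
CRASH_TYPE_CODES = {
    "Fatal Crash": "F",
    "Property Damage Crash": "P",
    "Injury Crash": "I",
}

# Assumed timestamp layout, e.g. "01/03/2020 02:55:00 PM".
DATETIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def clean_record(record):
    """Return a copy of the record with a coded crash type and separate date and time fields."""
    stamp = datetime.strptime(record["Crash Date/Time"], DATETIME_FORMAT)
    cleaned = dict(record)
    # Unknown crash types map to None so they can be reviewed later.
    cleaned["Crash Type Code"] = CRASH_TYPE_CODES.get(record["ACRS Report Type"])
    cleaned["Crash Date"] = stamp.date().isoformat()
    cleaned["Crash Time"] = stamp.time().isoformat()
    return cleaned

# Hypothetical row, for illustration only.
example = clean_record({
    "ACRS Report Type": "Injury Crash",
    "Crash Date/Time": "01/03/2020 02:55:00 PM",
})
print(example["Crash Type Code"], example["Crash Date"], example["Crash Time"])
```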
[1]. Hua Yang, "New York City Motor Vehicle Collision Data Visualization", https://nycdatascience.com/blog/student-works/new-york-city-motor-vehicle-collision-data-visualization/
[2]. Chunjiao Dong, Chunfu Shao, Juan Li, Zhihua Xiong, "An Improved Deep Learning Model for Traffic Crash Prediction", Journal of Advanced Transportation, vol. 2018, Article ID 3869106, 13 pages, 2018. https://doi.org/10.1155/2018/3869106
[3]. Ming Zheng, Tong Li, Rui Zhu, Jing Chen, Zifei Ma, Mingjing Tang, "Traffic Accident’s Severity Prediction: A Deep-Learning Approach-Based CNN Network", https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8661485
Over 10,000 vehicle incidents occur every year.
An average of 30 fatalities occur every year.
A motorcyclist was injured in a crash every 2.6 hours.
A pedestrian or bicyclist was involved in a crash every 14 hours.
Drivers were at fault in more than 90% of incidents.
Half of the vehicles involved were disabled in the incidents.
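Figures of the form "one event every N hours" follow from dividing the hours in a year by the yearly count. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and the inputs are the rounded yearly figures quoted above rather than exact counts from the data.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

def hours_per_event(events_per_year):
    """Average number of hours between events, given a yearly count."""
    return HOURS_PER_YEAR / events_per_year

# Rounded yearly figures quoted in the list above.
incident_interval = hours_per_event(10_000)  # about 0.88 hours, roughly 53 minutes
fatality_interval = hours_per_event(30)      # 292 hours, roughly 12 days
print(incident_interval, fatality_interval)
```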
The plots of total incidents show a relatively flat trend in both the yearly and monthly counts, but the fatal crash counts may show some fluctuating trends and patterns. I will explore the fatal crash data further.
Most cases are “Property Damage Crash”, a much higher share than “Injury Crash” cases, while “Fatal” collisions are rare.
In terms of victims, pedestrians make up a significant portion of all collision victims, considerably more than cyclists.
The highest number of fatal crashes occurred on Saturdays.
The fewest fatal crashes occurred on Tuesdays.
The most fatal crashes occurred in December.
The fewest fatal crashes occurred in April.
The most fatal crashes occurred from 7:00 p.m. – 7:00 a.m. (147 crashes).
The fewest fatal crashes occurred from 7:00 a.m. – 9:00 a.m. (5 crashes).
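Counts like the ones above can be produced by grouping crash timestamps by weekday, month, and time of day. The following is a minimal sketch, not the code used for the figures reported here; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def fatal_crash_profile(timestamps):
    """Count crashes by weekday name and month number, and count night-time crashes."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    by_month = Counter(ts.month for ts in timestamps)
    # Night window taken from the text: 7:00 p.m. to 7:00 a.m.
    night = sum(1 for ts in timestamps if ts.hour >= 19 or ts.hour < 7)
    return by_weekday, by_month, night

# Hypothetical timestamps, for illustration only.
sample = [
    datetime(2018, 12, 1, 22, 30),
    datetime(2018, 12, 8, 2, 15),
    datetime(2018, 4, 10, 8, 0),
    datetime(2018, 7, 4, 20, 45),
]
by_weekday, by_month, night = fatal_crash_profile(sample)
print(by_weekday.most_common(1), by_month.most_common(1), night)
```

Note that the two time windows compared in the text have different lengths (twelve hours versus two), so dividing each count by the window length gives a fairer per-hour comparison.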
Single vehicle
Straight Movement Angle
Head On
Head On Left Turn
Same Dir Rear End
Same Dir Rend Right Turn
Same Direction Left Turn
Angle Meets Left Head On
RDF ranks “crash date” as the most informative feature overall and gives it more weight than crash time.
XGBoost ranks “Hit/Run” as the most informative feature.
Simple machine learning models do not perform very well at this stage.
Montgomery County's traffic fatality rate is lower than the US and Maryland rates.
Fatal accidents are much more likely to occur in dark conditions: more than half of fatal crashes occurred from 7:00 p.m. – 7:00 a.m. Weather conditions and road surface conditions have a less notable effect on accident severity.
Accidents in snowy conditions are less likely to result in fatalities. This makes sense because traffic tends to be slow and drivers tend to exercise great care in the snow.
The dataset does not contain information about responsible drivers
The small number of fatalities makes the data highly imbalanced
Data quality issues due to the lack of a unified standard
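One common way to soften the class imbalance noted above is to weight each class inversely to its frequency when training a model. The sketch below computes such weights with the formula n_samples / (n_classes * class_count), which is the same heuristic scikit-learn calls "balanced". It was not part of the analysis reported here, and the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical label mix: property damage (P), injury (I), fatal (F).
sample_labels = ["P"] * 6 + ["I"] * 3 + ["F"]
weights = balanced_class_weights(sample_labels)
print(weights)  # the rare fatal class receives the largest weight
```

Weighting does not add information about fatal crashes; it only changes how much each mistake costs during training, so evaluation should still use metrics that are sensitive to the rare class.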
A convolutional neural network (CNN) could be built using the satellite images that were scraped using Google's Static Maps API
Use the Maryland dataset to gain insight into statewide traffic incidents
Consider using population density data