Data Exploration for Accidents

Dataset: accidents.csv

The dataset lists crashes recorded in 2018 and 2019, with the coordinates, date, and time of each crash.

We carried out the following preprocessing and data exploration:

  1. Split the date and time column into hr, min, sec, day, month, and year.

  2. Added a column indicating whether the day is a weekday or a weekend.

  3. Calculated the distance of each crash location from the centre of the city.

  4. Calculated the sun elevation at the crash location at the time of the crash.

  5. Used data exploration and visualization to find relations between the crashes and the hr, month, year, etc.
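Steps 1 to 3 above could be sketched roughly as follows. This is a minimal illustration, not the code used for the report: it assumes pandas and NumPy, the column names (`timestamp`, `latitude`, `longitude`), the sample rows, and the city-centre coordinates are all placeholders, and the haversine formula is one common choice for the distance. The sun-elevation step is left out because it depends on the time zone of the timestamps and on the astronomy formula or library chosen.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real accidents.csv column names may differ.
df = pd.DataFrame({
    "timestamp": ["2018-03-14 08:15:30", "2019-05-18 18:40:05"],
    "latitude": [-1.30, -1.25],
    "longitude": [-2.00, -1.90],
})

# Step 1: split the timestamp into its components.
ts = pd.to_datetime(df["timestamp"])
df["year"], df["month"], df["day"] = ts.dt.year, ts.dt.month, ts.dt.day
df["hr"], df["min"], df["sec"] = ts.dt.hour, ts.dt.minute, ts.dt.second

# Step 2: flag weekends (pandas numbers Monday as 0, so 5 and 6 are Sat/Sun).
df["is_weekend"] = ts.dt.dayofweek >= 5

# Step 3: great-circle (haversine) distance to an assumed city centre, in km.
CENTRE_LAT, CENTRE_LON = -1.28, -1.95  # placeholder coordinates, not from the report
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


df["dist_centre_km"] = haversine_km(
    df["latitude"], df["longitude"], CENTRE_LAT, CENTRE_LON
)
```

Keeping the weekend flag as a boolean makes it easy to group or filter on later; it can be cast to 0/1 if a numeric column is needed for a correlation plot.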

Original dataset

After data preprocessing and manipulation

Density plot of car crashes on different days of the week

Density plot of accidents on weekdays vs weekends


Inferences

  1. The most accidents occur on Wednesday and Thursday, which suggests these are the busiest road days of the week.

  2. Accidents on weekends are minimal compared to those on weekdays.
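A comparison like the one above can be tabulated as in the sketch below. It assumes pandas and uses made-up dates rather than the real data. Because there are five weekdays and only two weekend days, raw totals favour weekdays by construction, so the sketch also divides by the number of days of each type; that normalisation is a suggestion, not something the report states it did.

```python
import pandas as pd

# Hypothetical crash dates; in practice this would be the parsed date column.
dates = pd.to_datetime(pd.Series([
    "2018-03-12", "2018-03-14", "2018-03-14", "2018-03-15",
    "2018-03-15", "2018-03-16", "2018-03-17", "2018-03-18",
]))

# Crashes per day of the week.
per_day = dates.dt.day_name().value_counts()

# Weekday vs weekend totals (Monday is 0, so 5 and 6 are Saturday and Sunday).
is_weekend = dates.dt.dayofweek >= 5
totals = is_weekend.map({False: "weekday", True: "weekend"}).value_counts()

# Average per day of each type: 5 weekdays and 2 weekend days in a week.
per_day_type = totals / pd.Series({"weekday": 5, "weekend": 2})
```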

Bar graph of number of accidents by hour of the day


Bar graph of number of accidents by month

Inferences

  1. The most accidents occur between 6:00 am and 7:00 am. Traffic is lighter at that hour, so a possible explanation is that drivers overspeed on the emptier roads.

  2. Accidents occur mostly in the months of March and May.
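The counts behind these two bar graphs can be obtained along the following lines. This is a sketch assuming pandas, with invented timestamps standing in for the real column.

```python
import pandas as pd

# Hypothetical timestamps, not taken from accidents.csv.
ts = pd.to_datetime(pd.Series([
    "2018-03-05 06:10", "2018-03-09 06:45", "2018-05-21 06:30",
    "2018-05-02 17:20", "2019-03-11 09:05", "2019-07-30 06:55",
]))

# Number of crashes per hour of day and per month, in natural order.
by_hour = ts.dt.hour.value_counts().sort_index()
by_month = ts.dt.month.value_counts().sort_index()

# Hour with the most crashes, and the two busiest months.
peak_hour = by_hour.idxmax()
top_months = by_month.nlargest(2).index.tolist()
```

Calling `by_hour.plot.bar()` on the result would give a bar graph of the same shape as the ones described, provided matplotlib is installed.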

Bar graph of number of accidents by year

Histogram of the distribution of accident frequency by hour for 2018 and 2019

Analysis of the frequency of crashes every 3 hours for each day of the week

Analysis of the frequency of crashes each month for each day of the week

Inferences

  1. More crashes occurred in 2018 than in 2019.

  2. In both 2018 and 2019, the most accidents occurred between 8:00 and 10:00 on weekdays, while on weekends the most occurred between 17:00 and 20:00. A likely reason is that 8:00-10:00 is the commute to office, when roads are very busy, whereas on weekends most people go on outings and return between 17:00 and 20:00.
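The 3-hour breakdown by day type could be built roughly as follows. This is a sketch under assumptions: it uses pandas, the timestamps are invented, and the bins are half-open intervals starting at midnight (00-03, 03-06, and so on), which may not match the binning used for the figures.

```python
import pandas as pd

# Hypothetical timestamps: three on weekdays, three on a weekend.
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2018-03-12 08:30", "2018-03-13 09:10", "2018-03-14 08:05",
    "2018-03-17 18:20", "2018-03-18 19:45", "2018-03-18 11:00",
])})

# Bin the hour of day into 3-hour windows [0, 3), [3, 6), ..., [21, 24).
edges = list(range(0, 25, 3))
labels = [f"{lo:02d}-{hi:02d}" for lo, hi in zip(edges[:-1], edges[1:])]
hour = df["timestamp"].dt.hour
df["hour_bin"] = pd.cut(hour, bins=edges, right=False, labels=labels).astype(str)

# Label each row as weekday or weekend (Monday is 0).
is_weekend = df["timestamp"].dt.dayofweek >= 5
df["day_type"] = is_weekend.map({False: "weekday", True: "weekend"})

# Frequency table: rows are hour bins, columns are day types.
table = pd.crosstab(df["hour_bin"], df["day_type"])

# Busiest 3-hour window for each day type.
busiest = table.idxmax()
```

Replacing `day_type` with the day name gives the per-day-of-week version of the same table.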

Visualization of data points of latitude and longitude based on weekend and weekdays

Visualization of data points of latitude and longitude year wise based on weekend and weekdays

Correlation plot

Since the values are mostly categorical, there is not much correlation between the variables. The notable relationships are:

  • Longitude is highly positively correlated with distance from the city centre and with sun elevation angle.

  • Latitude is highly negatively correlated with distance from the city centre and with sun elevation angle.

  • Weekday and weekend are negatively correlated, which is expected because every day is one or the other.
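The weekday/weekend result can be reproduced on any data, as the sketch below shows. It assumes pandas and uses made-up values with hypothetical column names; when one flag is defined as the complement of the other, their Pearson correlation is exactly -1, so this entry of the plot says nothing about the crashes themselves.

```python
import pandas as pd

# Hypothetical preprocessed rows: hour of crash and a 0/1 weekend flag.
df = pd.DataFrame({
    "hr": [8, 9, 18, 6, 17, 7],
    "is_weekend": [0, 0, 1, 0, 1, 0],
})

# The weekday flag is the complement of the weekend flag.
df["is_weekday"] = 1 - df["is_weekend"]

# Pairwise Pearson correlation (the pandas default) between all numeric columns.
corr = df.corr()
flag_corr = corr.loc["is_weekday", "is_weekend"]
```

Dropping one of the two flags before plotting avoids a redundant row and column in the correlation matrix.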

Statistical Facts

  1. Number of accidents occurred each day

  2. 70% of the accidents occurred in 2018 and only 30% in 2019, a drop of 40 percentage points in share (about 57% fewer accidents in relative terms), possibly owing to improvements in healthcare, infrastructure, and awareness.

  3. The most accidents occur in the region from -1.55 to -1.05 latitude and -3.05 to -0.57 longitude, with a count of 6070 accidents.
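Facts 2 and 3 can be checked with a few lines like the sketch below. The first part works through the arithmetic of a 70%/30% split: the difference is 40 percentage points, while the decrease relative to 2018 is about 57%. The second part shows one way a densest region might be located, using `numpy.histogram2d` on synthetic coordinates; the bin count and the data are assumptions, and the report does not say how its region was derived.

```python
import numpy as np

# Fact 2: a 70% / 30% split between the two years.
share_2018, share_2019 = 0.70, 0.30
point_change = (share_2019 - share_2018) * 100                  # in percentage points
relative_change = (share_2019 - share_2018) / share_2018 * 100  # relative to 2018, in %

# Fact 3: densest latitude/longitude cell, on synthetic points (not the real data).
rng = np.random.default_rng(0)
lat = np.concatenate([rng.uniform(-1.5, -1.1, 500), rng.uniform(-3.0, 1.0, 100)])
lon = np.concatenate([rng.uniform(-3.0, -0.6, 500), rng.uniform(-6.0, 3.0, 100)])

# Count points in an 8 x 8 grid spanning the data; rows follow lat, columns follow lon.
counts, lat_edges, lon_edges = np.histogram2d(lat, lon, bins=8)

# Locate the cell with the highest count and report its bounds.
i, j = np.unravel_index(np.argmax(counts), counts.shape)
densest = {
    "lat_range": (lat_edges[i], lat_edges[i + 1]),
    "lon_range": (lon_edges[j], lon_edges[j + 1]),
    "count": int(counts[i, j]),
}
```

The reported region depends on the grid: a coarser or finer `bins` value will move the boundaries and change the count.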