Data Exploration for Accidents
Dataset: weather.csv
It contains the weather details at the crash site on each accident date.
We carried out the following preprocessing and data exploration:
We joined it with the accidents dataset on the date.
We combined the two wind components (u and v) into a resultant wind velocity and added it as a new column.
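The two preprocessing steps above can be sketched as follows. This is a minimal illustration on invented toy data: the column names `date`, `accident_id`, `wind_u`, `wind_v` and `wind_speed` are assumptions, and the real columns in weather.csv and the accidents dataset may be named differently. The resultant wind velocity is taken here as the magnitude of the (u, v) vector, sqrt(u^2 + v^2).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two datasets (all names and values are hypothetical).
accidents = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-03-02", "2019-05-10"]),
    "accident_id": [1, 2, 3],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-03-02", "2019-05-10"]),
    "wind_u": [3.0, 0.0, -6.0],
    "wind_v": [4.0, 2.0, 8.0],
})

# Attach the weather readings to each accident by matching on the date.
merged = accidents.merge(weather, on="date", how="left")

# Resultant wind speed: magnitude of the (u, v) wind vector.
merged["wind_speed"] = np.hypot(merged["wind_u"], merged["wind_v"])
```

A left join keeps every accident row even when no weather reading exists for that date; an inner join would drop such rows instead.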
Then, using data exploration and visualization, we looked for relations between the crashes and the meteorological factors.
Boxplots for temperature, humidity, precipitation and resultant wind by year
Average temperature, humidity, precipitation and resultant wind by month
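One way the monthly averages could be computed is with a pandas group-by. The sketch below uses invented values and assumed column names (`date`, `temperature`, `humidity`); it is not the project's actual code.

```python
import pandas as pd

# Toy weather readings (hypothetical column names and values).
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-03", "2020-02-11"]),
    "temperature": [10.0, 14.0, 16.0, 20.0],
    "humidity": [60.0, 70.0, 55.0, 65.0],
})

# Derive the grouping keys from the date.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month

# Average of each factor per calendar month, pooled over all years.
monthly_mean = df.groupby("month")[["temperature", "humidity"]].mean()
```

With matplotlib installed, the per-year boxplots could be drawn from the same frame with `df.boxplot(column="temperature", by="year")`.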
We categorized the dataframe into 4 time slots: morning, afternoon, evening and night.
Density of temperature is highest in the middle of the night.
Density of precipitation is highest in the middle of the night, and in the later hours for the morning slot.
Density of humidity is highest towards the later hours of the evening.
Density of wind velocity is highest in the middle of the night.
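The time-slot categorization could look like the sketch below. The slot boundaries (night 00:00-06:00, morning 06:00-12:00, afternoon 12:00-18:00, evening 18:00-24:00) and the column names `time` and `time_slot` are assumptions made for illustration; the boundaries actually used are not stated here.

```python
import pandas as pd

# Toy accident timestamps (hypothetical).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-03-01 02:30",
        "2019-03-01 06:45",
        "2019-03-01 13:10",
        "2019-03-01 21:05",
    ]),
})

# Assumed slot boundaries in hours; each bin is closed on the left, [start, end).
bins = [0, 6, 12, 18, 24]
labels = ["night", "morning", "afternoon", "evening"]

# Map the hour of each timestamp to its slot.
df["time_slot"] = pd.cut(df["time"].dt.hour, bins=bins, labels=labels, right=False)
```

Passing `right=False` makes 06:00 fall into the morning slot rather than the night slot, which avoids an hour being left out at the 00:00 boundary.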
Inferences
Most accidents occur between 6:00 am and 7:00 am. Traffic is lighter at that time, but drivers tend to overspeed on the empty roads, which likely causes more accidents.
Accidents mostly occur in the months of March and May.
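Counts like these can be obtained by tallying accidents per hour and per month. The sketch below uses a handful of invented timestamps and an assumed column name `timestamp`; it only shows the mechanics and does not reproduce the figures above.

```python
import pandas as pd

# Invented accident timestamps for illustration only.
accidents = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-03-01 06:15",
        "2019-03-14 06:40",
        "2019-05-02 06:05",
        "2019-05-20 17:30",
        "2019-07-09 09:00",
    ]),
})

# Number of accidents in each hour of the day and each calendar month.
by_hour = accidents["timestamp"].dt.hour.value_counts()
by_month = accidents["timestamp"].dt.month.value_counts()

# Hour of the day with the most accidents in this toy sample.
peak_hour = by_hour.idxmax()
```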
Correlation between environmental factors
Correlation between temperature, rainfall and location
Inferences
Humidity and specific humidity are strongly correlated.
Temperature and precipitation are strongly correlated.
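A correlation matrix of the kind these inferences rely on can be computed with `DataFrame.corr`, which uses the Pearson coefficient by default. The values and column names below are invented for illustration, so the resulting numbers say nothing about the real data.

```python
import pandas as pd

# Toy environmental readings (hypothetical names and values).
df = pd.DataFrame({
    "temperature": [10.0, 12.0, 15.0, 18.0, 21.0],
    "humidity": [80.0, 75.0, 70.0, 60.0, 55.0],
    "specific_humidity": [8.0, 7.6, 7.1, 6.0, 5.4],
    "precipitation": [1.0, 3.0, 2.0, 6.0, 5.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Look up a single pair, e.g. humidity against specific humidity.
humidity_pair = corr.loc["humidity", "specific_humidity"]
```

The matrix is symmetric with ones on the diagonal, so only one triangle needs to be read. With seaborn installed, `seaborn.heatmap(corr, annot=True)` is a common way to display it.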
Rainfall distribution by year
Average distribution of rainfall
The most crashes occur when rainfall is 20 and 30 mm.
The most crashes occur when wind velocity is 0.5-2.5 km/h.
The most crashes occur when temperature is 15 °C.
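Statements of this form can be checked by binning a weather variable and counting the crashes in each bin. The sketch below assumes a merged frame with one row per crash and a hypothetical `rainfall_mm` column; the 10 mm bin width and the values are invented.

```python
import pandas as pd

# One row per crash, with the rainfall recorded for that crash (toy values).
merged = pd.DataFrame({"rainfall_mm": [5.0, 22.0, 25.0, 28.0, 41.0, 12.0]})

# Assumed bin edges in mm; each bin is open on the left and closed on the right.
bins = [0, 10, 20, 30, 40, 50]
rain_bin = pd.cut(merged["rainfall_mm"], bins=bins)

# Number of crashes per rainfall bin, in bin order.
crashes_per_bin = rain_bin.value_counts().sort_index()

# The bin with the most crashes in this toy sample.
busiest_bin = crashes_per_bin.idxmax()
```

The same pattern applies to wind velocity and temperature by swapping the column and the bin edges. Note that the result depends on the chosen bin width.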
Statistics for Humidity
Statistics for Temperature
Statistics for Precipitation
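Summary statistics like these are typically produced with `DataFrame.describe`. The sketch below runs it on invented values with assumed column names.

```python
import pandas as pd

# Toy values for the three factors (hypothetical names and values).
merged = pd.DataFrame({
    "humidity": [55.0, 60.0, 70.0, 75.0],
    "temperature": [12.0, 15.0, 15.0, 18.0],
    "precipitation": [0.0, 20.0, 30.0, 10.0],
})

# Count, mean, standard deviation, min, quartiles and max for each column.
stats = merged[["humidity", "temperature", "precipitation"]].describe()
```

Each column of `stats` holds the summary for one factor, so `stats["temperature"]` gives the temperature statistics on their own.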