Data Exploration for


Dataset: weather.csv

It contains the weather conditions at each crash site on the date of the accident.

We performed the following preprocessing and data exploration steps:

  1. We joined it with the accidents dataset on the date.

  2. We combined the two wind components (u and v) and added the resultant wind velocity as a new column.

  3. Using data exploration and visualization, we then looked for relationships between the crashes and the meteorological factors.
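The first two steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names (`date`, `u_wind`, `v_wind`, `wind_speed`, `accident_id`) and the toy rows are assumptions, and the real join may use additional keys such as location.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real column names in weather.csv may differ.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-01", "2019-03-02"]),
    "u_wind": [3.0, -1.5],  # eastward wind component
    "v_wind": [4.0, 2.0],   # northward wind component
})
accidents = pd.DataFrame({
    "accident_id": [101, 102, 103],
    "date": pd.to_datetime(["2019-03-01", "2019-03-01", "2019-03-02"]),
})

# The resultant wind velocity is the magnitude of the (u, v) vector:
# sqrt(u^2 + v^2).
weather["wind_speed"] = np.hypot(weather["u_wind"], weather["v_wind"])

# A left join keeps every accident, even when no weather row matches its date.
merged = accidents.merge(weather, on="date", how="left")
print(merged)
```

A left join is used here so that accidents without a matching weather record show up as missing values instead of silently disappearing.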

Box plots of temperature, humidity, precipitation and resultant wind velocity by year

Average temperature, humidity, precipitation and resultant wind velocity by month

We split the dataframe into four time slots: morning, afternoon, evening and night.
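One way to assign the four time slots is to bin the hour of day with `pd.cut`. The slot boundaries below (night 0-6, morning 6-12, afternoon 12-18, evening 18-24) and the `timestamp` column name are assumptions for illustration; the source does not state the cut-offs that were actually used.

```python
import pandas as pd

# Hypothetical timestamps; the slot boundaries are assumed, not from the source.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-03-01 02:30", "2019-03-01 06:15",
        "2019-03-01 13:00", "2019-03-01 19:45",
    ])
})

hour = df["timestamp"].dt.hour

# With right=False the bins are left-closed:
# [0, 6) night, [6, 12) morning, [12, 18) afternoon, [18, 24) evening.
df["time_slot"] = pd.cut(
    hour,
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)
print(df)
```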

Temperature density is highest in the middle of the night.

Precipitation density is highest in the middle of the night and in the later hours of the morning.

Humidity density is highest towards the later hours of the evening.

Wind velocity density is highest in the middle of the night.


  1. The highest number of accidents occurs between 6:00 am and 7:00 am. Traffic is light at that hour, so drivers may speed on the empty roads, which could explain the higher accident count.

  2. Accidents occur mostly in the months of March and May.
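Counts like these can be obtained by grouping accidents by hour and by month. The sketch below assumes a `timestamp` column and uses made-up rows purely to show the mechanics; it does not reproduce the project's data.

```python
import pandas as pd

# Made-up accident timestamps, for illustration only.
accidents = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-03-04 06:10", "2019-03-11 06:40", "2019-05-02 06:55",
        "2019-05-20 18:05", "2019-07-09 12:30",
    ])
})

# Number of accidents per hour of day and per calendar month.
by_hour = accidents["timestamp"].dt.hour.value_counts().sort_index()
by_month = accidents["timestamp"].dt.month.value_counts().sort_index()

# Hour of day with the most accidents in this toy sample.
peak_hour = by_hour.idxmax()
print(by_hour, by_month, peak_hour, sep="\n")
```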

Correlation between environmental factors

Correlation between temperature, rainfall and location


  1. Humidity and specific humidity are strongly correlated.

  2. Temperature and precipitation are strongly correlated.
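Pairwise correlations of this kind can be computed with `DataFrame.corr`. The sketch below uses Pearson correlation, which is an assumption since the source does not name the method, and the column names and values are invented for illustration; they are not the project's results.

```python
import pandas as pd

# Invented values, only to demonstrate the call; not the project's data.
weather = pd.DataFrame({
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "humidity": [80.0, 70.0, 65.0, 50.0, 45.0],
    "specific_humidity": [8.1, 7.0, 6.4, 5.1, 4.4],
    "precipitation": [1.0, 0.0, 3.0, 2.0, 5.0],
})

# Pearson correlation matrix between all numeric columns.
corr = weather.corr(method="pearson")
print(corr.round(2))
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed directly to a heatmap plot.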

Rainfall distribution by year

Average distribution of rainfall

The most crashes occur when rainfall is 20 and 30 mm.

The most crashes occur when wind velocity is 0.5-2.5 km/h.

The most crashes occur when temperature is 15 °C.
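A simple way to find the range of a weather variable in which crashes concentrate is to bin the variable and count the rows per bin. The sketch below does this for rainfall; the `rainfall_mm` column, the 10 mm bin width and the sample values are all assumptions chosen for illustration.

```python
import pandas as pd

# Invented rainfall readings, one row per crash; illustration only.
merged = pd.DataFrame({"rainfall_mm": [2.0, 12.0, 22.0, 25.0, 28.0, 41.0]})

# Left-closed 10 mm bins: [0, 10), [10, 20), ..., [40, 50).
bins = [0, 10, 20, 30, 40, 50]
merged["rain_bin"] = pd.cut(merged["rainfall_mm"], bins=bins, right=False)

# Count crashes per bin and pick the bin with the most crashes.
crashes_per_bin = merged["rain_bin"].value_counts().sort_index()
busiest_bin = crashes_per_bin.idxmax()
print(crashes_per_bin)
print("Busiest bin:", busiest_bin)
```

The same pattern applies to wind velocity or temperature by swapping the column and the bin edges.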

Statistics for Humidity

Statistics for Temperature

Statistics for Precipitation


Inference for weather plots and accident impact