These visualizations can be found here.
The graph above shows high taxi activity on Mondays, followed by relatively lower activity that then rises steadily through the rest of the week.
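The weekday pattern described above can be reproduced with a simple count of pickups per day of the week. The following is a minimal sketch, not the code used for the original graph: the column name `tpep_pickup_datetime` is an assumption, and the six timestamps are invented sample values.

```python
import pandas as pd

# Hypothetical sample; the pickup timestamp column is assumed to be
# named tpep_pickup_datetime in the real data.
trips = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2023-01-02 08:15", "2023-01-02 18:40", "2023-01-03 09:05",
        "2023-01-05 17:30", "2023-01-06 19:10", "2023-01-06 22:45",
    ])
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Count pickups per weekday and keep calendar order (days with no trips get 0)
trips_per_day = (
    trips["tpep_pickup_datetime"].dt.day_name()
    .value_counts()
    .reindex(day_order, fill_value=0)
)
print(trips_per_day)
```

Reindexing against `day_order` keeps the days in calendar order rather than sorted by count, which makes a rise or fall across the week easier to read.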
a. NYC Taxi Data:
The correlation plot below shows how the different attributes are correlated with each other.
From the above correlation plot, we can confirm that only a few of the variables are highly correlated. They are:
i. Trip_distance vs Fare_amount: This confirms the initial assumption that the longer the trip distance, the higher the fare. Tip_amount and toll_amount show a similar positive relationship with trip_distance.
ii. Trip_distance vs Trip_times: These attributes follow the same pattern as the relation above: longer distances go with longer trip times.
iii. Speed vs Trip_distance: As trip_distance increases, speed increases as well.
iv. Speed vs fare_amount: These attributes follow the same pattern as the relation above: fare_amount increases with speed.
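A correlation plot of this kind is typically backed by a pairwise correlation matrix, and the strongly correlated pairs can be listed directly from it. The sketch below assumes lowercase column names and uses synthetic data generated only for illustration. The 0.7 cutoff for "highly correlated" is also an assumption, since the report does not state a threshold.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the taxi data (values are made up)
rng = np.random.default_rng(0)
n = 500
distance = rng.uniform(0.5, 20.0, n)
df = pd.DataFrame({
    "trip_distance": distance,
    "fare_amount": 2.5 + 2.5 * distance + rng.normal(0, 2, n),
    "passenger_count": rng.integers(1, 7, n),  # independent of distance
})

corr = df.corr()  # Pearson correlation by default

# Keep only the upper triangle so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# Assumed threshold for "highly correlated"
strong = pairs[pairs.abs() > 0.7]
print(strong)
```

The same steps apply unchanged to the weather attributes once they are loaded into a DataFrame.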
b. Weather Data:
The correlation plot below shows how the different attributes are correlated with each other.
From the above correlation plot, we can confirm that only a few of the variables are highly correlated. They are:
i. Temp vs feelslike vs dew: These columns are highly correlated, since the temperature directly affects the other two.
ii. Precipprob vs humidity: When humidity is higher, the probability of precipitation also tends to be higher.
From the visualization, we can confirm that around 80% of the taxi riders used payment method 1, which is credit card.
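The share of each payment method can be checked numerically with a normalized value count. This is a minimal sketch on ten invented records, so the percentages it prints are not the report's results. The column name `payment_type` and the label mapping are assumptions, apart from code 1 being credit card as stated above.

```python
import pandas as pd

# Hypothetical payment codes for ten trips (made-up values)
payments = pd.Series([1, 1, 2, 1, 1, 3, 1, 2, 1, 1], name="payment_type")

# Assumed code-to-label mapping; only code 1 is confirmed by the text
labels = {1: "Credit card", 2: "Cash", 3: "No charge"}

# Percentage of trips per payment method
share = payments.map(labels).value_counts(normalize=True) * 100
print(share.round(1))
```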
To analyze the distribution of trip times and how long trips usually last, the PDF of the distribution is plotted.
It is very clear that the distribution is right skewed and that 90% of the time the total trip time is less than 50 minutes.
The PDF of the log-transformed trip times shows that the values roughly follow a normal distribution, although it is still right skewed.
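The percentile and skewness statements above can be verified without a plot. The sketch below draws synthetic trip times from a lognormal distribution, which is an assumption made purely for illustration, so its log-transformed skewness is near zero by construction. On the real data some right skew may remain, as noted above.

```python
import numpy as np
import pandas as pd

# Synthetic trip times in minutes (lognormal is an assumed shape)
rng = np.random.default_rng(42)
trip_minutes = pd.Series(rng.lognormal(mean=2.5, sigma=0.6, size=10_000))

# Value below which 90% of trips fall
p90 = trip_minutes.quantile(0.90)

# Sample skewness before and after the log transform;
# positive values indicate a longer right tail
raw_skew = trip_minutes.skew()
log_skew = np.log(trip_minutes).skew()

print(f"90th percentile: {p90:.1f} min")
print(f"skewness raw: {raw_skew:.2f}, log: {log_skew:.2f}")
```

A skewness close to zero after the transform supports modelling the log of trip time rather than the raw value.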
From the scatter plot, the trip distance and fare amount attributes seem to have a linear relationship.
As the distance increases, the cost of the trip increases too.
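A quick way to quantify the linear relationship seen in the scatter plot is a degree-one least-squares fit together with the Pearson correlation. The data below is synthetic, and the base fare of 3.0 and per-mile rate of 2.5 are invented for illustration rather than taken from the taxi data.

```python
import numpy as np

# Synthetic distances (miles) and fares (made-up pricing plus noise)
rng = np.random.default_rng(1)
distance = rng.uniform(0.5, 20.0, 1_000)
fare = 3.0 + 2.5 * distance + rng.normal(0, 1.5, 1_000)

# Fit fare = slope * distance + intercept
slope, intercept = np.polyfit(distance, fare, 1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(distance, fare)[0, 1]

print(f"fare = {slope:.2f} * distance + {intercept:.2f}, r = {r:.3f}")
```

Here the slope estimates the increase in fare per unit of distance, and a value of `r` close to 1 indicates that a straight line describes the relationship well.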
Thus one can observe that most pickups and drop-offs occur in the evening, while the fewest occur during the morning.
Here one can see that passenger count has no clear relationship with trip duration. However, no long trips are taken with higher passenger counts such as 7 or 9, and trip duration is more or less evenly distributed only for a passenger count of 1.
From this observation, one can see that vendor 1 mostly provides cabs for short trips, while vendor 2 provides cabs for both short and long trips.
One can see that trip duration is at its maximum around 3 pm, which may be because of traffic on the roads.
Trip duration is at its lowest around 6 am, as the streets may not be busy.
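The hourly observations above come down to grouping trips by pickup hour and comparing counts and mean durations. The following sketch uses assumed column names (`tpep_pickup_datetime`, `trip_duration_min`) and six invented trips chosen only to make the output easy to check, so it illustrates the method rather than the reported findings.

```python
import pandas as pd

# Hypothetical trips; column names and values are assumptions
trips = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2023-01-02 06:05", "2023-01-02 06:40", "2023-01-02 15:10",
        "2023-01-02 15:45", "2023-01-02 19:20", "2023-01-02 19:55",
    ]),
    "trip_duration_min": [8.0, 10.0, 24.0, 28.0, 15.0, 17.0],
})

# Group by hour of pickup: number of trips and mean duration per hour
hour = trips["tpep_pickup_datetime"].dt.hour
by_hour = trips.groupby(hour)["trip_duration_min"].agg(["count", "mean"])

slowest_hour = by_hour["mean"].idxmax()  # hour with the longest mean trip
fastest_hour = by_hour["mean"].idxmin()  # hour with the shortest mean trip

print(by_hour)
print(f"longest mean duration at hour {slowest_hour}, shortest at hour {fastest_hour}")
```

The `count` column gives the number of pickups per hour, which is the quantity behind the evening and morning comparison of pickups.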