This project predicts the demand for NYC taxis, identifying where demand is high and at what times of day. It also considers how the weather affects that demand. To find these relationships and predict the demand, data is collected from two sources:
NYC Cab Data:
It is obtained from the NYC Taxi & Limousine Commission (TLC) government site. Data is collected from this site using web scraping techniques.
Information on taxis:
Yellow Taxi: Yellow Medallion Taxicabs are the famous NYC yellow taxis that provide transportation exclusively through street hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.
For Hire Vehicles (FHVs): FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.
Green Taxi: Street Hail Livery (SHL). The SHL program allows livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, and credit card machines, and gives them the right to accept street hails in addition to pre-arranged rides.
This project mainly focuses on predicting Yellow Taxi demand, since yellow taxis are the most used taxis in New York City. The model is trained on data from January 2022 to June 2022 and tested on data from July 2022 to December 2022.
Below is the data dictionary which describes the different attributes of the data.
The code below uses an API request to select the year and month and download the taxi data to one's working directory.
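A minimal sketch of such a download, using only the Python standard library. The base URL and the file naming pattern (`yellow_tripdata_YYYY-MM.parquet`) are assumptions about how the TLC currently publishes its trip records and are not taken from this write-up; check them against the TLC site before relying on them. The function names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: TLC trip record files are served from this location.
TLC_BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def build_taxi_url(year, month, taxi_type="yellow"):
    """Return the assumed download URL for one month of trip records."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"{TLC_BASE_URL}/{taxi_type}_tripdata_{year}-{month:02d}.parquet"


def download_taxi_data(year, month, out_dir="."):
    """Download one month of yellow taxi data into out_dir and return the file path."""
    url = build_taxi_url(year, month)
    target = Path(out_dir) / url.rsplit("/", 1)[-1]
    urlretrieve(url, target)  # performs the HTTP request and writes the file
    return target


# Example (requires network access): fetch the training months.
# for month in range(1, 7):
#     download_taxi_data(2022, month)
```

Keeping the URL construction separate from the download makes it easy to check the generated links before fetching several large files.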
Raw dataset for January 2022 can be found here.
Weather Data
This data is obtained from the Visual Crossing Weather site through an API. It is used to relate the demand for yellow taxis to the weather at a given time of day.
Select the required city and date range to create an API URL. Also, choose how frequently you want the data to be collected.
Step 1:
Create an account on visualcrossing.com, search for the city, and select whether you need history or forecast data.
Step 2:
Upon selecting the city, select the date range for which you need the data and click on API.
Step 3:
Upon clicking API, select the language for the query and output file type. One can also select the frequency of the data to be collected.
Once selected, copy the API link and use the Python code below to download the data to the local system.
Python code for downloading the data to one's local machine using the API:
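One way this download step might look, as a sketch using only the standard library. The function name `download_to_file` and the output file name in the usage comment are hypothetical; the URL is whatever API link was copied from the Visual Crossing query page.

```python
import shutil
from urllib.request import urlopen


def download_to_file(url, out_path, timeout=60):
    """Stream the response body at url into out_path and return out_path."""
    with urlopen(url, timeout=timeout) as response, open(out_path, "wb") as out_file:
        # Copy in chunks so large responses are not held in memory at once.
        shutil.copyfileobj(response, out_file)
    return out_path


# Example (requires network access and a valid API key in the copied link):
# download_to_file(api_url, "nyc_weather_2022-01.csv")
```

Since the API link embeds the key, it is safer to read the key from an environment variable or a local config file than to commit the full URL to a repository.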
Below is an example of a URL through which one could connect and retrieve the weather data, along with the raw data obtained.
Decoding the API URL:
BaseURL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"
Location = "New%20York%20City%2C%20USA"
Start_Date = "2022-01-16"
End_Date = "2022-01-31"
Output_Select = "unitGroup=us&include=hours"
API_Key = "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
content_type = "csv"
URL = BaseURL + "/" + Location + "/" + Start_Date + "/" + End_Date + "?" + Output_Select + "&key=" + API_Key + "&contentType=" + content_type
URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/New%20York%20City%2C%20USA/2022-01-16/2022-01-31?unitGroup=us&include=hours&key=XXXXXXXXXXXXXXXXXXXXXXXXXXX&contentType=csv"
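The same URL can also be assembled with `urllib.parse`, which handles the percent-encoding of the location and the query string instead of relying on hand-written `%20` and `%2C` escapes. This is a sketch: the function name `build_weather_url` and its parameter names are hypothetical, while the base URL and query parameters are the ones shown above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"


def build_weather_url(location, start_date, end_date, api_key,
                      unit_group="us", include="hours", content_type="csv"):
    """Build a timeline API URL from its decoded parts."""
    # The location is a path segment, so encode spaces and commas as well.
    path = "/".join([BASE_URL, quote(location, safe=""), start_date, end_date])
    query = urlencode({
        "unitGroup": unit_group,
        "include": include,
        "key": api_key,
        "contentType": content_type,
    })
    return f"{path}?{query}"


url = build_weather_url("New York City, USA", "2022-01-16", "2022-01-31", "XXXXXXXXXXXXXXXXXXXXXXXXXXX")
print(url)
```

Changing the date range or the location then only requires changing the arguments, not editing the URL string by hand.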