Shiny App: http://shiny.evl.uic.edu:3838/g1/project3/
YouTube Video: https://youtu.be/ptBp4EAyCMY
The web app visualizes taxi rides for community areas in Chicago during 2019, using bar charts and tables to analyze the data. There are two modes for viewing data: one for community areas in Chicago and one for taxi companies. The user selects a mode, and the app displays the corresponding heat map. Clicking a community area on the map shows the percentage of rides to or from the selected community. The user can view either pickup or drop-off data, for communities or for taxi companies, and can also choose which taxi company to view. Selecting a community updates the bar charts for that specific community. The map and the bar charts can be reset to their original state.
We got the data for the Chicago Taxi rides from https://data.cityofchicago.org/Transportation/Taxi-Trips-2019/h4cq-z3dy
This was about 6 GB of data. We first ran cut -d',' -f 3,5,6,9,10,17 Taxi_Trips.csv > Taxi_Reduced.csv to keep only those columns and save the result to a new file called Taxi_Reduced.csv. We then ran awk -F, '$2>=60 && $2<=18000' Taxi_ReducedOne.csv > Taxi_TripTime_Reduced.csv followed by awk -F, '$3>=0.5 && $3<=100' Taxi_TripTime_Reduced.csv > Taxi_MileReduced.csv, so that the final file, Taxi_MileReduced.csv, contains only rides that lasted between 60 seconds (1 minute) and 18,000 seconds (5 hours) and covered between 0.5 and 100 miles. Next, we replaced each taxi company name with a unique number, which significantly decreased the size of the file, and stored the company names along with their numbers in a separate file (Company_names.csv). We also used command-line tools to keep only rides within Chicago community areas, i.e., rides whose pickup and drop-off community areas were both non-empty and between 1 and 77 inclusive. Finally, we used the split command to break the file into 11 CSV files (x01_1.csv to x11_1.csv), each under 50 MB.
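The filtering and encoding steps can be sketched on a tiny sample as follows. This is only an illustration under stated assumptions, not necessarily the exact commands used for the project. The sample rows are made up. The six-column order (start timestamp, seconds, miles, pickup area, drop-off area, company) is inferred from the cut command. The combined awk filter, the explicit header handling, and the id,name layout of Company_names.csv are all assumptions.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so no real data files are touched.
workdir=$(mktemp -d)

# Made-up sample in the assumed reduced layout:
# start timestamp, trip seconds, trip miles, pickup area, drop-off area, company
cat > "$workdir/sample.csv" <<'EOF'
Trip Start Timestamp,Trip Seconds,Trip Miles,Pickup Community Area,Dropoff Community Area,Company
01/01/2019 12:00:00 AM,600,2.5,8,32,Flash Cab
01/01/2019 12:15:00 AM,30,1.0,8,32,Flash Cab
01/01/2019 12:30:00 AM,900,0.2,76,8,Sun Taxi
01/01/2019 12:45:00 AM,1200,5.0,,8,Sun Taxi
01/01/2019 01:00:00 AM,2400,17.3,76,32,Sun Taxi
EOF

# Keep the header, plus rides of 60..18000 seconds and 0.5..100 miles whose
# pickup and drop-off community areas are both non-empty and within 1..77.
awk -F, 'NR==1 || ($2>=60 && $2<=18000 && $3>=0.5 && $3<=100 &&
                   $4!="" && $5!="" && $4>=1 && $4<=77 && $5>=1 && $5<=77)' \
    "$workdir/sample.csv" > "$workdir/filtered.csv"

# Replace each company name (column 6) with a number assigned in first-seen
# order, and write the id,name pairs to a separate lookup file.
awk -F, -v OFS=, -v names="$workdir/Company_names.csv" '
    NR==1        { print; next }                          # header passes through
    !($6 in id)  { id[$6] = ++n; print n, $6 > names }    # first time we see a name
                 { $6 = id[$6]; print }                   # emit the encoded row
' "$workdir/filtered.csv" > "$workdir/encoded.csv"

cat "$workdir/encoded.csv"
cat "$workdir/Company_names.csv"
```

In this sample, only the first and last rides pass the filter: the others fail on duration, distance, and an empty pickup area respectively. Note that splitting on plain commas, as cut and awk do here, would misread any quoted field that itself contains a comma.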
When loading the data in our app.r file, we read the CSV files with the "fread" function from the "data.table" library, as this was faster than the regular "read.csv" function.
We also use a shapefile (cs424proj3\project3Taxi\CA) for the community area map, obtained from https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6
Sample taxi data stored in x01_1.csv to x11_1.csv
Company names stored with unique ID
GitHub Link: https://github.com/rkodit2/cs424proj3
To run the application in RStudio, download the files or clone the repo from GitHub into a new folder on your computer, then open the 'app.R' file inside the 'project3Taxi' folder. Make sure all the data files are in the new folder that was created.
All the packages required by this application must also be installed in RStudio. To make it easier, here is a list of the packages that need to be installed before running this application:
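A quick way to confirm that the data files are in place before launching is sketched below. The helper name missing_files is hypothetical, and the sketch assumes that the eleven data files, Company_names.csv, and app.R all sit directly in the app folder, which may not match the actual repository layout.

```shell
#!/bin/sh
# Print the name of every expected file that is absent from the given folder
# (defaults to the current directory). Prints nothing if all are present.
missing_files() {
    dir=${1:-.}
    # The eleven data chunks named x01_1.csv to x11_1.csv.
    for i in 01 02 03 04 05 06 07 08 09 10 11; do
        [ -f "$dir/x${i}_1.csv" ] || echo "x${i}_1.csv"
    done
    # Assumed to live next to the data chunks.
    [ -f "$dir/Company_names.csv" ] || echo "Company_names.csv"
    [ -f "$dir/app.R" ] || echo "app.R"
}

# Check the folder passed as the first argument, or the current directory.
missing_files "${1:-.}"
```

Running the script from inside the 'project3Taxi' folder and getting no output would suggest that the expected files are present.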
Then click on the 'run app' icon located at the top right of your 'app.R' script in RStudio.
To run the application on the UIC server, transfer the contents of project3Taxi from GitHub to the UIC server and make sure the R file is named app.r. The application will then run automatically.
In the pickup data for the O'Hare community, the highest number of rides occurred around 8 pm, as shown in the graph below.
For O'Hare pickups, the busiest days of the week were Monday and Thursday, which would make sense as they are closer to the weekend.
Another interesting finding about O'Hare was that there were more rides in May, June, and October, as shown in the graph below. It seems a little unusual for people to be traveling in October.