This project visualizes the flow of taxi rides between communities within the Chicago metropolitan area. It is hosted at http://shiny.evl.uic.edu:3838/g17/tl_proj3/.
This chart is shown to the left of the map. Depending on whether inflow or outflow is selected, it shows either the taxi rides coming into the selected community from other communities or the taxi rides going out of the selected community to other communities.
The map shows the outlines of the Chicago community areas as defined in https://www.chicago.gov/content/dam/city/depts/doit/general/GIS/Chicago_Maps/Citywide_Maps/Community_Areas_W_Numbers.pdf
Hovering over a community shows a pop-up with that community's name. Clicking on a community updates the colors and graphs to show data about taxi rides to and from that community. The map's colors follow the legend and show what percentage of rides is coming from or going to each community.
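The percentages behind the map colors can be sketched as follows. This is only an illustration in plain Python, not the app's R code; the function name `outflow_shares` and the (pickup, dropoff) pairs are hypothetical.

```python
from collections import Counter

def outflow_shares(trips, selected):
    """Return {dropoff_area: percent of the rides that leave `selected`}."""
    counts = Counter(drop for pick, drop in trips if pick == selected)
    total = sum(counts.values())
    if total == 0:
        return {}  # no rides start in the selected community
    return {area: 100.0 * n / total for area, n in counts.items()}

# Hypothetical (pickup_area, dropoff_area) pairs, not real data.
sample_trips = [(5, 5), (5, 6), (5, 5), (5, 8), (6, 5)]
shares = outflow_shares(sample_trips, selected=5)
print(shares)
```

Inflow works the same way with the roles of pickup and dropoff swapped.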
There are six charts, listed from the top left going down:
The number of taxi rides that occur each day
The hours of the day at which those taxi rides occur
The days of the week on which those taxi rides occur
The months in which those taxi rides occur
A histogram binning the distance each ride travels
A histogram binning the duration of each ride
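As a rough illustration of what the two histograms compute, here is a minimal fixed-width binning sketch in plain Python. The bin width of 2 miles and the sample distances are assumptions made for the example; the app's actual bin sizes are not stated here.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per fixed-width bin; each key is the bin's lower edge."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical trip distances in miles.
miles = [0.7, 1.2, 1.9, 3.4, 17.5, 18.0]
distance_hist = histogram(miles, bin_width=2)
print(distance_hist)
```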
Data was downloaded from https://data.cityofchicago.org/Transportation/Taxi-Trips-2019/h4cq-z3dy
The data was then cleaned, indexed, and reduced using a Jupyter Notebook and the pandas library. The easiest way to set up this environment is to install Anaconda from https://anaconda.org/ and then open Jupyter Notebook through the GUI launcher. The data munging steps are described below.
First, rides that may be insignificant or incorrectly recorded are removed by dropping trips shorter than 0.5 miles or longer than 100 miles, and trips lasting less than 60 seconds or more than 5 hours.
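The filter can be sketched like this. It is a plain-Python illustration of the rule, not the notebook's pandas code; the column names `Trip Miles` and `Trip Seconds` are assumed to match the downloaded file, and the sample rows are made up.

```python
# Thresholds taken from the description above.
MIN_MILES, MAX_MILES = 0.5, 100.0
MIN_SECONDS, MAX_SECONDS = 60, 5 * 60 * 60  # 60 seconds to 5 hours

def is_plausible(trip):
    """True if the trip's distance and duration fall inside the kept ranges."""
    return (MIN_MILES <= trip["Trip Miles"] <= MAX_MILES
            and MIN_SECONDS <= trip["Trip Seconds"] <= MAX_SECONDS)

# Hypothetical sample rows; column names are assumptions.
trips = [
    {"Trip Miles": 0.2, "Trip Seconds": 300},     # dropped: under 0.5 miles
    {"Trip Miles": 3.1, "Trip Seconds": 780},     # kept
    {"Trip Miles": 12.0, "Trip Seconds": 45},     # dropped: under 60 seconds
    {"Trip Miles": 150.0, "Trip Seconds": 7200},  # dropped: over 100 miles
    {"Trip Miles": 18.4, "Trip Seconds": 20000},  # dropped: over 5 hours
]
kept = [t for t in trips if is_plausible(t)]
print(len(kept))
```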
Next, trips that start or end outside Chicago are removed, since we have no information on those locations and therefore cannot visualize them.
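One way to express this step, assuming that a trip outside Chicago shows up as a missing or out-of-range community area number (an assumption about the data, as are the column names below):

```python
VALID_AREAS = range(1, 78)  # Chicago has 77 numbered community areas

def inside_chicago(trip):
    """True only if both ends of the trip have a valid community area."""
    return (trip["Pickup Community Area"] in VALID_AREAS
            and trip["Dropoff Community Area"] in VALID_AREAS)

# Hypothetical rows; None stands for a blank cell in the source file.
trips = [
    {"Pickup Community Area": 8, "Dropoff Community Area": 32},
    {"Pickup Community Area": 76, "Dropoff Community Area": None},
    {"Pickup Community Area": None, "Dropoff Community Area": 6},
]
in_city = [t for t in trips if inside_chicago(t)]
print(len(in_city))
```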
The time resolution is reduced to hours. Minute-level precision was more specific than we needed and took up unnecessary memory.
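A minimal sketch of truncating a timestamp to the hour. The timestamp format string is an assumption about how the downloaded file writes dates, and `truncate_to_hour` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "01/15/2019 05:45:00 PM".
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"

def truncate_to_hour(ts):
    """Discard minutes, seconds, and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)

pickup = datetime.strptime("01/15/2019 05:45:00 PM", TIMESTAMP_FORMAT)
truncated = truncate_to_hour(pickup)
print(truncated)
```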
The company names were repeated many times as strings, which took up a lot of space. The companies were itemized into a separate list and their names replaced with relational IDs to save space.
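The idea can be sketched as a simple lookup table. This is an illustration in plain Python rather than the notebook's code; the company names are placeholders and the numbering scheme (IDs assigned in order of first appearance) is an assumption.

```python
def encode_companies(names):
    """Replace each company name with a small integer ID.

    Returns the encoded column and the name -> ID lookup table.
    """
    ids = {}
    encoded = []
    for name in names:
        if name not in ids:
            ids[name] = len(ids)  # next unused ID
        encoded.append(ids[name])
    return encoded, ids

# Placeholder company names, not taken from the dataset.
column = ["Company A", "Company B", "Company A", "Company C", "Company B"]
encoded, lookup = encode_companies(column)
print(encoded, lookup)
```

The lookup table only needs to be stored once, so each row carries an integer instead of a repeated string.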
Storing the pickup time in datetime format took up extra space, so it was split into separate integer columns for the hour, day, and month.
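A minimal sketch of the split, assuming "day" means day of the month (it could equally be stored as a weekday number); the output column names are hypothetical.

```python
from datetime import datetime

def split_pickup(ts):
    """Break a pickup timestamp into small integer fields."""
    return {"hour": ts.hour, "day": ts.day, "month": ts.month}

# Hypothetical pickup time.
parts = split_pickup(datetime(2019, 4, 27, 18, 0))
print(parts)
```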
Finally, the data was exported to 55 separate .tsv files, split by company ID.
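The export step could look roughly like the following, using Python's standard `csv` module with a tab delimiter. The file naming pattern `company_<id>.tsv`, the column names, and the sample rows are all assumptions for the sake of the example.

```python
import csv
import os
import tempfile
from collections import defaultdict

def export_by_company(rows, out_dir, fieldnames):
    """Write one tab-separated file per company ID; return the file paths."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["company_id"]].append(row)
    paths = []
    for company_id, group in sorted(groups.items()):
        # Hypothetical naming scheme: company_<id>.tsv
        path = os.path.join(out_dir, f"company_{company_id}.tsv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)
    return paths

# Hypothetical rows after the cleaning steps.
rows = [
    {"company_id": 0, "hour": 17, "miles": 3.1},
    {"company_id": 1, "hour": 9, "miles": 12.4},
    {"company_id": 0, "hour": 22, "miles": 1.8},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = export_by_company(rows, tmp, ["company_id", "hour", "miles"])
    written = [os.path.basename(p) for p in paths]
print(written)
```

Splitting by company keeps each file small, so the rides of one company can be loaded without reading the rest.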
All code, including the Jupyter Notebook and the data files it generates, is available on GitHub: https://github.com/otny12/424proj3
Instructions on running the Jupyter Notebook are listed above.
To run the Shiny app locally, you will need to install both R and RStudio.
R is available from https://cran.r-project.org/mirrors.html. This project was developed on R version 4.1.3.
RStudio is available from https://www.rstudio.com/
To install the required packages, run the command: install.packages(c("shiny", "shinydashboard", "ggplot2", "lubridate", "DT", "jpeg", "grid", "leaflet", "scales", "rgdal", "data.table"))
Open app.R in RStudio, then select Run App.
Using the map, we can see how taxi rides move from one community to another. For example, rides going into Avondale mostly come from the surrounding communities.
We can also compare communities in the north with those in the south. Below, two communities are selected: North Center and Auburn. There is a consistent trend that north-side communities have their highest percentage of rides staying within the community. For North Center, about 14% of rides are internal, while in Auburn the largest share, 15% of rides, goes to the Near West Side.
We can see how ridership falls during specific time frames. One very clear example is a drop in taxi service during April 2019. This is likely because of a record-setting snowstorm, which probably caused most riders and even drivers to stay home.
We can also theorize about why people ride taxis into certain communities. Comparing Lincoln Park and O'Hare, we can see a large difference in the weekday and hourly charts. For Lincoln Park, most trips occur on weekends and in the post-work hours from around 6pm onward, so we can guess that most of those rides are for recreation such as dining. O'Hare, by contrast, has most of its activity on weekdays and during regular business hours from 6am to 6pm, so we can guess that these rides serve a different purpose, such as catching flights for business.