PROJECT 1 - Subway
App hosted at: https://tonyl.shinyapps.io/424Proj1/
Code and instructions for running locally hosted at: https://github.com/otny12/424_proj1/tree/main
This application is used to investigate how ridership at three CTA stations along the Blue Line changes over different time frames. The stations are Forest Park, the Blue Line’s southernmost end; UIC-Halsted, a station by the University of Illinois to the west of the Chicago Loop/downtown; and O’Hare, the other end of the Blue Line, which is connected to O’Hare International Airport.
Usage
Stations can be selected using the drop-downs highlighted in yellow.
We can also view different time frames using the highlighted buttons.
Yearly Bars shows the ridership aggregated by year.
Daily Bars shows the ridership for each day of a particular year.
Monthly Bars shows the ridership aggregated by month for a particular year.
Day of the Week Bars shows the aggregate number of riders per day of the week (e.g. Monday or Tuesday) for a particular year.
In addition, the data for each bar is shown in tabular form to the right.
By combining these tools we can compare different combinations of information. An example of this would be showing UIC-Halsted on top and bottom, where the top chart is 2001 daily bars and the bottom 2002, which could help us see if there are any cyclical patterns that repeat each year.
There are!
How Data was Handled
The data used was collected from https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f. The data is hosted by the City of Chicago and was recorded by the Chicago Transit Authority (CTA). The data is shown in the form below.
The dataset contains over a million rows. station_id is used to identify stations, and stationname specifies the name of each station. daytype is a categorical variable that takes the value W for weekdays, A for Saturdays, and U for Sundays or holidays. It is listed because trains run on different schedules and for different hours depending on that classification. Finally, rides specifies the number of entries to a station on that day, the assumption being that anyone who enters the station will leave it on a train.
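To make the column descriptions above concrete, the following is a minimal Python sketch of reading rows in this shape and decoding daytype. It is illustrative only and is not the app's own code. The date column name, its MM/DD/YYYY format, and the sample rows (including the station name and ID) are assumptions made up for the example, not real CTA figures.

```python
import csv
import io
from datetime import datetime

# Day-type codes as described in the text above.
DAYTYPE_LABELS = {"W": "Weekday", "A": "Saturday", "U": "Sunday/Holiday"}

# Made-up sample rows for illustration only; not real CTA figures.
# The "date" column and its MM/DD/YYYY format are assumptions.
SAMPLE_CSV = """station_id,stationname,date,daytype,rides
1,Example Station,01/06/2020,W,100
1,Example Station,01/11/2020,A,40
1,Example Station,01/12/2020,U,25
"""


def parse_rows(text):
    """Parse CSV text into dicts with a typed date and an integer ride count."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "station_id": raw["station_id"],
            "stationname": raw["stationname"],
            "date": datetime.strptime(raw["date"], "%m/%d/%Y").date(),
            "daytype": raw["daytype"],
            "rides": int(raw["rides"]),
        })
    return rows


rows = parse_rows(SAMPLE_CSV)
for row in rows:
    # Print each day with a readable day-type label.
    print(row["date"], DAYTYPE_LABELS[row["daytype"]], row["rides"])
```

Converting the date and rides fields to proper types up front keeps the later filtering and aggregation steps simple.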
The data was reduced before upload to shinyapps.io, since the hosting service only allows a maximum data file size of 5 MB and the original file was 39 MB. The visualization only looks at three stations (O’Hare, UIC-Halsted, and Forest Park), so all other stations were removed.
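The reduction step described above could be sketched as follows. This is a hedged Python illustration, not the script actually used for the project. The exact station name spellings in KEEP_STATIONS are assumptions and should be checked against the real file. The function name filter_stations and the sample rows are hypothetical.

```python
import csv
import io

# Assumed spellings of the three station names; verify against the actual file.
KEEP_STATIONS = frozenset({"O'Hare Airport", "UIC-Halsted", "Forest Park"})


def filter_stations(src, dst, keep=KEEP_STATIONS):
    """Copy only rows whose stationname is in `keep`; return how many were kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if row["stationname"] in keep:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up rows for illustration only; IDs and counts are not real CTA data.
sample = io.StringIO(
    "station_id,stationname,date,daytype,rides\n"
    "1,UIC-Halsted,01/06/2020,W,100\n"
    "2,Some Other Station,01/06/2020,W,50\n"
    "3,Forest Park,01/06/2020,W,70\n"
)
out = io.StringIO()
kept = filter_stations(sample, out)
print(kept)
```

With real files, src and dst would be file objects opened with newline="" rather than in-memory buffers.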
For the Yearly Bars, the data is aggregated for each station by year, and each bar displays the sum of all days in that year. For the Daily Bars, the only manipulation is filtering to the year requested by the user. For the Monthly Bars, the data is reduced to the user-specified year and then aggregated by month for that station. Finally, for the Day of the Week Bars, the data for that station is reduced to the year specified by the user, then aggregated by day of the week, and the sums are shown.
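The four views described above differ only in the grouping key and in whether a year filter is applied. A minimal Python sketch of that idea follows. It is an illustration under assumptions, not the app's implementation. The aggregate helper is hypothetical and the records are made up for the example.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up (station, date, rides) records for illustration only.
RECORDS = [
    ("UIC-Halsted", date(2019, 12, 30), 10),  # Monday
    ("UIC-Halsted", date(2020, 1, 6), 100),   # Monday
    ("UIC-Halsted", date(2020, 1, 7), 90),    # Tuesday
    ("UIC-Halsted", date(2020, 2, 3), 80),    # Monday
    ("Forest Park", date(2020, 1, 6), 70),    # Monday
]


def aggregate(records, station, key, year=None):
    """Sum rides for one station, grouped by key(date), optionally for one year."""
    totals = defaultdict(int)
    for name, day, rides in records:
        if name != station:
            continue
        if year is not None and day.year != year:
            continue
        totals[key(day)] += rides
    return dict(totals)


# Yearly view: all years, grouped by year.
yearly = aggregate(RECORDS, "UIC-Halsted", key=lambda d: d.year)
# Daily view: one year, one bar per date (no real aggregation).
daily = aggregate(RECORDS, "UIC-Halsted", key=lambda d: d, year=2020)
# Monthly view: one year, grouped by month number.
monthly = aggregate(RECORDS, "UIC-Halsted", key=lambda d: d.month, year=2020)
# Day-of-week view: one year, grouped by weekday name.
by_weekday = aggregate(RECORDS, "UIC-Halsted",
                       key=lambda d: WEEKDAYS[d.weekday()], year=2020)

print(yearly, monthly, by_weekday)
```

Passing the grouping key as a function keeps one code path for all four charts, so each chart only has to choose a key and a year.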
Some Insights
We can see that ridership fell across all three stations in 2020.
Taking a closer look, we can compare the monthly ridership between 2019 and 2020, and we can see a large difference in ridership beginning in March, which was when lockdown began.
Looking at both UIC-Halsted and Forest Park, we can see that both are likely commuter stops, as their usage is much higher on weekdays than on weekends.
Bringing up O'Hare on the bottom, we can see that O'Hare also has lower ridership on the weekends; however, the change between weekday and weekend is not as drastic as at Forest Park.