This webapp is a simple visualization tool showing ridership statistics for three CTA Blue Line stops. The data comes from the Chicago Transit Authority [1] and represents the daily rider counts on the Blue Line L-train from 2001 to 2021. The data was collected by counting how many individuals passed through a turnstile while excluding transfer stops. The counts contain minor errors, since not every person who passes through a turnstile takes a ride, but they are the most accurate representation of ridership available because they come directly from the Chicago Transit Authority.
To use this webapp, navigate to https://cgrams2.shinyapps.io/CS424Project1/ and you will be greeted immediately by visualizations of three Blue Line stops: UIC-Halsted, O’Hare Airport, and Forest Park. These stops were chosen because the Blue Line begins at O’Hare Airport, ends at Forest Park, and passes through our campus at UIC-Halsted. Due to the limitations of shinyapps.io, the data must be truncated, so not all stops can be represented in this webapp. The initial visualization under each stop is a bar chart of the total number of riders at that stop for each year. Underneath the bar chart, a table shows the numerical values used in the visualization. On the left-hand side under each visualization, simple controls let the user change the granularity and the year of the data shown. A user can select a single year from 2001 to 2021 (or all years) using the drop-down menu. Once a year is selected, the user can choose between daily totals, monthly totals, and day-of-the-week totals for that year. The graph and table for that stop then update automatically to reflect the selection. If the user selects the daily view, the user must also specify a date range using a date picker above the chart, as not all 365 days can be viewed in the table at one time. Since the controls for each Blue Line stop work independently, a user can compare different years and different ranges across the three stops.
This webapp was developed for CS 424 as our first project in building a visualization tool using the Shiny package and the R programming language. The ggplot2 [2] and lubridate [3] packages were also used in its development. The webapp has room for improvement, such as the possible addition of all Blue Line stops, but it was developed to fit the project specifications, time constraints, and shinyapps.io restrictions for this class.
The data used in this webapp comes from the Chicago Transit Authority and the City of Chicago [1]. The data was exported from the Chicago Data Portal as a “TSV for Excel” file and imported into R for preprocessing. To import the data properly, the single quote must not be treated as a string delimiter, as entries for O’Hare Airport will be corrupted otherwise. Once the data was imported into a data frame, I looked up the station_id for the three stops in question: UIC-Halsted (40350), O’Hare Airport (40890), and Forest Park (40390). With the station_ids in hand, I subset the original data frame into three separate data frames, one per station, and exported each of them to a csv file.
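The import and subsetting steps above can be sketched as follows. This is a minimal sketch rather than the project's actual script: the sample rows, the file paths, the "Other Stop" entry, and every column name other than station_id are assumptions made so the example runs on its own.

```r
# Hypothetical stand-in for the Chicago Data Portal export. The rows and all
# column names other than station_id are illustrative assumptions.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "station_id\tstationname\tdate\tdaytype\trides",
  "40350\tUIC-Halsted\t01/02/2001\tW\t5012",
  "40890\tO'Hare Airport\t01/02/2001\tW\t9034",
  "40390\tForest Park\t01/02/2001\tW\t3021",
  "99999\tOther Stop\t01/02/2001\tW\t6007"
), tsv_path)

# read.table() treats both " and ' as quote characters by default.
# Restricting quote to the double quote keeps the apostrophe in "O'Hare"
# from opening a quoted string and swallowing the rows that follow.
rides <- read.table(tsv_path, header = TRUE, sep = "\t", quote = "\"",
                    stringsAsFactors = FALSE)

# Station IDs as given in the text; the short names are made up for file naming.
station_ids <- c(uic_halsted = 40350, ohare = 40890, forest_park = 40390)

# Write one csv per station (to a temporary directory in this sketch).
out_dir <- tempdir()
for (name in names(station_ids)) {
  station <- rides[rides$station_id == station_ids[[name]], ]
  write.csv(station, file.path(out_dir, paste0(name, ".csv")), row.names = FALSE)
}
```

Each resulting csv holds only the rows for one station, which keeps the files small enough to deploy.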
Once the original data set has been split into the three csv files representing the stations in question, the following processing steps apply to all three files. We begin by importing the csv into R. Once imported, the strings in the date column of the data frame are converted into date objects using the lubridate [3] package. Once the dates are converted, the data frame is ready to be visualized.
Head of the generated UIC-Halsted CSV file
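The date conversion can be sketched like this. The sketch assumes the dates are stored as month/day/year strings and that the columns are named date and rides; neither detail is stated above, and the sample values are made up. In the real workflow the data frame would come from read.csv() on one of the station files.

```r
library(lubridate)

# Made-up rows standing in for one station's csv; column names are assumptions.
station <- data.frame(
  station_id = 40350,
  stationname = "UIC-Halsted",
  date = c("01/01/2001", "01/02/2001", "12/31/2021"),
  rides = c(273, 1872, 1104),
  stringsAsFactors = FALSE
)

# mdy() parses month/day/year strings into Date objects.
station$date <- mdy(station$date)

str(station)
```

With the column stored as Date, lubridate helpers such as year(), month(), and wday() can be applied to it directly.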
To produce the yearly sum bar chart and table, we aggregate the data frame by year using the sum function. The data frame returned by the aggregate function has one column with the numerical year and one column containing the sum of all the days in that year. For organization, I renamed these columns to “Year” and “Riders” respectively. To generate the bar plot, I used the geom_bar function from the ggplot2 [2] package with the “Year” column as the x axis and “Riders” as the y axis. Since the yearly sums are large, I divided the “Riders” column by 1,000,000 for this visualization only to make it easier to read, and reflected that in the y-axis label. The same data frame is used to produce the table underneath the chart, with a few changes for appearance. To make the table easier to read, I used the prettyNum function to add commas to the “Riders” column and cast the “Year” column to an integer so that the column is rendered correctly.
Head of resulting aggregation from UIC-Halsted data frame.
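A minimal sketch of the yearly aggregation, plot, and table follows. The ride counts are invented for illustration, and the date and rides column names are assumptions.

```r
library(ggplot2)
library(lubridate)

# Made-up daily rows; the real data frame has one row per day from 2001 to 2021.
station <- data.frame(
  date = as.Date(c("2019-01-01", "2019-06-15", "2020-01-01", "2020-06-15")),
  rides = c(4123, 6045, 3512, 507)
)

# Sum the daily rides within each calendar year.
yearly <- aggregate(station$rides, by = list(year(station$date)), FUN = sum)
names(yearly) <- c("Year", "Riders")

# Bar chart with riders expressed in millions, for the plot only.
p <- ggplot(yearly, aes(x = Year, y = Riders / 1e6)) +
  geom_bar(stat = "identity") +
  labs(x = "Year", y = "Riders (millions)")

# Table version: integer years and comma-separated rider counts.
yearly_table <- data.frame(
  Year = as.integer(yearly$Year),
  Riders = prettyNum(yearly$Riders, big.mark = ","),
  stringsAsFactors = FALSE
)
print(yearly_table)
```

Casting the year to an integer matters because Shiny's table rendering otherwise shows a numeric year with decimal places.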
To produce the month bar chart and table, a workflow similar to the yearly sum is used. The key difference is that the data is first subset to the desired year, and the aggregation is then performed over the twelve months of that year using the month() function from lubridate [3]. The same ggplot2 function is used as in the yearly bar chart, but with the “Riders” column divided by 1,000. The table is built exactly like the yearly table, minus the integer casting of the date column. The day-of-week visualizations follow the same steps as the month visualizations; the only difference is that the aggregation groups by the days of the week provided by the wday() function from lubridate [3].
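The month and day-of-week aggregations might look like the sketch below. The sample values are made up, the column names are assumptions, and the use of label = TRUE (to get month and weekday names instead of numbers) is my own choice for the example rather than something stated above.

```r
library(lubridate)

# Made-up daily rows spanning two years.
station <- data.frame(
  date = as.Date(c("2019-01-07", "2019-01-12", "2019-02-04", "2020-03-16")),
  rides = c(5012, 1203, 4875, 2110)
)

# Keep only the year chosen by the user (2019 here).
selected <- station[year(station$date) == 2019, ]

# Monthly totals for the selected year.
monthly <- aggregate(selected$rides,
                     by = list(month(selected$date, label = TRUE)),
                     FUN = sum)
names(monthly) <- c("Month", "Riders")

# Day-of-week totals for the selected year.
weekly <- aggregate(selected$rides,
                    by = list(wday(selected$date, label = TRUE)),
                    FUN = sum)
names(weekly) <- c("Day", "Riders")

print(monthly)
print(weekly)
```

Only the grouping function changes between the two views, so the plotting and table code can be shared.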
The daily visualization is much simpler than the month and day-of-week visualizations. The workflow is like the month visualization minus the aggregation and the scaling of the “Riders” column, as the data frame already holds the day-to-day totals. To produce a good-looking table, the dates in the data frame are formatted as year-month-day using the format() function. Since there are too many days to show in the table at once, the table displays only the subset of rows between the two dates selected with a dateRangeInput() widget in the Shiny application.
Date range widget to allow for user defined date range.
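The date-range filtering for the daily table can be sketched as follows. Here date_range is a plain vector standing in for the value a dateRangeInput() widget would supply; the ride counts and column names are invented for illustration.

```r
# Made-up daily rows for ten consecutive days.
station <- data.frame(
  date = as.Date("2020-03-01") + 0:9,
  rides = 1001:1010
)

# Stand-in for the two dates returned by the dateRangeInput() widget.
date_range <- as.Date(c("2020-03-03", "2020-03-05"))

# Keep only the rows between the two dates, inclusive.
in_range <- station$date >= date_range[1] & station$date <= date_range[2]

# Format dates as year-month-day and add commas to the counts for display.
daily_table <- data.frame(
  Date = format(station$date[in_range], "%Y-%m-%d"),
  Riders = prettyNum(station$rides[in_range], big.mark = ","),
  stringsAsFactors = FALSE
)
print(daily_table)
```

Formatting the dates as strings avoids Shiny's table rendering showing them as raw numbers.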
The UI and server functionality are handled by the Shiny package. To lay out the three Blue Line stops, a fluidPage is used with three columns of size 4. Each column contains a header with the stop name, a selectInput() widget whose options are the years 2001 through 2021, a radioButtons() widget with the different scales as options, a plotOutput() widget for the plot, and a tableOutput() widget for the table. In addition, one conditional shows a dateRangeInput() widget above the table when “Day” is selected in the radioButtons() widget, and another conditional hides all options other than “Year” when the selected year range is “2001-2021”. The server runs the R code described above using the corresponding widgets as input.
Drop-down menu to select year.
Widgets used for user input. Drop-down menu to select the desired year and radio buttons to select the viewing range.
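The layout described above can be sketched roughly as below. This is not the app's actual source: the stop_column() helper, all input and output IDs, the widget labels, and the way the radio buttons are hidden for the “2001-2021” selection are assumptions made for illustration, and the server body is left as a placeholder.

```r
library(shiny)

# Hypothetical helper that builds one size-4 column for a stop.
# Every ID derived from `id` is an illustrative assumption.
stop_column <- function(id, title) {
  year_id <- paste0(id, "_year")
  view_id <- paste0(id, "_view")
  column(
    4,
    h2(title),
    selectInput(year_id, "Year", choices = c("2001-2021", 2021:2001)),
    # Offer the finer views only when a single year is selected.
    conditionalPanel(
      condition = sprintf("input.%s != '2001-2021'", year_id),
      radioButtons(view_id, "View", choices = c("Month", "Day of Week", "Day"))
    ),
    plotOutput(paste0(id, "_plot")),
    # Show the date picker above the table only in the daily view.
    conditionalPanel(
      condition = sprintf("input.%s != '2001-2021' && input.%s == 'Day'",
                          year_id, view_id),
      dateRangeInput(paste0(id, "_dates"), "Date range")
    ),
    tableOutput(paste0(id, "_table"))
  )
}

ui <- fluidPage(
  fluidRow(
    stop_column("uic", "UIC-Halsted"),
    stop_column("ohare", "O'Hare Airport"),
    stop_column("forest", "Forest Park")
  )
)

# Placeholder server: the real one would fill each *_plot and *_table output
# with renderPlot() and renderTable() based on the matching inputs.
server <- function(input, output) {}

# Builds the app object without starting it; use runApp(app) to launch.
app <- shinyApp(ui, server)
```

Building each column from one helper keeps the three stops' controls independent while avoiding three copies of the same UI code.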
The most striking trend across all three Blue Line stops is the decline in ridership due to the COVID-19 pandemic in 2020. This trend can be seen in the initial graphs showing the yearly totals from 2001 to 2021, where the 2020 and 2021 bars are significantly lower than those of the previous years. Selecting the month view for 2020 at each stop, we can see that the lockdown began to affect Blue Line ridership in March of 2020. Using the day view for 2020 and narrowing the selection to March 1st through March 31st, we can see that the decrease was sudden but not instantaneous. The decline starts on March 14th and settles around March 20th in all three graphs. One thing to notice is that the rate of decline differs among the three graphs. Most notably, the UIC-Halsted stop had a steep decline while the O’Hare Airport stop had a smoother decline. This could be because UIC students stopped commuting to campus on the specific day that it closed, while O’Hare Airport travelers continued to fly on essential flights as lockdowns were being put into place in other states and countries. No other decline in CTA Blue Line ridership within the years of this data set is as severe as the one caused by the COVID-19 pandemic.
Yearly view at UIC-Halsted
Monthly view at UIC-Halsted in 2020
Daily ridership at UIC-Halsted in 2020
Looking back at the years before the COVID-19 pandemic, we can find individual trends at each of the stops displayed. Using the day-of-week view at UIC-Halsted, we can see that this Blue Line stop is used most on weekdays rather than weekends. Weekend ridership drops significantly, to about a quarter of weekday ridership, in each year prior to the pandemic. The same view for the Forest Park stop shows the same trend: weekends show a noticeable drop in ridership, though not as large as at UIC-Halsted. Interestingly, this trend is not found at the O’Hare stop, where ridership is mostly consistent throughout the week. We can hypothesize that the UIC-Halsted and Forest Park stops are mostly used by students and typical workers who go to classes and/or work during the week, while the O’Hare Airport stop is mostly used by travelers.
Day-of-week view at UIC-Halsted on a pre-COVID-19 year
Day-of-week view at O'Hare Airport on a pre-COVID-19 year
Day-of-week view at Forest Park on a pre-COVID-19 year