R (4.1.2 or later)
RStudio 1.4 (or later)
Instructions
If you just want to explore the website, visit the Shiny link above.
If you want to deploy it to your own Shiny site, do the following:
Clone repository and enter directory in RStudio
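Make sure you have the following libraries installed (using install.packages()). Package names are case-sensitive, and grid ships with R itself, so it does not need to be installed.
shiny
shinydashboard
ggplot2
lubridate
DT
grid
gridExtra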
Activate the libraries in RStudio using library()
Go to your Shiny dashboard and create a new token.
It will show some code to copy and paste. Copy that code.
Type getwd() to make sure you are in the right directory (same directory as app.R)
If not, use setwd("/Path/To/Directory")
Paste the copied token code into the RStudio console
Type deployApp() in the console (this function comes from the rsconnect package)
When it asks you to confirm the deployment, type "Y"
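The steps above can be sketched as a short R script. This is only a sketch: the helper name missing_packages is hypothetical, the token line stays a placeholder because the real values come from your own dashboard, and it assumes the token code and deployApp() come from the rsconnect package. The install and deploy part only runs in an interactive session, so nothing is deployed by accident.

```r
# Packages the app loads, as listed above. grid ships with R itself,
# so it never needs to be installed from CRAN.
app_packages <- c("shiny", "shinydashboard", "ggplot2", "lubridate",
                  "DT", "grid", "gridExtra")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

if (interactive()) {
  # Install whatever is missing (rsconnect is assumed to be needed for deployApp()).
  to_install <- missing_packages(c(app_packages, "rsconnect"))
  if (length(to_install) > 0) install.packages(to_install)

  # Attach the app's packages.
  invisible(lapply(app_packages, library, character.only = TRUE))

  # Confirm the working directory is the one that contains app.R.
  print(getwd())
  stopifnot(file.exists("app.R"))

  # Paste the token code copied from your dashboard here, then deploy:
  # rsconnect::deployApp()
}
```

Running missing_packages(app_packages) on its own is a quick way to see what still needs installing before you deploy.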
This site visualizes CTA ridership at three Chicago stations: UIC-Halsted, O’Hare Airport, and 54th/Cermak.
The station is selected using the first drop-down list. For each of the three stations, you can display several different graphs. One graph shows total riders at the station for every year from 2001 to 2021. Three other graphs show the riders for a particular year: riders for each day, each month, and each day of the week. Lastly, a final view displays each of the graphs above in a table-like structure. In every graph except the all-years graph, you can select a year and the graph updates to match. The year selection has no effect on the all-years graph because it already shows every year in the dataset.
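As a rough illustration of the groupings behind these graphs, the sketch below sums riders by year, by month, and by day of the week using base R. The sample data is made up, the column names date and rides are assumptions, and total_by is a hypothetical helper, not the app's actual code.

```r
# Made-up sample in the assumed shape: one row per day for a single station.
rides <- data.frame(
  date  = as.Date(c("2020-01-06", "2020-01-07", "2020-02-01", "2021-01-04")),
  rides = c(100, 150, 40, 60)
)

# Hypothetical helper: total riders for each value of a grouping key.
total_by <- function(df, key) {
  out <- aggregate(df$rides, by = list(group = key), FUN = sum)
  names(out)[2] <- "rides"
  out
}

# All-years graph: one total per year.
by_year <- total_by(rides, format(rides$date, "%Y"))

# Single-year graphs: filter to the selected year first, then group.
selected_year <- "2020"
one_year <- rides[format(rides$date, "%Y") == selected_year, ]
by_month <- total_by(one_year, format(one_year$date, "%m"))
# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday).
by_wday  <- total_by(one_year, format(one_year$date, "%u"))
```

Filtering by year before grouping mirrors why the year selector affects the per-year graphs but not the all-years one.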
You will also notice that the site is split into two columns. One side initially shows UIC-Halsted and the other shows O’Hare Airport. You can mix and match to compare any stations you like. You can even select the same station on both sides to compare different types of graphs for that station. The controls at the bottom correspond to the graph above them, and each select box only changes the graph it is assigned to.
There is also an about page with a little more information about the dataset and what this website does. The website is optimized for a 2880x1620 screen resolution, but it will still work on a regular screen or monitor. You would just need to scroll down a little to reach the select boxes at the bottom. The graphs should adjust their width based on your screen size.
The data for each of the three stations was compiled from an overall CTA 'L' station dataset linked here. Each data entry comes with a station id, station name, date, day type, and number of riders. The day type is categorized as weekday, weekend, or Sunday/holiday. To each dataset, I added a column called newDates, which reformats the date into something easier to parse. I also added a column called season, which checks the month of the entry and assigns the proper season to it. For UIC-Halsted specifically, I added another column that checks whether the month is a school month. This comes in handy later.
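A minimal sketch of how the newDates and season columns could be derived, using base R only. The raw date format (month/day/year), the month-to-season boundaries (meteorological seasons), and the helper name season_of are all assumptions for illustration; the app itself may do this differently, for example with lubridate.

```r
# Hypothetical helper: map dates to a season label.
# Assumes meteorological seasons (Dec-Feb = Winter, Mar-May = Spring, ...).
season_of <- function(dates) {
  m <- as.integer(format(dates, "%m"))
  ifelse(m %in% c(12, 1, 2), "Winter",
  ifelse(m %in% 3:5,         "Spring",
  ifelse(m %in% 6:8,         "Summer", "Fall")))
}

# Made-up rows; the raw date column is assumed to be text like "01/15/2019".
raw <- data.frame(
  date  = c("01/15/2019", "04/02/2019", "07/04/2019", "10/31/2019"),
  rides = c(5200, 6100, 3000, 6400)
)

# newDates: the text date parsed into a proper Date value.
raw$newDates <- as.Date(raw$date, format = "%m/%d/%Y")

# season: derived from the month of each entry.
raw$season <- season_of(raw$newDates)
```

Once newDates is a real Date, filtering by year or grouping by month becomes a matter of calling format() on it.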
As for the graphs shown on the site, 54th/Cermak and O’Hare share the same color scheme. The all-years graph shows each individual year in a different vibrant color. In the each-day and each-month graphs, the colors are categorized by season. This gives a sense of the overall year, since seasons are easy to think about, and it also helps spot trends in the dataset. Lastly, the days-of-the-week graph uses a separate color for each day.
For UIC-Halsted, I chose to separate the colors based on whether school is in session. The months from May to August, plus December, are one color and every other month is another. A large share of the people traveling to and from UIC-Halsted are likely UIC students, so separating the months by this criterion helps visualize when ridership is higher or lower. For the each-day graph, I kept the season colors, since it is hard to determine the exact days school is in session when start and end dates differ a little each year.
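The school-month rule described above can be sketched as a small function. The function name is_school_month and the labels "In session" and "On break" are hypothetical; only the month rule itself (May to August and December count as break) comes from the description.

```r
# Hypothetical helper: TRUE when the month counts as a school month.
# Break months are May through August plus December, as described above.
is_school_month <- function(dates) {
  m <- as.integer(format(dates, "%m"))
  !(m %in% c(5:8, 12))
}

# Made-up dates to show the rule in action.
dates <- as.Date(c("2019-01-15", "2019-05-01", "2019-08-31",
                   "2019-09-01", "2019-12-25"))

# A two-level label that a plot could map to its fill color.
school_status <- ifelse(is_school_month(dates), "In session", "On break")
```

A two-level label like this keeps the legend simple: one color for months in session and one for break months.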
For the month and day-of-the-week graphs, I used the school-in-session color scheme described above. This helps show users why certain days and months are higher than usual.
We will start with UIC-Halsted. Across all years, 2020 and 2021 have far fewer riders. This is presumably due to COVID and classes being online or hybrid, so a significant number of people did not need to come to campus. In the years before that, the number of rides was trending upward. In the days-of-the-year graph, most years show high numbers in the spring, fall, and winter. This makes sense, as that is when school is in session. During the summer there are about half as many riders, and you can see a dramatic drop at the end of the year, presumably during winter break. This trend holds until spring 2020, when ridership drops dramatically partway through the season as COVID restrictions took effect. The numbers then slowly climb until fall and winter 2021, when there is a dramatic increase, likely due to in-person classes resuming. The monthly graph shows the same patterns. One note is that August is fairly high even though it is marked as a break month: school starts toward the end of August and people are moving back for the start of the semester, which may explain it. Finally, looking at rides per day of the week, Monday through Friday have the most riders while weekends have far fewer. This is consistent for every year and makes sense, since classes run Monday through Friday.
Now for the O’Hare Airport data, starting with all the years overall. We see a similar trend: the number of riders per year slowly increased from 2001 to 2019, then dropped in 2020 and 2021. In the each-day graph, the highest number of rides mostly falls between late spring and the start of winter. In 2020, rides dropped near the start of the year and stayed steady until 2021, when they slowly increase as the seasons go on. In the monthly data, the highest months are May to October for most years. November and December are usually fairly high too, but not as high as May to October. This is a little surprising to me, because I would expect many more people to travel in the winter around Christmas, yet December does not have the most riders in any particular year. As before, you see a drop in 2020 and then an increase through 2021 as the months go on. Looking at the days of the week, each day is mostly around the same, but Fridays are usually the highest in a given year. Surprisingly, Saturdays are the lowest for many years, though not by much. The Friday peak seems logical: travelers can leave on any day, but Fridays are popular because most people are done with work for the week and can then get on a plane for vacation.
Lastly, we can take a look at the 54th/Cermak stop. Overall, ridership increases over the years until 2020 and 2021. There is also a dramatic increase from 2004 to 2005, for reasons that are not clear from the data. In the each-day graph, the data is surprisingly consistent across all seasons up until 2020. A possible explanation is that Chinatown is around this stop and is a popular destination regardless of the season. In 2020 you do see a dip after the start of the year, but this stop seems less affected, since ridership only dropped to about half the usual number of people per day. In 2021 you see a steady increase. In the monthly graphs, the highest points are consistently around March to November for the years before 2020. In the days-of-the-week graph, Monday through Friday are usually consistent and have the most riders. Saturday comes second, and Sunday has the lowest numbers.