As in project 1, the data can be found on the City of Chicago Data Portal and was published by the Chicago Transit Authority.
Because CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv (the file containing the rides per stop for each available date from 2001 to 2021) was roughly 40 MB, well above the file-size limit for data sources in Shiny, I wrote a script that reads a fixed number of rows at a time from this file and writes out many CSV files small enough for Shiny (under 5 MB each). These files were then read back in to build a dataframe holding a ride count per station for each date.
The same process was applied to CTA_-_System_Information_-_List_of__L__Stops.tsv, which held the location data for each stop. The resulting dataframe was used to look up stop locations.
The two dataframes were then merged, joining the ridership frame's station_id to the location frame's map_id. Depending on the inputs, this merged dataframe was either subsetted or used in its entirety.
Using the read.table() function, I read the two .tsv files into dataframes.
Using the [] operator, I created smaller dataframes by subsetting the aforementioned frames with [startRow:endRow, ].
Using the write.csv() function, I wrote each subset out as a smaller, readable .csv file.
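A minimal sketch of that splitting pipeline, using a small synthetic dataframe in place of the real ~40 MB TSV; the column names, chunk size, and output file names here are illustrative assumptions, not the script's actual values:

```r
# Toy stand-in for CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv
tsv_path <- tempfile(fileext = ".tsv")
write.table(data.frame(station_id = rep(40010:40012, each = 40),
                       day_index  = rep(1:40, 3),
                       rides      = sample(100:5000, 120, replace = TRUE)),
            tsv_path, sep = "\t", row.names = FALSE)

# Step 1: read the full TSV into a dataframe
rides <- read.table(tsv_path, sep = "\t", header = TRUE)

# Step 2 and 3: subset with [startRow:endRow, ] and write each chunk as a CSV
chunk_size <- 50  # in the real script this would be tens of thousands of rows
starts <- seq(1, nrow(rides), by = chunk_size)
for (i in seq_along(starts)) {
  startRow <- starts[i]
  endRow   <- min(startRow + chunk_size - 1, nrow(rides))
  out_file <- file.path(tempdir(), sprintf("rides_part_%02d.csv", i))
  write.csv(rides[startRow:endRow, ], out_file, row.names = FALSE)
}
```

Each output file then stays under the 5 MB limit once the chunk size is tuned to the real data.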
I used merge() to join the dataframes that contained ridership data per stop from 2001-2021 and the one that contained location data.
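A sketch of that join on toy data; the column names and values are assumptions for illustration, but the key mapping (station_id to map_id) follows the text above:

```r
# Toy frames standing in for the ridership and stop-location data sets
ridership <- data.frame(station_id = c(40010, 40020, 40010),
                        date  = c("2021-01-01", "2021-01-01", "2021-01-02"),
                        rides = c(1200, 850, 1100))
stops <- data.frame(map_id = c(40010, 40020),
                    station_name = c("Austin-Forest Park", "Harlem-Lake"),
                    lat = c(41.870, 41.887),
                    lng = c(-87.777, -87.803))

# merge() joins on the matching keys: station_id in one frame, map_id in the other
ride_loc <- merge(ridership, stops, by.x = "station_id", by.y = "map_id")
```

The merged frame then carries both the ride counts and the coordinates needed for the map.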
Using the subset() function, I selected portions of the dataframe that held ridership data.
Using the filter() function, I selected the rows of the ridership dataframe matching the latitude and longitude of the stop chosen on the map.
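A sketch of that lookup, assuming a merged ride+location frame with hypothetical columns; base R's subset() is shown, and dplyr's filter() takes the same conditions:

```r
# Assumed merged ride+location frame (toy values)
ride_loc <- data.frame(station_name = c("Austin-Forest Park", "Harlem-Lake"),
                       lat = c(41.870, 41.887), lng = c(-87.777, -87.803),
                       rides = c(1200, 850))

# A map click in the Shiny app reports the marker's coordinates;
# matching on them recovers the chosen stop's rows.
# Equivalent dplyr form: filter(ride_loc, lat == clicked_lat, lng == clicked_lng)
clicked_lat <- 41.870
clicked_lng <- -87.777
chosen <- subset(ride_loc, lat == clicked_lat & lng == clicked_lng)
```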
As for the tables and bar charts that represented the ridership at individual stops, the same methods from project 1 were used.
I simply created two dataframes by subsetting the ride+location data at two dates and took the difference between the corresponding rows.
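The date-difference step can be sketched as follows, on toy data with assumed column names; rows are aligned by station before differencing so the subtraction is row-for-row correct:

```r
# Toy ride+location data; in the app these rows come from the merged frame
ride_loc <- data.frame(
  station_name = rep(c("Austin-Forest Park", "Harlem-Lake"), 2),
  date  = rep(c("2020-06-01", "2021-06-01"), each = 2),
  rides = c(300, 250, 900, 700))

# Two subsets, one per chosen date
d1 <- subset(ride_loc, date == "2020-06-01")
d2 <- subset(ride_loc, date == "2021-06-01")

# Align d2 to d1's station order, then take the row-wise difference
diff_df <- data.frame(
  station_name = d1$station_name,
  ride_change  = d2$rides[match(d1$station_name, d2$station_name)] - d1$rides)
```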
Making the ggplot bar charts and tables was as simple as in project 1. The harder part was updating the view based on the chosen stop and dates; all that needed to be done was to check the relevant condition and subset the data accordingly.