There are two different data sources:
The CTA ridership data from the Chicago data portal
Link: https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f
The CTA station data, with each station's latitude/longitude and the color line(s) it belongs to
Link: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme
I downloaded both datasets as .csv files and read them in.
The CTA ridership data contains the columns:
station_id, stationname, date, daytype, number of rides
The CTA station data with geolocation contains the columns:
stop_id, direction, stop_name, station_name, station_descriptive_name, map_id, 10 boolean columns (CTA lines), Location (a latitude and longitude tuple)
Geographic Data
The CTA station data stores each station's coordinates in a single Location column as a "(latitude, longitude)" string. I removed the parentheses, split the column on the comma delimiter into separate latitude and longitude columns, and converted each to a numeric value. This was necessary because the leaflet map needs numeric latitudes and longitudes to place a marker for every CTA station.
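The cleanup can be sketched in plain Python as below. This is an illustration only: the original project worked in R, the function name `parse_location` is hypothetical, and the sample coordinate string is an illustrative value, not a row quoted from the dataset.

```python
def parse_location(location):
    """Turn a '(latitude, longitude)' string into a (float, float) tuple."""
    # Remove surrounding whitespace, then the enclosing parentheses.
    cleaned = location.strip().strip("()")
    # Split on the comma delimiter into the latitude and longitude parts.
    lat_text, lon_text = cleaned.split(",")
    # float() ignores surrounding spaces, so " -87.64" parses correctly.
    return float(lat_text), float(lon_text)


lat, lon = parse_location("(41.875474, -87.649707)")
print(lat, lon)  # 41.875474 -87.649707
```

Once each station has numeric `lat` and `lon` values, they can be passed straight to whatever function adds map markers.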
Date Formatting
The dates were initially in YYYY-MM-DD format. To make them easier to read and to compare, I used the lubridate library to convert all the dates into MM/DD/YYYY format.
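A minimal Python sketch of the same idea, using the standard `datetime` module in place of R's lubridate. The helper names are hypothetical. One design point it illustrates: comparisons are done on parsed date objects, while MM/DD/YYYY is used only as a display format, because MM/DD/YYYY strings do not sort chronologically across years (for example, "12/31/2010" sorts after "01/01/2011" as text).

```python
from datetime import datetime


def parse_iso_date(text):
    """Parse a YYYY-MM-DD string into a date object that supports comparison."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def to_mm_dd_yyyy(d):
    """Format a date object as an MM/DD/YYYY string for display."""
    return d.strftime("%m/%d/%Y")


d1 = parse_iso_date("2011-08-15")
d2 = parse_iso_date("2011-12-01")
print(to_mm_dd_yyyy(d1))  # 08/15/2011
print(d1 < d2)            # True
```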
Grouping Data
To look at specific data, for example CTA ridership at the UIC-Halsted station in 2011, I take a subset of the main dataset. Working with the smaller subset reduces processing time when displaying the bar charts, data tables, and leaflet maps.
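The subsetting step can be sketched in Python as follows. The rows are made-up values for illustration (the real rows come from the ridership CSV), and `subset` and `rides_by_month` are hypothetical helpers; the monthly total shows one way a subset might be grouped for a bar chart, not necessarily how the original app grouped it.

```python
from collections import defaultdict
from datetime import date

# Made-up sample rows; the real data would be read from the ridership CSV.
rows = [
    {"stationname": "UIC-Halsted", "date": date(2011, 1, 3), "rides": 100},
    {"stationname": "UIC-Halsted", "date": date(2011, 1, 4), "rides": 120},
    {"stationname": "UIC-Halsted", "date": date(2011, 2, 1), "rides": 90},
    {"stationname": "UIC-Halsted", "date": date(2012, 1, 3), "rides": 110},
    {"stationname": "Clark/Lake", "date": date(2011, 1, 3), "rides": 300},
]


def subset(rows, station, year):
    """Keep only the rows for one station in one year."""
    return [r for r in rows
            if r["stationname"] == station and r["date"].year == year]


def rides_by_month(rows):
    """Sum rides per month (1-12), ordered by month, for a bar chart."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["date"].month] += r["rides"]
    return dict(sorted(totals.items()))


uic_2011 = subset(rows, "UIC-Halsted", 2011)
print(len(uic_2011))             # 3
print(rides_by_month(uic_2011))  # {1: 220, 2: 90}
```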
Reactive Elements for User Input
The subsets described above are wrapped in reactive elements, so that when the user changes an input, the graphs update automatically to display the new data.
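The idea behind reactive elements can be illustrated with a small push-based sketch in Python. This is a conceptual analogue, not Shiny's API: the `ReactiveValue` class and `render_when_changed` helper are hypothetical, and Shiny itself tracks dependencies automatically and recomputes reactive expressions lazily, whereas this sketch uses explicit subscriptions.

```python
class ReactiveValue:
    """A value that notifies subscribed callbacks when it changes."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def get(self):
        return self._value

    def set(self, value):
        # Only notify dependents when the value actually changes.
        if value != self._value:
            self._value = value
            for callback in self._subscribers:
                callback()

    def subscribe(self, callback):
        self._subscribers.append(callback)


def render_when_changed(inputs, callback):
    """Run callback now, and again whenever any of the inputs changes."""
    for inp in inputs:
        inp.subscribe(callback)
    callback()


station = ReactiveValue("UIC-Halsted")
year = ReactiveValue(2011)
rendered = []  # records each (station, year) the chart was drawn for


def draw_chart():
    # Stand-in for recomputing the subset and redrawing the bar chart.
    rendered.append((station.get(), year.get()))


render_when_changed([station, year], draw_chart)
year.set(2012)
station.set("Clark/Lake")
year.set(2012)  # unchanged value, so no redraw
print(rendered)
# [('UIC-Halsted', 2011), ('UIC-Halsted', 2012), ('Clark/Lake', 2012)]
```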
Ordering Data in Bar Charts and Data Tables
To make the plotted data easy to tell apart, I order it alphabetically or in ascending or descending order, depending on which key aspects of the data I want to show. This makes outliers and trends more visible.
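The three orderings can be sketched in Python with the built-in `sorted`. The (station, total rides) pairs are made-up values for illustration only.

```python
# Made-up (station name, total rides) pairs for illustration.
totals = [("Clark/Lake", 300), ("UIC-Halsted", 220), ("Austin", 50)]

# Alphabetical by station name.
alphabetical = sorted(totals, key=lambda t: t[0])
# Ascending by ride count: the smallest values (possible outliers) come first.
ascending = sorted(totals, key=lambda t: t[1])
# Descending by ride count: the busiest stations come first.
descending = sorted(totals, key=lambda t: t[1], reverse=True)

print(alphabetical)  # [('Austin', 50), ('Clark/Lake', 300), ('UIC-Halsted', 220)]
print(descending)    # [('Clark/Lake', 300), ('UIC-Halsted', 220), ('Austin', 50)]
```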