The data comes from the City of Chicago Data Portal and was published by the Chicago Transit Authority. The webpage uses and displays only a subset of the full dataset.
Because the data file was about 40 MB, above the file size limit for data sources on R, I used R to select only the pieces that were relevant to me. I focused solely on the UIC-Halsted, O'Hare Airport, and Loyola stations. Here is how:
Using the read.table() function, I read the file named "CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv" into an object.
Using the subset() function, I applied boolean logic to that object to select the rows where the "stationname" field was "UIC-Halsted", "O'Hare Airport", or "Loyola". The result was stored in a second object.
Using the write.csv() function, I wrote the newly created object to a file named "three-stops".
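The three steps above can be sketched roughly as follows. This is an illustration, not the exact script: the helper name condense_stops(), the tab separator, the quote setting, and the ".csv" extension on the output file are all assumptions. The %in% test is equivalent to chaining three == comparisons with |.

```r
# Sketch of the condensing steps; condense_stops() is a hypothetical helper.
condense_stops <- function(input_file,
                           output_file = "three-stops.csv",  # extension assumed
                           stops = c("UIC-Halsted", "O'Hare Airport", "Loyola")) {
  # Read the tab-separated file. quote = "\"" stops read.table() from
  # treating the apostrophe in "O'Hare Airport" as a quote character.
  all_rides <- read.table(input_file, header = TRUE, sep = "\t",
                          quote = "\"", stringsAsFactors = FALSE)

  # Keep only the rows whose stationname is one of the three stops.
  three_stops <- subset(all_rides, stationname %in% stops)

  # Write the condensed table to a much smaller CSV file.
  write.csv(three_stops, output_file, row.names = FALSE)
  invisible(three_stops)
}
```

Under these assumptions, a call such as condense_stops("CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv") would produce the condensed file in the working directory.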
Although every chart maps the number of rides onto the y-axis, more work was needed to view the data in each of the following ways:
Rides by year
Rides by month
Rides by day of the year
Rides by weekday
Each of the listed "views" had to work for each stop and for any chosen year. Here's how I did it:
Having written the condensed table to a CSV file, I read the new file into the project using the same method as before.
Using subset() in the same way as in the second condensing step above, I created three separate R data frames, one for each stop.
ggplot() is a powerful function that takes a data frame and an aes() mapping that sets the x- and y-axis values. Finding the proper data for each ggplot was therefore a matter of deciding whether to subset a data frame by a column such as the year, and of choosing the correct column for the x-axis, e.g. "Month", "Date", or "Year". For example, I didn't need to subset the UIC-Halsted data frame for the yearly chart. I did need to subset it for the charts that show data by month or day, since those views depend on the year the user chooses.
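As a rough illustration of this point, the sketch below builds a yearly chart from an unsubsetted data frame and a monthly chart from a data frame subset by year. The rows are made-up placeholder numbers, not CTA figures. The column names Year, Month, and rides, the variable chosen_year, and the use of geom_col() are assumptions made for the example.

```r
library(ggplot2)

# Made-up placeholder rows standing in for one stop's data (not real CTA counts).
# Month is a factor with calendar-ordered levels so the x-axis is not alphabetical.
uic <- data.frame(
  Year  = c(2019, 2019, 2020, 2020),
  Month = factor(c("Jan", "Feb", "Jan", "Feb"), levels = month.abb),
  rides = c(100, 120, 60, 40)
)

# Yearly view: the whole data frame is used, with Year on the x-axis.
# geom_col() stacks rows that share an x value, so each bar shows a yearly total.
yearly_plot <- ggplot(uic, aes(x = factor(Year), y = rides)) +
  geom_col() +
  labs(x = "Year", y = "Rides")

# Monthly view: subset to the chosen year first, then put Month on the x-axis.
chosen_year <- 2020  # hypothetical stand-in for the user's choice
monthly_data <- subset(uic, Year == chosen_year)
monthly_plot <- ggplot(monthly_data, aes(x = Month, y = rides)) +
  geom_col() +
  labs(x = "Month", y = "Rides")
```

Only the data argument and the x column change between the two views. The rest of the plot definition stays the same.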
As with the ggplots, I occasionally had to subset the data by a specific column value, or a combination of column values, before passing it to the aggregate() function. This function takes a data table, sums the rides column, and groups the result by month, day, year, or weekday. It was necessary because otherwise the tables would contain repeated values of the grouping column.
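A minimal sketch of that grouping step, using the formula interface of aggregate(). The rows are again made-up placeholders, and the column names Year, Month, and rides are assumptions.

```r
# Made-up placeholder rows (not real CTA counts): several records share a month.
uic <- data.frame(
  Year  = c(2020, 2020, 2020, 2021),
  Month = c(1, 1, 2, 1),
  rides = c(10, 20, 5, 7)
)

# Subset to one year, then sum rides within each month.
# The result has exactly one row per month instead of one row per record.
by_month <- aggregate(rides ~ Month, data = subset(uic, Year == 2020), FUN = sum)
```

Replacing Month in the formula with another column, such as Year or a weekday column, would give the totals for the other views.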
Although the many possible combinations of year and chart type (the choices to view the data by month, day, year, or weekday) seem imposing, two things made the data selection easy: conditional statements that check the stop choice first and then the chart-type choice, and dynamic ggplots and datatables whose parameters (data and column(s)) are the user's inputs. The major problem was finding a way to condense the data.
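One way to picture that selection logic is a single helper that handles the stop choice first and the chart-type choice second. This is a hedged sketch, not the code used on the webpage: the function chart_data(), its argument names, and the column names are hypothetical. It also assumes that every view other than "Year" is restricted to the chosen year.

```r
# Hypothetical helper: pick the rows and grouping column for one chart or table.
chart_data <- function(three_stops, stop_name, view, chosen_year = NULL) {
  # 1. Stop choice: keep only the rows for the selected stop.
  stop_data <- three_stops[three_stops$stationname == stop_name, ]

  # 2. Chart-type choice: every view except "Year" is assumed to depend
  #    on the chosen year, so those views are subset first.
  if (view != "Year") {
    stopifnot(!is.null(chosen_year))
    stop_data <- stop_data[stop_data$Year == chosen_year, ]
  }

  # Build a formula such as rides ~ Month from the view name and sum the rides.
  aggregate(reformulate(view, response = "rides"), data = stop_data, FUN = sum)
}

# Made-up placeholder rows (not real CTA counts).
three_stops <- data.frame(
  stationname = c("Loyola", "Loyola", "Loyola", "UIC-Halsted"),
  Year        = c(2020, 2020, 2021, 2020),
  Month       = c(1, 1, 1, 2),
  rides       = c(10, 20, 7, 99)
)

loyola_by_month <- chart_data(three_stops, "Loyola", "Month", chosen_year = 2020)
loyola_by_year  <- chart_data(three_stops, "Loyola", "Year")
```

Because the view name doubles as the column name, the same returned data frame could feed both a ggplot (as the data argument, with the view column on the x-axis) and a data table.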