The data used for this project comes from the Chicago Data Portal and is available at https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f. The original data included the following columns:
station_id
stationname
date
daytype
rides
To make the data more useful, I altered it in several ways. First, I used the R library lubridate to create several new columns:
Created a more universally readable date column (in the form year-month-day) called newDate.
Created a column solely for the year, called year.
Created a column solely for the month, called month.
Created a column solely for the day of the week, called dayOfTheWeek.
Note that the month and dayOfTheWeek columns store names (for example, January) rather than numbers (for example, 1 for January).
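A minimal sketch of how columns like these could be created with lubridate is shown here. It is not necessarily the exact code used for this project. It assumes the raw date column is stored as month/day/year text, and it uses a tiny made-up data frame (the station IDs and ride counts are illustrative placeholders) in place of the real file.

```r
library(lubridate)

# Tiny stand-in for the CTA data. All values are made up for illustration.
# Assumption: the raw date column is month/day/year text.
cta <- data.frame(
  station_id  = c(40350, 40890, 40230),
  stationname = c("UIC-Halsted", "O'Hare Airport", "Cumberland"),
  date        = c("03/01/2021", "03/06/2021", "03/07/2021"),
  daytype     = c("W", "A", "U"),
  rides       = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Parse the text dates into Date objects, which print as year-month-day.
cta$newDate <- mdy(cta$date)

# Pull out the year as a number.
cta$year <- year(cta$newDate)

# label = TRUE returns names instead of numbers; abbr = FALSE gives the
# full name (January) instead of the short form (Jan).
cta$month        <- month(cta$newDate, label = TRUE, abbr = FALSE)
cta$dayOfTheWeek <- wday(cta$newDate, label = TRUE, abbr = FALSE)
```

With label = TRUE, lubridate returns month and dayOfTheWeek as ordered factors, so plots and tables keep calendar order rather than sorting the names alphabetically.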
Then, I created separate tables for UIC-Halsted (called UICHalsted), O'Hare Airport (called Ohare), and Cumberland (called Cumberland). Each of these new tables contains the added columns mentioned above. This was done by using the R subset function to filter on station_id.
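The split into per-station tables could look roughly like the sketch below. The station_id values are assumptions used for illustration and should be checked against the actual data, and the small data frame is a made-up stand-in for the full table.

```r
# Made-up stand-in for the full CTA table (values are illustrative only).
cta <- data.frame(
  station_id  = c(40350, 40890, 40230, 40350),
  stationname = c("UIC-Halsted", "O'Hare Airport", "Cumberland", "UIC-Halsted"),
  rides       = c(100, 200, 300, 150),
  stringsAsFactors = FALSE
)

# Assumed station IDs; confirm them in the real data before relying on them.
UICHalsted <- subset(cta, station_id == 40350)
Ohare      <- subset(cta, station_id == 40890)
Cumberland <- subset(cta, station_id == 40230)
```

Filtering on station_id rather than stationname avoids problems if a station's display name is spelled differently across rows.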
CTA Data after the new columns were added