Data Information

Data Preparation
- Pre-importation
  - Since R and Shiny can only accept data files of 5 MB or less, I had to split the CTA - Ridership - 'L' Station Entries - Daily Totals file into multiple parts. To do this, I used the Linux split command and split the file by size. My source was: Split Command in Linux: 9 Useful Examples (linuxhandbook.com). The split command produced 8 files, to which I then added a .csv extension.
  - After the split, all of the data in each file ended up in a single column, so I used the text-to-columns feature to separate the data into columns.
  - Because the files were split by size, the last row of each file was cut off and the first row of the next file held the remainder. I rejoined the cut-off rows and added the column headers back to each file.
  - The CTA - System Information - List of 'L' Stops file also had to be altered. The Location column gave the latitude and longitude as a tuple, which made it hard to use for map visualization. I used the same text-to-columns operation to split the tuple into two columns. After that, the latitude column looked like "(Lat," and the longitude column looked like "Lon)", so I used search and replace to remove the parentheses and comma.
  - Lastly, I renamed the MAP_ID column to station_id so that both files shared a common identifier.
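
The pre-importation steps above were done by hand, but a rough equivalent can be sketched in base R. This is only an illustration under stated assumptions: the toy values, the helper name split_rows, and the chunk size are hypothetical, and the real files may use different column names. Splitting by row count rather than by size avoids cutting a record in half, and each chunk keeps its header when written out.

```r
# Toy stand-in for the stops file; the values are illustrative only.
stops <- data.frame(
  MAP_ID = c(40830, 40120),
  Location = c("(41.857908, -87.669147)", "(41.829353, -87.680622)"),
  stringsAsFactors = FALSE
)

# Strip the parentheses, then split each tuple on the comma.
coords <- strsplit(gsub("[()]", "", stops$Location), ",\\s*")
stops$lat <- as.numeric(sapply(coords, `[`, 1))
stops$lon <- as.numeric(sapply(coords, `[`, 2))

# Rename MAP_ID so it matches the key used in the ridership file.
names(stops)[names(stops) == "MAP_ID"] <- "station_id"

# Hypothetical helper: split a data frame into chunks of whole rows.
split_rows <- function(df, rows_per_chunk) {
  groups <- ceiling(seq_len(nrow(df)) / rows_per_chunk)
  split(df, groups)
}

# Each chunk is a complete data frame, so write.csv() adds the header.
chunks <- split_rows(stops, rows_per_chunk = 1)
out_files <- file.path(tempdir(), paste0("stops_part_", names(chunks), ".csv"))
for (i in seq_along(chunks)) {
  write.csv(chunks[[i]], out_files[i], row.names = FALSE)
}
```
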

- Post-importation
  - Created a more universally readable date column (in the form year-month-day) called newDate using lubridate for each of the 8 files.
  - Created a column solely for the years called Year.
  - Merged each of the 8 files with the dataset containing the lat-lon information on the column station_id.
  - Created two columns called Line and Code in each of the 8 merged files. The Line column specifies what CTA line the station is a part of and the Code column gives the Line column a numeric identifier (which was used to color code the map markers).
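
The post-importation steps can be sketched roughly as follows for a single file. This is a minimal illustration, not the project's actual code: the toy data, the assumption that the raw date column is named date and stored as month/day/year, and the line_lookup and line_levels vectors are all hypothetical.

```r
library(lubridate)

# Toy stand-ins for one ridership part and the stops file (values made up).
rides <- data.frame(
  station_id = c(40830, 40120, 40830),
  date = c("01/01/2001", "12/31/2020", "07/04/2015"),
  rides = c(100, 200, 300),
  stringsAsFactors = FALSE
)
stops <- data.frame(
  station_id = c(40830, 40120),
  lat = c(41.857908, 41.829353),
  lon = c(-87.669147, -87.680622)
)

# Parse month/day/year text into a Date, which prints as year-month-day.
rides$newDate <- mdy(rides$date)

# Separate column holding only the year.
rides$Year <- year(rides$newDate)

# Attach the lat-lon information using the shared station_id column.
merged <- merge(rides, stops, by = "station_id")

# Hypothetical lookup from station to line; the real assignment may differ.
line_lookup <- c("40830" = "Pink", "40120" = "Orange")
merged$Line <- unname(line_lookup[as.character(merged$station_id)])

# Numeric identifier per line, usable for picking a marker color.
line_levels <- c("Orange", "Pink")
merged$Code <- match(merged$Line, line_levels)
```

The same steps would be repeated for each of the 8 files, for example by wrapping them in a function and applying it with lapply().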
