Chicago CTA Data - Data Source & Manipulations

Data Sources

There are two different data sources:

The CTA Ridership data from the Chicago Data Portal

Link: https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f

The CTA Station data, with each station's latitude/longitude and the color line it belongs to

Link: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme

Data Format

I downloaded both data sources and read them in as .csv files.

The CTA Ridership Data contains data columns for:

station_id, stationname, date, daytype, number of rides

The CTA Station Data with Geolocation data contains columns for:

stop_id, direction, stop_name, station_name, station_descriptive_name, map_id, 10 boolean columns for the CTA lines, Location (a latitude & longitude tuple)

Data Manipulations

Geographic Data

In the CTA Station data, each station's coordinates are stored in a single Location column as a "(latitude, longitude)" tuple. I removed the parentheses, split the value on the comma delimiter into separate latitude and longitude columns, and converted both to numeric values. This was necessary so that the latitudes and longitudes could be passed as parameters when adding markers to the Leaflet map, placing a marker on every CTA station.
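A minimal sketch of that parsing step, written in Python with only the standard library rather than with the tools the app itself uses. It assumes each Location value looks like "(41.88, -87.65)" (an illustrative value, not taken from the dataset); the function names and the "lat"/"lon" keys are hypothetical.

```python
def parse_location(location):
    """Turn a '(latitude, longitude)' string into a pair of floats."""
    # Drop surrounding whitespace and the parentheses.
    inner = location.strip().strip("()")
    # Split on the comma delimiter and convert each part to a number.
    lat_text, lon_text = inner.split(",")
    return float(lat_text), float(lon_text)


def add_coordinates(rows):
    """Add numeric 'lat' and 'lon' fields to each station row (in place)."""
    for row in rows:
        row["lat"], row["lon"] = parse_location(row["Location"])
    return rows
```

With separate numeric fields, each row can be handed directly to a map's marker function, which expects latitude and longitude as two numbers rather than one string.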

Date Formatting

The dates were initially in the format YYYY-MM-DD. To make them easier to read and to compare, I used the lubridate library to convert all of them to MM/DD/YYYY.
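A small Python sketch of the same idea, using the standard datetime module in place of lubridate (the function names are hypothetical). It parses the original YYYY-MM-DD text into date objects and formats them as MM/DD/YYYY only for display. Comparisons are safest on the parsed dates, because MM/DD/YYYY strings compare by month first: as plain text, "12/31/2010" sorts after "01/01/2011".

```python
from datetime import date, datetime


def parse_iso(text: str) -> date:
    """Parse a 'YYYY-MM-DD' string into a date object."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def to_display(d: date) -> str:
    """Format a date as 'MM/DD/YYYY' for tables and chart labels."""
    return d.strftime("%m/%d/%Y")
```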

Grouping Data

To look at specific data, for example CTA ridership at the UIC-Halsted station in 2011, I take a subset of the main dataset. Working with the smaller subset reduces processing time when displaying the bar charts, data tables, and Leaflet maps.
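A sketch of that subsetting step in Python, assuming each row is a dict whose "date" value has already been parsed into a datetime.date and whose ride count is stored under a hypothetical "rides" key. The per-month totals are only an illustration of a summary a bar chart could draw from the subset.

```python
from collections import defaultdict
from datetime import date


def subset_rides(rows, station, year):
    """Keep only the rows for one station within one calendar year."""
    start, end = date(year, 1, 1), date(year, 12, 31)
    return [
        row for row in rows
        if row["stationname"] == station and start <= row["date"] <= end
    ]


def monthly_totals(rows):
    """Sum the (hypothetical) 'rides' field per month, ordered by month number."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["date"].month] += row["rides"]
    return dict(sorted(totals.items()))
```

A call such as `monthly_totals(subset_rides(rows, "UIC-Halsted", 2011))` only touches the rows for that station and year, so the chart works on a much smaller list than the full dataset.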

Reactive Elements for user input

The subsets described under Grouping Data are turned into reactive elements, so that when the user changes an input, the subset is recomputed and the graphs automatically update to display the new data.
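The reactive pattern can be illustrated with a toy Python model. It does not reproduce any real framework's API: the classes below are hypothetical and only show inputs invalidating a cached subset, which is then recomputed and re-rendered by the output that depends on it.

```python
class ReactiveInput:
    """Hypothetical stand-in for a user input widget (e.g. a station picker)."""

    def __init__(self, value):
        self._value = value
        self.dependents = []

    def get(self):
        return self._value

    def set(self, value):
        # Only notify dependents when the value actually changes.
        if value != self._value:
            self._value = value
            for dep in self.dependents:
                dep.invalidate()


class ReactiveExpr:
    """Caches the result of func(inputs...) until an input changes."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs
        self._cache = None
        self._valid = False
        self.compute_count = 0  # how many times func has run
        self.dependents = []
        for inp in inputs:
            inp.dependents.append(self)

    def invalidate(self):
        # Mark the cached value stale and pass the signal on to outputs.
        self._valid = False
        for dep in self.dependents:
            dep.invalidate()

    def get(self):
        # Recompute lazily, only when the cache is stale.
        if not self._valid:
            self._cache = self._func(*(inp.get() for inp in self._inputs))
            self._valid = True
            self.compute_count += 1
        return self._cache


class Output:
    """Hypothetical graph/table output that re-renders when its data changes."""

    def __init__(self, render, expr):
        self._render = render
        self._expr = expr
        self.rendered = None
        expr.dependents.append(self)
        self.invalidate()  # initial render

    def invalidate(self):
        self.rendered = self._render(self._expr.get())
```

The cache means the subset is rebuilt only when the station or year actually changes, not every time a chart reads it.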

Order of data in bar charts and data tables

To make the plotted data easy for the user to read, I order it alphabetically or by value in ascending or descending order. This brings out key aspects of the data and makes outliers and patterns more visible.
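A sketch of those orderings in Python (the function name and its by/descending parameters are hypothetical): bars can be sorted alphabetically by label, or by value in ascending or descending order.

```python
from operator import itemgetter


def order_bars(totals, by="name", descending=False):
    """Return (label, value) pairs in the order they should be drawn."""
    if by == "name":
        key = itemgetter(0)  # alphabetical by station/label
    elif by == "value":
        key = itemgetter(1)  # numeric by ride count
    else:
        raise ValueError("by must be 'name' or 'value'")
    return sorted(totals.items(), key=key, reverse=descending)
```

Sorting by value in descending order puts the busiest stations first, which is usually where outliers show up most clearly.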
