Two datasets were used to build the application, both obtained from the Chicago Data Portal.
The dataset containing information about all CTA 'L' stops, including their latitude and longitude, is available at: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme. The file size is 48 KB.
The ridership data for all CTA 'L' stations is available at: https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f. The file size is 39 MB.
The CTA 'L' stops dataset provides location and basic service-availability information for every place on the CTA system where a train stops. It also includes formal station names, stop descriptions, and line-color flags: RED, BLUE, G (Green), O (Orange), BRN (Brown), P (Purple), Pexp (Purple Express), Y (Yellow), and Pnk (Pink). DIRECTION_ID gives the normal direction of train traffic at a platform (E = East, W = West, N = North, S = South). STOP_ID uniquely identifies each stop and MAP_ID uniquely identifies each station, so several STOP_IDs can share one MAP_ID. The ADA column indicates whether the stop is compliant with the Americans with Disabilities Act (ADA).
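Because STOP_ID and MAP_ID identify different levels (stop vs. station), a station-level view requires grouping the stop rows by MAP_ID. The following is a minimal sketch, not the application's actual code. It assumes that the line-color columns hold the strings "true"/"false" and uses hypothetical sample rows whose IDs are made up for illustration.

```python
from collections import defaultdict

# Line-color columns as listed in the dataset description above.
LINE_COLUMNS = ["RED", "BLUE", "G", "BRN", "P", "Pexp", "Y", "Pnk", "O"]


def lines_per_station(stops):
    """Group stop rows by MAP_ID, collecting every line color served
    at any stop belonging to that station."""
    stations = defaultdict(set)
    for stop in stops:
        for col in LINE_COLUMNS:
            # Assumption: flags are exported as the strings "true"/"false".
            if stop.get(col, "").strip().lower() == "true":
                stations[stop["MAP_ID"]].add(col)
    return dict(stations)


# Hypothetical sample rows: two stops (N/S) share one station, a third is separate.
sample_stops = [
    {"STOP_ID": "30001", "MAP_ID": "40000", "DIRECTION_ID": "N", "RED": "true", "BLUE": "false"},
    {"STOP_ID": "30002", "MAP_ID": "40000", "DIRECTION_ID": "S", "RED": "true", "BLUE": "false"},
    {"STOP_ID": "30003", "MAP_ID": "40001", "DIRECTION_ID": "E", "RED": "false", "BLUE": "true"},
]

print(lines_per_station(sample_stops))  # {'40000': {'RED'}, '40001': {'BLUE'}}
```

Stations with no line flag set to "true" do not appear in the result. A real application might also keep their MAP_IDs.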
Table 1: CTA - System Information - List of 'L' Stops
The ridership data contains daily entry totals for every CTA 'L' station in Chicago from 2001 to 2021. Each record combines the entries at all turnstiles of one station for one day. Day types are: W = Weekday, A = Saturday, U = Sunday/Holiday.
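To show how the day-type codes are used, the sketch below sums daily entries per day type from a small TSV excerpt. The column names (station_id, stationname, date, daytype, rides) are an assumption about the export format. The sample values are made up for illustration.

```python
import csv
import io
from collections import Counter

# Day-type codes as documented for the ridership dataset.
DAYTYPES = {"W": "Weekday", "A": "Saturday", "U": "Sunday/Holiday"}

# Hypothetical TSV excerpt; column names are assumed, values are illustrative.
SAMPLE_TSV = (
    "station_id\tstationname\tdate\tdaytype\trides\n"
    "40000\tExample\t01/01/2021\tU\t500\n"
    "40000\tExample\t01/04/2021\tW\t1200\n"
    "40000\tExample\t01/05/2021\tW\t1300\n"
    "40000\tExample\t01/09/2021\tA\t800\n"
)


def rides_by_daytype(tsv_text):
    """Sum daily entry totals per day type; each row is one station on one date."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        totals[DAYTYPES[row["daytype"]]] += int(row["rides"])
    return dict(totals)


print(rides_by_daytype(SAMPLE_TSV))
# {'Sunday/Holiday': 500, 'Weekday': 2500, 'Saturday': 800}
```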
Table 2: CTA L Station Ridership Data
The free web-based version of the Shiny server used to publish this project limits each data file to 5 MB. We therefore split the 39 MB ridership file into smaller pieces so that it could be uploaded. The Python script used to split the ridership data into TSV files is provided below:
#!/usr/bin/env python3
"""Split the CTA ridership TSV into smaller TSV files with a fixed number of rows each."""
import csv
import os
import sys

# Number of data rows per output file (the header row is repeated in every file).
CHUNK_SIZE = 130000
# Default path to the master TSV file; a different path can be passed as the first argument.
DEFAULT_PATH = "C:/Users/Akash/UIC/CS 424/tsv_splitter/CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv"

if __name__ == '__main__':
    file_path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH
    if not os.path.isfile(file_path) or not file_path.endswith('.tsv'):
        print('You must provide the path to a .tsv file to split.')
        sys.exit(1)

    file_name = os.path.splitext(file_path)[0]
    # 'utf-8-sig' removes a leading byte-order mark (BOM) from the header, if present.
    with open(file_path, 'r', newline='', encoding='utf-8-sig') as tsv_file:
        reader = csv.reader(tsv_file, delimiter='\t', quotechar='\'')
        header = next(reader, None)
        if header is None:
            print('The input file is empty.')
            sys.exit(1)

        chunk_file = None
        chunk_name = None
        writer = None
        counter = 1
        for index, row in enumerate(reader):
            # Start a new output file every CHUNK_SIZE data rows.
            if index % CHUNK_SIZE == 0:
                if chunk_file is not None:
                    chunk_file.close()
                    print('File "{}" complete.'.format(chunk_name))
                chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
                chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
                counter += 1
                writer = csv.writer(chunk_file, delimiter='\t', quotechar='\'')
                writer.writerow(header)
            # Remove apostrophes from the station name (column 1, e.g. "O'Hare")
            # so the writer does not wrap the field in ' quote characters.
            row[1] = row[1].replace("'", "")
            writer.writerow(row)

        # Close the last piece, which the loop leaves open.
        if chunk_file is not None:
            chunk_file.close()
            print('File "{}" complete.'.format(chunk_name))
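The script uses a fixed chunk size of 130000 rows. How that value was chosen is not stated. One way to derive a chunk size for a given upload limit is to divide the limit by the average row size. The sketch below assumes that rows are roughly uniform in size and applies a 90% safety margin. The helper names and the row count are hypothetical placeholders, not figures from the dataset.

```python
def estimate_chunk_size(file_bytes, total_rows, limit_bytes, safety_pct=90):
    """Rows per piece that keeps each piece under safety_pct% of limit_bytes,
    assuming every row is about the same size (header bytes are ignored)."""
    # rows = (limit * safety) / (file_bytes / total_rows), in integer arithmetic.
    return (limit_bytes * safety_pct * total_rows) // (100 * file_bytes)


def pieces_needed(total_rows, chunk_size):
    """Ceiling division: number of output files needed for total_rows data rows."""
    return (total_rows + chunk_size - 1) // chunk_size


MB = 1024 * 1024  # treating "MB" as 2**20 bytes for this illustration
file_bytes = 39 * MB
total_rows = 1_000_000  # placeholder, NOT the dataset's actual row count
chunk = estimate_chunk_size(file_bytes, total_rows, 5 * MB)
print(chunk, pieces_needed(total_rows, chunk))  # 115384 9
```

Real row sizes vary (station names differ in length), so it is still worth checking the size of each output file after splitting.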