Project 2

About the Data

Data Source:

Two datasets were used to build the application. Both datasets were collected from the Chicago Data Portal.

The dataset containing information about all CTA 'L' stops, including their latitude and longitude, can be found at: https://data.cityofchicago.org/Transportation/CTA-System-Information-List-of-L-Stops/8pix-ypme. The file size is 48 KB.
Ridership data for all CTA 'L' stations can be found at: https://data.cityofchicago.org/Transportation/CTA-Ridership-L-Station-Entries-Daily-Totals/5neh-572f. The file size is 39 MB.
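The two files can also be fetched from a script instead of through the portal's export button. The sketch below builds a download URL from the dataset ID at the end of each link above (8pix-ypme and 5neh-572f). It assumes the portal's Socrata bulk-export path, /api/views/<id>/rows.csv?accessType=DOWNLOAD, which returns comma-separated files. The helper names are hypothetical.

```python
import urllib.request

PORTAL = "https://data.cityofchicago.org"

# Dataset IDs taken from the two portal links above
DATASETS = {
    "stops": "8pix-ypme",
    "ridership": "5neh-572f",
}


def export_url(dataset_id: str) -> str:
    """Build a full-dataset CSV export URL (assumed Socrata bulk-export path)."""
    return f"{PORTAL}/api/views/{dataset_id}/rows.csv?accessType=DOWNLOAD"


def download(name: str, dest: str) -> str:
    """Download one dataset to `dest` and return the local file path."""
    path, _headers = urllib.request.urlretrieve(export_url(DATASETS[name]), dest)
    return path


if __name__ == "__main__":
    # Only print the URLs; call download() explicitly to fetch the files
    for name, dataset_id in DATASETS.items():
        print(name, export_url(dataset_id))
```

For example, download("stops", "stops.csv") would save the 48 KB stops file locally.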

Data Usage:

The CTA 'L' stops data provides location and basic service-availability information for each place on the CTA system where a train stops, along with formal station names, stop descriptions, and line colors (RED, BLUE, G (Green), O (Orange), BRN (Brown), P (Purple), Pexp (Purple Express), Y (Yellow), and Pnk (Pink)). DIRECTION_ID refers to the normal direction of train traffic at a platform (E = East, W = West, N = North, S = South). STOP_ID is a unique identifier for each stop, and MAP_ID is a unique identifier for each station. The ADA column indicates whether the stop is compliant with the ADA (Americans with Disabilities Act).
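The column layout described above maps naturally onto small lookup tables. This is a minimal sketch, assuming the stops file is read with csv.DictReader, that its headers match the codes listed above, and that each line-color column holds the text "true" or "false". The function names are hypothetical. It lists the lines serving a stop and groups stops (STOP_ID) under their station (MAP_ID), since one station can contain several stops.

```python
import csv
from collections import defaultdict

# Line-color column codes in the stops dataset, mapped to display names
LINE_COLUMNS = {
    "RED": "Red", "BLUE": "Blue", "G": "Green", "O": "Orange",
    "BRN": "Brown", "P": "Purple", "Pexp": "Purple Express",
    "Y": "Yellow", "Pnk": "Pink",
}

# Normal direction of train traffic at a platform (DIRECTION_ID)
DIRECTIONS = {"E": "East", "W": "West", "N": "North", "S": "South"}


def lines_serving(row):
    """Return display names of the lines whose column is 'true' for one stop row."""
    return [name for code, name in LINE_COLUMNS.items()
            if row.get(code, "").strip().lower() == "true"]


def stops_by_station(rows):
    """Group STOP_IDs under their MAP_ID, because a station can have several stops."""
    stations = defaultdict(list)
    for row in rows:
        stations[row["MAP_ID"]].append(row["STOP_ID"])
    return dict(stations)


def load_stops(path):
    """Read the stops CSV into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

Keeping the line codes in one dictionary means the app's line filter and its legend can share a single source of truth.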

Table 1: CTA - System Information - List of 'L' Stops

The ridership data contains the daily number of entries at every CTA 'L' station in Chicago from 2001 to 2021. Entries at all turnstiles are combined for each station. Daytypes are as follows: W = Weekday, A = Saturday, U = Sunday/Holiday.
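A minimal sketch of how the daily totals can be rolled up, assuming each row has the columns stationname, date, daytype, and rides, with dates written as MM/DD/YYYY. Those column names and the date format are assumptions about the file, and the function names are hypothetical. It sums entries per year for one station and totals entries per daytype.

```python
import csv
from collections import defaultdict
from datetime import datetime

# Daytype codes used in the ridership data
DAYTYPES = {"W": "Weekday", "A": "Saturday", "U": "Sunday/Holiday"}


def load_ridership(path):
    """Read one ridership TSV file into a list of dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def yearly_rides(rows, station_name):
    """Sum daily entries per year for one station, ordered by year."""
    totals = defaultdict(int)
    for row in rows:
        if row["stationname"] == station_name:
            year = datetime.strptime(row["date"], "%m/%d/%Y").year
            totals[year] += int(row["rides"])
    return dict(sorted(totals.items()))


def rides_by_daytype(rows):
    """Sum entries across all rows for each daytype, keyed by its full name."""
    totals = defaultdict(int)
    for row in rows:
        totals[DAYTYPES[row["daytype"]]] += int(row["rides"])
    return dict(totals)
```

Because the data is split across several files, rows loaded from each piece can simply be concatenated into one list before calling these functions.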

Table 2: CTA L Station Ridership Data

The free web-based version of the Shiny server used to publish this project limits each data file to 5 MB. We therefore split the 39 MB ridership file into smaller pieces so that it could be uploaded. The Python script used to split the ridership data and create the TSV files is provided below:

#!/usr/bin/env python3
# Split the CTA ridership TSV into smaller TSV files, each starting with the header row.

import csv
import os
import sys

if __name__ == '__main__':

    # number of data rows per output file
    chunk_size = 130000

    # file path to master tsv file (an optional command-line argument overrides it)
    file_path = "C:/Users/Akash/UIC/CS 424/tsv_splitter/CTA_-_Ridership_-__L__Station_Entries_-_Daily_Totals.tsv"
    if len(sys.argv) > 1:
        file_path = sys.argv[1]

    if not os.path.isfile(file_path) or not file_path.endswith('.tsv'):
        print('You must input path to .tsv file for splitting.')
        sys.exit(1)

    file_name = os.path.splitext(file_path)[0]

    # 'utf-8-sig' drops a leading byte-order mark (BOM) from the header, if present
    with open(file_path, 'r', newline='', encoding='utf-8-sig') as tsv_file:
        reader = csv.reader(tsv_file, delimiter='\t', quotechar="'")

        # the first row is the header; it is repeated in every output file
        header_chunk = next(reader)

        chunk_file = None
        chunk_name = None
        writer = None
        counter = 1

        for index, chunk in enumerate(reader):
            # start a new output file every chunk_size data rows
            if index % chunk_size == 0:
                if chunk_file is not None:
                    chunk_file.close()
                    print('File "{}" complete.'.format(chunk_name))
                chunk_name = '{0}_{1}.tsv'.format(file_name, counter)
                chunk_file = open(chunk_name, 'w', newline='', encoding='utf-8')
                counter += 1
                writer = csv.writer(chunk_file, delimiter='\t', quotechar="'")
                writer.writerow(header_chunk)

            # strip apostrophes from the station name (column 1, e.g. "O'Hare");
            # otherwise the writer would wrap the field in ' quotes
            chunk[1] = chunk[1].replace("'", "")
            writer.writerow(chunk)

        # close the last output file
        if chunk_file is not None:
            chunk_file.close()
            print('File "{}" complete.'.format(chunk_name))
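Once the script has run, a quick check can confirm that each piece fits the upload limit and that no rows were dropped. This is a sketch under stated assumptions: 5 MB is taken as 5 * 1024 * 1024 bytes, the pieces are the <name>_<n>.tsv files the script writes next to the master file, and no field contains an embedded newline. The helper names are hypothetical. The arithmetic behind the split is simple: a 39 MB file under a 5 MB cap needs at least ceil(39 / 5) = 8 pieces, before counting the header repeated in each piece.

```python
import glob
import math
import os

# Assumed interpretation of the 5 MB per-file upload limit
LIMIT_BYTES = 5 * 1024 * 1024


def min_pieces(total_bytes, limit_bytes=LIMIT_BYTES):
    """Smallest number of files that keeps each under the limit (ignores header overhead)."""
    return math.ceil(total_bytes / limit_bytes)


def count_data_rows(path):
    """Count lines after the header; assumes no field contains a newline."""
    with open(path, encoding="utf-8-sig") as f:
        return sum(1 for _ in f) - 1


def check_pieces(master_path, limit_bytes=LIMIT_BYTES):
    """Return the piece paths, the pieces over the limit, and whether row counts match."""
    stem = os.path.splitext(master_path)[0]
    pieces = sorted(glob.glob(glob.escape(stem) + "_*.tsv"))
    oversized = [p for p in pieces if os.path.getsize(p) > limit_bytes]
    piece_rows = sum(count_data_rows(p) for p in pieces)
    return pieces, oversized, piece_rows == count_data_rows(master_path)
```

For example, check_pieces(file_path) with the script's file_path should return the list of pieces, an empty list of oversized pieces, and True when the row counts agree.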
