These tutorials will show you how to take the data on stop times and frequencies contained within a GTFS feed and structure it in a form where it can be joined to spatial data and visualized in a GIS program. These steps in Microsoft Excel will aggregate the data at either the stop or route level, which can then be joined to point or line files in QGIS. This process can also be done in Google Sheets.
Download a static GTFS feed to work with. Follow the instructions for your respective operating system to unzip the folder, and use text-to-columns to import the following .txt files into their own Excel worksheets in one workbook:
In the tabs at the bottom of the Excel window, rename each worksheet so it reflects the name of the .txt file it represents.
*Note: these are not required files for this process, and will also not show up in every feed. However, if they do exist, they are helpful to have on hand to understand service exceptions.
Open the Calendar worksheet to determine the agency’s service periods. For each day of the week, 1 = service and 0 = no service. This has a very simple calendar - service_id 65974 represents weekday service, service_id 65975 represents Saturday service, and the agency does not operate Sunday service.
Some calendars may show up in Calendar.txt with a value of 0 for every weekday - these are typically holiday or special event calendars, which are in Calendar_dates.txt.
Open the Stop_Times worksheet, select all columns, and set a filter. Sort trip_id smallest to largest. Then, create a new column to the right of trip_id, and name it service_id.
Do a VLOOKUP in the service_id column to add the service_id to stop_times (detailed instructions here). The syntax should look roughly like what is shown in the image to the right. This step adds the service ID of each trip to the Stop_Times worksheet from the Trips worksheet, which will now allow you to filter each trip by which calendar it appears on (e.g. Weekdays, Saturdays, etc).
Select the entire Stop_Times worksheet, and go to Insert → Pivot Table to create a PivotTable in a new worksheet. In the Pivot Table fields section on the right side of the window, drag the following fields into these respective boxes:
Rows - stop_id
Values - Count of arrival_time
Columns - arrival_time. Once you drag that in, the hours field should auto-populate as long as the arrival_time column is properly formatted.
Filters - service_id. This will let you filter trips by calendars.
These steps should give you a table that looks like the one below.
Click on the service_id filter in the top-left corner of the window to select which calendars show up in the table. On this feed, the service_id value 65974 represents Weekday service, so with this filter applied, only weekday trips show up in the Pivot Table.
Select-all, then copy-paste the filtered table into a new worksheet and Paste Values Only. Remove extraneous rows from the top so that the topmost column has the time blocks, and delete any extraneous or empty columns as well. Be sure to re-add in “stop_id” as the header for Column A, and rename the hours columns to remove the spaces in the headers as well! Your final result should look like the table below.
If you would like to aggregate hourly frequencies into larger groups (e.g. AM Peak period from 7-9am, etc), this can be done here by summing up the applicable groups of hours and then deleting the individual hour columns. Below is the same table as above but with frequencies grouped into the following categories: Early AM (4-6am), AM Peak (7-9am), Midday (10am-1pm), Early PM (2-4pm), PM Peak (5-7pm).
How you aggregate hourly frequencies is entirely up to you, and how you plan to visualize the data. Hourly frequencies can also be aggregated using QGIS or ArcMap instead.
Save each worksheet that you want to join to the Stops layer and map as a separate CSV UTF-8 (Comma Delimited) file. Note that each CSV file can only contain a single worksheet.
If you plan to map multiple calendars (e.g. both weekday and saturday service), the data for each calendar will need to be saved in a separate CSV file. For each calendar, return to Step 4 and select the applicable service_id for the calendar you wish to map, then repeat Steps 5-7.
The process for aggregating route-level frequencies is very similar to the stop-level tutorial, but there are some key differences and other caveats to note:
Not all GTFS feeds will contain shapes.txt. If the feed is missing shapes.txt but you have access to a separate shapefile of the agency's route alignments, you can still create the CSV outlined below and
Carry out Steps 1-3 of the Stop-Level Frequencies tutorial, but make sure to import shapes.txt and routes.txt into a separate worksheet in Step 1 as well.
Open the trips.txt worksheet, select-all, and create a new Pivot Table. Assign the following fields as such:
Rows - shape_id or route_id*
Values - Count of trip_id
Filters - service_id . This will let you filter trips by calendars.
The table on the right shows weekday frequencies aggregated at the route level.
*NOTE: This step lets you decide whether you want to aggregate frequencies at the shape or the route level. If you’re interested in seeing how frequencies change across different variants of the same route, choose shape_id. However, if you want all trips to be aggregated at the route level instead, choose route_id.
Select-all, then copy-paste the filtered table into a new worksheet and Paste Values Only. Remove extraneous rows from the top so that the topmost column has the time blocks, and delete any extraneous or empty columns as well. Be sure to re-add in route_id as the header for Column A, and rename the count of trip_id column to tripcount (or something similar). The final result will look like the image on the right.
Save each worksheet that you want to join to the Shapes layer and map as a separate CSV UTF-8 (Comma Delimited) file. Note that each CSV file can only contain a single worksheet.
Before you import the route-level frequencies table into QGIS or ArcMap, you’ll need to create the CSV described below, which is what connects route_id to shape_id, since the route_id field does not appear in shapes.txt. The route_id is the name of the route itself, while the shape_id refers to the particular alignment or path that the route takes on a given run. Since routes may have multiple variations, each route may have more than one corresponding shape.
Shapes.txt contains individual lat-long point coordinates, which are put into order in trip planning apps and GIS programs using the shape_pt_sequence field. But first, we'll need to isolate the unique shape_id values.
Generate a list of all unique values for shape_id. One way to do this is to create a duplicate worksheet of shapes.txt. In the duplicate worksheet, select all values, then go to Data -> Remove Duplicates. Uncheck all boxes in the subsequent menu except shape_id. Then, delete all columns except shape_id. Rename this worksheet Shapes2Routes.
Open the worksheet containing trips.txt. Take the shape_id column and copy-paste it to the leftmost-column on the sheet (so it has the position of Column A).
Back in the Shapes2Routes worksheet, create a VLOOKUP in Cell B2 that refers to the shape_id and route_id columns in Trips.txt. This will populate the route_id column in Shapes2Routes, which now formally links the two ID fields.
If you wish to include the actual route names in this file, do another VLOOKUP to pull route_short_name and/or route_long_name from Routes.txt into Shapes2Routes.
Having the actual route names in the file will be useful for map labeling purposes in QGIS, but is not required to join frequency data to shapes.txt.
Save Stops2Routes as a CSV UTF-8 (Comma Delimited) file. This file will be imported into QGIS in the next tutorial.