Trail Network Analysis
Travis Zalesky
As part of UA GIST 602B
Figure 1. Olympic National Park and National Forest study area, with park features and locator map.
Network analysis is a powerful and efficient means of processing geospatial data. Based on graph theory, it has been used to great effect in recent decades, particularly for road networks. Despite their proven utility, network datasets can be large, complicated, and difficult to establish and maintain. For this reason, there are still many applications in which geospatial network analysis is underutilized. One of these underutilized applications is trail networks. Many smaller, publicly funded, parks departments may lack the resources to establish and maintain a dedicated trail network dataset. Using a case study of the Olympic National Park, I will demonstrate that it is possible to establish a simple trails network dataset, using readily available data sources, and explore its utility, using a series of hypothetical park management questions.
Network analysis is possibly one of the most powerful tools in any GIS program. Based on graph theory, a network is a simplified structure consisting of nodes (vertices) and edges (lines), which represent connectivity in a system. There are many examples of networks in daily life, but probably the most familiar geospatial network, for most people, is a road network. Here, roads represent edges, which you can travel along, and intersections represent nodes, which connect various roads together. Importantly, every edge in a network must have some cost to "travel" along that edge. In our roads example, this may simply be the length of the road segment between two nodes. Commonly, this cost will instead be expressed as travel time, by dividing distance by expected speed. Alternatively, the cost could be some other factor of interest, like the monetary cost of toll roads, a "walkability rating", or any other quantifiable metric that may be of interest to the end user. Networks are an efficient way to find the shortest path, optimize routes, and calculate distances in the real world.
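To make the node, edge, and cost vocabulary concrete, the following is a minimal sketch of a shortest-path search (Dijkstra's algorithm) over a toy network. The node names and costs are hypothetical and are not drawn from the study data; this is not the solver used by ArcGIS Pro, only an illustration of the underlying idea.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph, where graph[u][v] = cost."""
    best = {start: 0.0}      # lowest known cost to reach each node
    previous = {}            # back-pointers used to rebuild the path
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue         # stale queue entry, a cheaper one was already processed
        for neighbor, edge_cost in graph[node].items():
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return float("inf"), []   # goal is disconnected from start
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Hypothetical toy network: nodes are intersections, costs are miles.
toy_network = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.5, "D": 4.0},
    "C": {"A": 5.0, "B": 1.5, "D": 1.0},
    "D": {"B": 4.0, "C": 1.0},
    "E": {},  # an isolated node, analogous to a disconnected trail
}
total_cost, route = shortest_path(toy_network, "A", "D")
```

Swapping the edge costs from miles to hours would return the fastest route rather than the shortest one, with no change to the algorithm.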
The process of building a complex geospatial network, such as a global roads network, is involved, and time-consuming. However, if you have ever used an app to get driving directions, you have benefited from network analysis. Thankfully, much of the hard work of building road networks has already been done by companies such as Google and Esri. However, there may still be instances where there is no pre-existing network for the analysis you want to perform. A good example of this is trail networks.
In this analysis, I will build a trail network of the Olympic National Park and National Forest using a composite of trail maps and forest road maps, and I will use this network to answer three hypothetical questions that would likely be of interest to hikers and park managers (Fig. 1).
First, park managers have decided to sponsor a local hiker to do a long-distance hike through the Olympic National Park as part of a social-media campaign strategy. Long-distance hiking and backpacking have seen an explosion in the public consciousness in recent years. Dozens of critically acclaimed, best-selling books about famous thru-hikes such as the Pacific Crest Trail and the Appalachian Trail have been hitting bookstores, and permit sales for these trails are at an all-time high. Park managers would like a hiker to visit six premier park destinations, in any order: (1) Lake Crescent, (2) the Hoh Rainforest, (3) the Elwha River, (4) Lake Cushman, (5) Quinault Lake, and (6) Hurricane Ridge. The hiker needs to know the optimal route through the park to these six locations. They also need to know the optimal order of stops, the total distance traveled, the estimated time to complete the hike, and turn-by-turn directions.
Next, park managers need a simplified means of knowing all the possible connections between campsites and access points, as well as trail distances between each point. The Olympic National Park is a maze of trails and roads. There are over 100 campsites within the park, and there are 56 official trailheads where hikers and backpackers can enter the trail network. How can all these relationships be summarized in a simple lookup table?
Finally, park managers are concerned about the safety of hikers and backpackers. Backcountry rescue is an unfortunate reality within the park, and lost or injured hikers are a serious concern. Assuming that search and rescue teams can rendezvous at the nearest trailhead, park managers want to know which sections of trail are within 1, 2, 4, and 6 hours of reaching stranded hikers. Hikers beyond the six-hour window may require multi-day efforts to exit the park, and additional resources such as helicopter rescue may be warranted.
Data for this analysis was acquired from a variety of publicly available sources. All layers used were acquired via ArcGIS Online, and metadata, including data sources, are summarized in Table 1.
All data acquisition and analysis were performed in Esri ArcGIS Pro 3.2.2, using the Network Analyst extension, unless otherwise stated.
Olympic National Park and Forest were subset from the USA Parks layer. Roads were clipped from the WADNR Active Roads layer to only those intersecting the park boundary. Trailheads and campsites (including primitive campsites) were each subset from the Olympic National Park Features layer.
The roads layer was reprojected into the State Plane WA N projection to match the Wild Olympic Compiled Trails (trails) layer. Three additional fields, "Speed_mph", "Time", and "Name", were added to the attribute tables of both the roads and trails layers, and values were calculated for each. Average hiking speed was set to 2 mph for trails and 3 mph for roads. Time (hours) was calculated as the shape length (US feet, converted to miles) divided by the hiking speed. Finally, Name was calculated as either the "Trail Name" (trails) or the "Unique Route ID" (roads), to provide a standard naming scheme for the directions utility.
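The time calculation described above can be sketched as follows. The function and constant names are hypothetical, and the conversion assumes 5,280 US feet per mile; the actual values were computed with the ArcGIS Pro field calculator rather than with this code.

```python
US_FEET_PER_MILE = 5280.0   # assumed conversion factor

TRAIL_SPEED_MPH = 2.0       # average hiking speed on trails
ROAD_SPEED_MPH = 3.0        # average hiking speed on forest roads

def hiking_time_hours(shape_length_us_ft, speed_mph):
    """Convert a segment length in US feet to a hiking time in hours."""
    miles = shape_length_us_ft / US_FEET_PER_MILE
    return miles / speed_mph

# Hypothetical segments: 2 miles of trail and 3 miles of road.
trail_time = hiking_time_hours(10560.0, TRAIL_SPEED_MPH)
road_time = hiking_time_hours(15840.0, ROAD_SPEED_MPH)
```

In this example both segments take one hour to hike, even though the road segment is a mile longer.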
The trails layer was merged with the roads layer to create a single roads-trails layer. Due in part to the multiple sources for this layer, numerous topology errors were found between the two datasets. Initially, the Esri "Snap" tool was used to connect feature ends and edges using a snap distance of 100 US feet. However, this did not resolve all the topology issues, and dozens of roads and trails were connected by manually editing feature vertices, with reference to basemap layers. Additionally, initial network analysis helped to identify topology errors, and vertices were adjusted as warranted throughout the exploratory phase of analysis. While every effort was made to correctly connect as many network edges as possible, it was not possible to test every edge. It is assumed that some topology errors remain, particularly among the densely packed forest roads.
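The idea behind tolerance-based snapping can be illustrated with a simplified sketch. It only snaps line endpoints to the nearest candidate vertex, whereas the Esri Snap tool also supports snapping to edges and ends; the function name and the coordinates are hypothetical.

```python
import math

def snap_endpoints(endpoints, targets, tolerance_ft=100.0):
    """Move each endpoint onto its nearest target vertex if one lies within tolerance."""
    snapped = []
    for point in endpoints:
        nearest = min(targets, key=lambda t: math.dist(point, t), default=None)
        if nearest is not None and math.dist(point, nearest) <= tolerance_ft:
            snapped.append(nearest)   # close enough: endpoints now coincide
        else:
            snapped.append(point)     # too far: left for manual editing
    return snapped

# Hypothetical coordinates in US feet.
trail_ends = [(0.0, 0.0), (500.0, 500.0)]
road_vertices = [(30.0, 40.0), (1000.0, 1000.0)]
result = snap_endpoints(trail_ends, road_vertices)
```

The first endpoint is 50 ft from a road vertex and is snapped to it. The second is several hundred feet from any vertex and is left unchanged, which is the kind of gap that had to be fixed by hand.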
All analysis and cartography were performed in the NAD 1983 (2011) State Plane WA N (U.S. ft; EPSG:6597) projection. This is a Lambert conformal conic projection with an east-west orientation, which contains the majority of the study area and limits distance distortion in this analysis.
A new network dataset (ND) was created from the roads-trails layer, using an "Any Vertex" network connectivity policy (i.e., lines connect wherever they share a coincident vertex, rather than at endpoints only). No vertical connectivity attributes were applied, as it is assumed that all trail surfaces are at ground level and that there are no bridge- or tunnel-like features present. For edges, both a distance and a time cost were applied using a field script referencing the roads-trails attribute table fields "Shape_Length" (U.S. ft) and "Time" (hours), respectively. All edges are bidirectional, meaning that cost does not differ with travel direction. A custom travel mode, "Hiking", was created, based on a modified "Walking" mode with both distance and time cost attributes. The turn-by-turn directions feature was enabled, using the roads-trails attribute table "Name" field as the base name for field mapping. Finally, the network dataset was built.
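As a rough illustration of vertex-based connectivity, the sketch below builds a bidirectional graph from polylines, treating every vertex as a node so that two lines connect only where they share an identical coordinate. This is a simplification (a real network dataset may create far fewer junctions), and the function name and coordinates are hypothetical.

```python
import math
from collections import defaultdict

def build_network(polylines):
    """Build an undirected graph in which every polyline vertex is a node."""
    graph = defaultdict(dict)
    for line in polylines:
        for start, end in zip(line, line[1:]):
            length = math.dist(start, end)
            # Bidirectional edge: the same cost applies in both directions.
            # If two features duplicate a segment, keep the shorter cost.
            if length < graph[start].get(end, float("inf")):
                graph[start][end] = length
                graph[end][start] = length
    return graph

# Hypothetical features (US feet): a trail and a road sharing the vertex (300, 0).
trail = [(0.0, 0.0), (300.0, 0.0), (600.0, 0.0)]
road = [(300.0, -400.0), (300.0, 0.0), (300.0, 400.0)]
network = build_network([trail, road])
edge_count = sum(len(neighbors) for neighbors in network.values()) // 2
```

Two input features produce five nodes and four edges, which is why the edge count of a built network is much larger than the number of trails and roads it represents. If the road lacked a vertex at (300, 0), the two lines would cross without connecting.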
Figure 2. Workflow demonstration for adding stops to the Esri Route Solver tool. Only the six manually selected trailheads were added, and each stop location was snapped to the network using a snapping distance of 100 U.S. ft.
Using the Network Analyst package, a new Route Solver feature dataset was created, using the built ND.
Six trailheads were identified, corresponding to the six premier park destinations that park managers would like a long-distance hiker to visit. The six trailheads were manually selected from the trailhead layer before being imported as stops to the Network Analyst Route tool. Each stop was snapped to the network, using a snapping tolerance of 100 U.S. ft (Fig. 2).
The Route tool was run with the custom hiking travel mode, with both distance and time costs calculated for the route. The find best sequence option was enabled, which does not preserve the stop input sequence and allows any beginning and end location; however, each stop is visited only once. The output geometry was set to "along network" for visually pleasing cartography, and the turn-by-turn directions feature was turned on.
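With only six stops, the best-sequence problem is small enough to solve by exhaustive search (6! = 720 orderings). The sketch below illustrates that idea on a hypothetical cost table. It is not the algorithm used by the Route solver, whose internals are not described here, and the stop names and costs are invented for illustration.

```python
from itertools import permutations

def best_sequence(stops, cost):
    """Try every visiting order (free start and end) and keep the cheapest."""
    best_order, best_total = None, float("inf")
    for order in permutations(stops):
        # Sum the cost of each consecutive leg; every stop is visited exactly once.
        total = sum(cost[a][b] for a, b in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return list(best_order), best_total

# Hypothetical stops placed along a single line, at these mile markers.
positions = {"A": 0.0, "B": 10.0, "C": 4.0, "D": 7.0}
cost = {a: {b: abs(pa - pb) for b, pb in positions.items()}
        for a, pa in positions.items()}
order, total = best_sequence(list(positions), cost)
```

Because costs are symmetric here, a route and its reverse cost the same; the sketch simply keeps whichever it encounters first. In practice the pairwise costs would come from shortest paths along the network rather than from straight-line positions.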
Figure 3. Workflow demonstration for adding origins to the Esri Origin-Destination Matrix tool. Each origin location was snapped to the network using a snapping distance of 100 U.S. ft.
In ArcGIS Pro, a new Origin-Destination (OD) Matrix Solver feature dataset was created using the built ND.
All 56 trailheads were imported as origins (Fig. 3), and all 118 campsites were imported as destinations to the OD Matrix tool (Fig. 4). All origin and destination points were snapped to the network, using a snapping tolerance of 100 U.S. ft.
The OD Matrix tool was run with the hiking travel mode, with both distance and time costs calculated for each origin-destination pair. No maximum cutoff distance or maximum number of destinations was set, so that all possible connections would be found. The output geometry was set to straight lines. This setting affects only how connections are drawn on the map; the reported distance and time costs are still accumulated along the network.
The final OD Matrix was exported as a .csv file. Additionally, the trailhead names and the campsite names were each exported as a .csv file. All three .csv files were imported into R v4.3.2 (R Core Team, 2021) and processed with the package "dplyr" 1.1.4 (Wickham et al., 2023). The trailheads and campsites were assigned an ID number corresponding to their index position, and these IDs were used to join the trailhead and campsite names to the OD Matrix for additional readability and utility of the output matrix.
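The join described above was performed in R with dplyr. The sketch below shows the same index-based lookup in Python, for illustration only. The column names ("OriginID", "DestinationID", "Total_Length_ft", "Total_Time"), the 1-based IDs, and the place names are all assumptions, not the actual exported schema or data.

```python
def label_od_rows(od_rows, origin_names, destination_names):
    """Attach trailhead and campsite names to OD rows via 1-based index IDs."""
    origin_lookup = dict(enumerate(origin_names, start=1))
    destination_lookup = dict(enumerate(destination_names, start=1))
    labeled = []
    for row in od_rows:
        labeled.append({
            "trailhead": origin_lookup[row["OriginID"]],
            "campsite": destination_lookup[row["DestinationID"]],
            "miles": row["Total_Length_ft"] / 5280.0,  # US feet to miles
            "hours": row["Total_Time"],
        })
    return labeled

# Hypothetical inputs standing in for the three exported .csv files.
trailheads = ["Trailhead A", "Trailhead B"]
campsites = ["Campsite X", "Campsite Y"]
od_rows = [
    {"OriginID": 1, "DestinationID": 2, "Total_Length_ft": 10560.0, "Total_Time": 1.0},
    {"OriginID": 2, "DestinationID": 1, "Total_Length_ft": 26400.0, "Total_Time": 2.5},
]
lookup_table = label_od_rows(od_rows, trailheads, campsites)

# Example query: every campsite reachable from one trailhead.
from_a = [r["campsite"] for r in lookup_table if r["trailhead"] == "Trailhead A"]
```

Once names are attached, the table can be filtered or sorted by trailhead, campsite, distance, or time without reference to the original ID numbers.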
Figure 4. Workflow demonstration for adding destinations to the Esri Origin-Destination Matrix tool. Each destination location was snapped to the network using a snapping distance of 100 U.S. ft.
In ArcGIS Pro, a new Service Area Solver feature dataset was created using the built ND.
All 56 trailheads were imported as facilities to the Service Area tool. All facilities were snapped to the network using a snapping tolerance of 100 U.S. ft (Fig. 5).
The Service Area tool was run with the hiking travel mode, calculating both distance and time costs. The direction was set as away from the facilities, and the cutoffs were set to 1, 2, 4, and 6 hours. The output geometry was set to split lines, so that the output would follow the trail paths and, where output lines overlap, each line would be automatically assigned to the facility (trailhead) with the least time cost.
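Conceptually, a service area with "nearest facility wins" behavior resembles a shortest-path search started from all facilities at once, followed by binning the results at the cutoffs. The sketch below illustrates this on a hypothetical network with time costs in hours. It classifies nodes only, whereas the Service Area tool also splits lines partway along an edge, and all names are invented.

```python
import heapq

def time_to_nearest_facility(graph, facilities):
    """Multi-source Dijkstra: least hours from any facility to each reachable node."""
    best = {facility: 0.0 for facility in facilities}
    queue = [(0.0, facility) for facility in facilities]
    heapq.heapify(queue)
    while queue:
        hours, node = heapq.heappop(queue)
        if hours > best[node]:
            continue  # stale entry
        for neighbor, edge_hours in graph[node].items():
            candidate = hours + edge_hours
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best

def classify(hours, cutoffs=(1, 2, 4, 6)):
    """Return the smallest cutoff that contains this travel time, or None if beyond all."""
    for cutoff in cutoffs:
        if hours <= cutoff:
            return cutoff
    return None

# Hypothetical network: TH1 and TH2 are trailheads, x is disconnected.
graph = {
    "TH1": {"a": 0.5},
    "a": {"TH1": 0.5, "b": 1.0},
    "b": {"a": 1.0, "c": 2.5},
    "c": {"b": 2.5, "TH2": 3.5, "deep": 3.0},
    "TH2": {"c": 3.5},
    "deep": {"c": 3.0},
    "x": {},
}
times = time_to_nearest_facility(graph, ["TH1", "TH2"])
bands = {node: classify(hours) for node, hours in times.items()}
```

Node "c" is reached more quickly from TH2 (3.5 hours) than from TH1 (4.0 hours), so it takes the lower cost. Node "deep" lies beyond the six-hour cutoff, and node "x" never appears in the result because it is disconnected, mirroring the two cases that are drawn identically on the service area map.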
Figure 5. Workflow demonstration for adding facilities to the Esri Service Area tool. Each facility location was snapped to the network using a snapping distance of 100 U.S. ft.
The final, built, ND had a total of 23,408 edges, with 10,886 nodes. That is not to say that there were > 23K trails in the network, as most of these edges only represent short trail segments between adjacent nodes.
The optimal route for the long-distance hike to all six of the target destinations is a 183-mile route that is estimated to take 83 hiking-hours to complete. The optimal path was found to run from (1) Lake Cushman to (2) Quinault Lake, then to (3) Hurricane Ridge, (4) the Elwha River, (5) Lake Crescent, and ending at (6) the Hoh Rainforest (Fig. 6). Turn-by-turn directions for this route, featuring 76 turns, are detailed in Figure 7.
Figure 6. The optimal route to six premier destinations in the Olympic National Park, with each destination ordered numerically.
Figure 7. Turn-by-turn directions for the route shown in Figure 6.
The Origin-Destination Matrix tool returned a very large data table with more than 2,500 identified connections from trailheads to campsites. The mapped output was extremely cluttered; however, a small sample is presented in Figure 8, where you can see that each trailhead is connected to many campsites throughout the study area. The more useful output of the analysis is the data table, which has been modified from the original Esri table format (Table 2). This table can be easily incorporated into a database or read by a programming language, enabling further functionality such as querying and statistical analysis.
Trails were evaluated at four levels of hiking time from the nearest trailhead, 1, 2, 4, and 6 hours (Fig. 9). While most locations along the trail network are accessible on foot in six hours or less, there are notably many areas, particularly in the center of the park, which may not be reachable by a rescue team (on foot) within a single day. It may be advisable to inform hikers intending to enter this area that rescue efforts may be substantially delayed, and that they should proceed with the utmost caution.
Importantly, while every effort has been made to correct topology errors and connect trails, especially through the park center, there are clearly a number of topology errors that are still affecting the forest roads network. This is particularly evident in the NW corner of the park, where much of the road network is shown to be outside the six-hour maximum time limit, despite trailheads being quite close.
Figure 9. Service area map from each trailhead at four levels of estimated time cost to reach locations throughout the trail network. Thin black lines indicate trails that are either more than 6 hours from a trailhead or disconnected from the trail network through a topology error.
Although geospatial NDs can be extremely complex and difficult to establish, their utility is nearly unmatched. Fortunately, many valuable NDs are already being provided by companies such as Esri and Google, particularly for road networks. Network analysis can be used to find an optimal route, provide turn-by-turn directions, find connections, and map service areas, just to name a few applications.
The trails ND used in this analysis is, in truth, a very simple ND. Despite having more than 10K nodes, it is still rather small compared to a major city's road ND, which could easily reach into the hundreds of thousands of nodes. A more complex ND could also include edges that are unidirectional or asymmetrical in cost, a variety of travel modes with differing costs or travel restrictions, and travel conditions that are modeled in real time. Despite the relative simplicity of my trail ND, it was not easily made. The source data layers that I obtained, while convenient, were made by different authors and were not built for network analysis purposes. Substantial pre-processing and error correction had to be undertaken prior to analysis, and exploratory analysis had to be repeated numerous times to identify the locations and nature of errors, until a satisfactory result was obtained. Despite these challenges and limitations, I have been able to successfully answer three hypothetical questions from park management.
Thanks to route optimization, our hypothetical endurance hiker has identified their route and has turn-by-turn directions in hand. The optimal route will start at Lake Cushman and end at the Hoh Rainforest. The total distance will be just over 180 miles and is expected to take 83 hiking-hours. Our hiker is experienced and in excellent hiking condition, and they expect to average about 6 hours of travel per day, meaning that they should expect to be on the trail for about 2 weeks. Fortunately for them, the Park Rangers have agreed to resupply them at each of their pre-determined stops.
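The two-week estimate follows directly from the route totals, as the short check below shows. The variable names are my own; the figures are those reported for the optimal route.

```python
import math

total_miles = 183          # optimal route length
total_hiking_hours = 83    # estimated time cost of the route
hours_per_day = 6          # hiker's expected daily travel time

days_exact = total_hiking_hours / hours_per_day    # about 13.8 days
days_on_trail = math.ceil(days_exact)              # round up to whole days
average_speed_mph = total_miles / total_hiking_hours
```

Rounding up gives 14 days on the trail. The implied average speed of roughly 2.2 mph falls between the 2 mph trail speed and the 3 mph road speed, as expected for a route that is mostly trail.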
By using the Origin-Destination Matrix tool, I have efficiently identified thousands of trailhead-to-campsite connections, including the trail distance and estimated travel time for each. This table holds a vast amount of information, which park managers and users can easily query to answer any number of travel-related questions.
Finally, by calculating the service areas from each trailhead, hikers and park officials can be made aware of the least accessible regions of the park, where rescue efforts may be substantially delayed. This information might help backpackers better understand their level of risk and responsibility, or it might encourage park managers to assess alternate modes of transport for search and rescue personnel.
Future improvements to this ND might include the addition of slope and/or change in elevation as a factor in the hiking speed, or the inclusion of real-time weather data affecting trail conditions. Additionally, further work could be directed towards reducing the number of topology errors in the underlying dataset and connecting the main park region with the beach trails to the west.
R Core Team. (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr
Esri. (2021). USA Parks. https://www.arcgis.com/home/item.html?id=f092c20803a047cba81fbf1e30eff0b5
ruehmann. (2021). Olympic National Park Features. https://www.arcgis.com/home/item.html?id=aae97edea3d3493192e2eb156af141c5
OA_Admin. (2023). Wild Olympic Compiled Trails. https://www.arcgis.com/home/item.html?id=24ea37ce1f82490abc86b7bfdbd26c25
Washington State Department of Natural Resources. (2019). WADNR Active Roads. https://www.arcgis.com/home/item.html?id=bfdb0455c3b24aa6ae46c9502f814c25