This exercise involved four scenarios, each requiring a different cluster analysis process. In the first, the fire department wants to know whether false alarm calls are clustered so that it can focus its education efforts in those areas. In the second, the same fire department wants to know whether its high-priority calls are clustered and, if so, at what distance. In the third, the department wants to know how each feature relates to all other features across a range of distances. Finally, the Oleander Library wants to know where its patrons are clustered.
Strategies
To address the problems put forth by my client, I used Esri’s ArcMap to perform the cluster analyses. I used several tools from the Analyzing Patterns toolset: Average Nearest Neighbor, High/Low Clustering (Getis-Ord General G), Spatial Autocorrelation (Global Moran’s I), and Multi-Distance Spatial Cluster Analysis (Ripley’s K). For the fire station work, the data included the (fire station) battalion boundary layers, EMS call data for Feb 15 and Jan 15, and the existing and proposed station points. For the library exercise, the data included the patron locations, a boundary layer for “districts,” the major roads, and the locations of apartment complexes.
Methods
Average Nearest Neighbor
I started by calculating the area of the Battalion 2 boundary polygon in square feet. I then created a definition query to include only incidents classified as false alarms (incident codes 700-745). Finally, I ran the Average Nearest Neighbor tool with the queried incident layer as the input feature class and the boundary polygon’s area (in square feet) as the study area.
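To make the tool’s output easier to interpret, the sketch below reproduces the arithmetic behind the nearest neighbor ratio and z-score, assuming the expected-distance and standard-error formulas from Esri’s Average Nearest Neighbor documentation. The `(code, x, y)` incident tuples and both function names are hypothetical, and the brute-force neighbor search is only suitable for small point sets.

```python
import math


def false_alarm_points(incidents):
    """Mimic the definition query: keep only incidents coded 700-745 (false alarms).

    `incidents` is a hypothetical list of (code, x, y) tuples.
    """
    return [(x, y) for code, x, y in incidents if 700 <= code <= 745]


def average_nearest_neighbor(points, area):
    """Return (nearest neighbor ratio, z-score) for points inside a study area.

    Assumes the formulas in Esri's Average Nearest Neighbor documentation.
    Coordinates and area must share units (e.g. feet and square feet).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbor (brute force, O(n^2)).
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance for a random pattern with the same point density.
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error
```

A ratio below 1 with a strongly negative z-score suggests the false alarms are more clustered than a random pattern would be in an area of that size.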
Results
Spatial Autocorrelation (Global Moran’s I)
First I joined the grid layer to the input features (in this case, the library patron locations). Then I created a definition query so that only grid squares with a value greater than 0 were included in the analysis, and I repeatedly ran Spatial Autocorrelation on the output until I identified the highest z-score.
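The repeated Spatial Autocorrelation runs described above amount to computing Global Moran’s I at several distances and keeping the one with the peak z-score. The sketch below assumes binary fixed-distance weights without row standardization and uses the variance under the normality assumption; the ArcGIS tool may use a different weighting and variance, so its numbers may not match. The per-cell values (e.g. patron counts per grid square) and the function names are hypothetical.

```python
import math


def morans_i(values, coords, distance):
    """Global Moran's I with binary weights: w_ij = 1 if features i and j are within `distance`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Symmetric binary weights; a feature is never its own neighbor.
    w = [[1.0 if i != j and math.dist(coords[i], coords[j]) <= distance else 0.0
          for j in range(n)] for i in range(n)]
    s0 = sum(map(sum, w))
    den = sum(d * d for d in dev)
    if s0 == 0 or den == 0:
        raise ValueError("no neighbor pairs at this distance, or all values are equal")
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * (num / den)
    expected = -1.0 / (n - 1)
    # Variance under the normality assumption (Cliff and Ord).
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    var = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    return i_stat, (i_stat - expected) / math.sqrt(var)


def peak_moran_distance(values, coords, distances):
    """Run Moran's I at each candidate distance; return the distance with the highest z-score."""
    results = {d: morans_i(values, coords, d) for d in distances}
    best = max(results, key=lambda d: results[d][1])
    return best, results
```

Filtering out zero-value squares beforehand, as the definition query does, simply means those cells are never passed in as `values`.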
Results
Multi-Distance Spatial Cluster Analysis (Ripley’s K)
Using the January 15 call data, I ran Multi-Distance Spatial Cluster Analysis (Ripley’s K Function) with 10 distance bands, the FEE (call priority code) field as the weight field, a 99-permutation confidence envelope, a beginning distance of 200 ft, and a distance increment of 100 ft. The study area method was set to USER_PROVIDED, with the battalion boundary layer as the study area feature class. I then added a field to the output table and calculated the difference between the observed K and the upper limit of the confidence envelope (observed K minus the upper limit). Finally, I created a vertical line graph with ExpectedK on the x-axis and this difference on the y-axis to identify the distance of maximum clustering.
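The Ripley’s K steps can be sketched as follows: generating the distance bands, computing an unweighted L(d) in the form given in Esri’s documentation, and finding the distance where observed K most exceeds the upper confidence envelope. This is a simplification under stated assumptions: it ignores the FEE weighting, applies no edge correction, and does not build the permutation envelope itself. The row layout `(expected_k, observed_k, hi_conf_env)` assumes the output table’s ExpectedK, ObservedK, and HiConfEnv fields; all function names are hypothetical.

```python
import math


def distance_bands(start, increment, count):
    """Distances evaluated by the tool, e.g. 200, 300, ..., 1100 ft for start=200, increment=100, count=10."""
    return [start + k * increment for k in range(count)]


def ripley_l(points, area, d):
    """Unweighted L(d) with no edge correction: sqrt(A * pairs / (pi * n * (n - 1)))."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, that lie within distance d of each other.
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= d)
    return math.sqrt(area * pairs / (math.pi * n * (n - 1)))


def peak_clustering(rows):
    """Return the distance (ExpectedK) where ObservedK - HiConfEnv is largest.

    `rows` is a list of (expected_k, observed_k, hi_conf_env) tuples.
    """
    return max(rows, key=lambda r: r[1] - r[2])[0]
```

Where the difference is positive, observed clustering exceeds the confidence envelope at that distance; the peak of the graph marks the distance of maximum clustering.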
Results
High/Low Clustering (Getis-Ord General G)
First I zoomed in on a visible cluster to count how many “neighbors” it contained. I then ran the Calculate Distance Band from Neighbor Count tool with the service call data as the input feature class and that neighbor count (in this case 7). Next I ran the High/Low Clustering (Getis-Ord General G) tool on the same input feature class, using the FEE field (which represents the call’s priority) and a fixed distance band of 1000 ft, the distance calculated by the Calculate Distance Band from Neighbor Count tool. I then ran the same tool four more times, increasing the distance by 200 ft each time (1200, 1400, 1600, and 1800 ft), and recorded the General G index and z-score from each run in a table. The highest z-score marked the distance at which high-priority calls clustered most strongly (in this case 1400 ft).
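The distance sweep above can be sketched as computing General G at 1000, 1200, 1400, 1600, and 1800 ft and keeping the distance with the highest z-score. This sketch assumes binary fixed-distance weights, and instead of the analytical z-score the ArcGIS tool reports it swaps in a permutation-based pseudo z-score (values shuffled among locations with a fixed seed), so its numbers will not match the tool’s output exactly. The priority values and the function names are hypothetical.

```python
import math
import random
import statistics


def general_g(values, coords, distance):
    """Getis-Ord General G: share of all cross-products x_i * x_j (i != j) between neighbors."""
    n = len(values)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            prod = values[i] * values[j]
            den += prod
            if math.dist(coords[i], coords[j]) <= distance:  # binary fixed-distance weight
                num += prod
    if den == 0:
        raise ValueError("all values are zero")
    return num / den


def general_g_z(values, coords, distance, permutations=199, seed=0):
    """Observed G plus a permutation-based pseudo z-score (not the tool's analytical z)."""
    observed = general_g(values, coords, distance)
    rng = random.Random(seed)
    shuffled = list(values)
    sims = []
    for _ in range(permutations):
        rng.shuffle(shuffled)  # randomly reassign values to locations
        sims.append(general_g(shuffled, coords, distance))
    z = (observed - statistics.fmean(sims)) / statistics.stdev(sims)
    return observed, z


def peak_g_distance(values, coords, start=1000, step=200, runs=5):
    """Evaluate start, start + step, ... and return (distance with highest z, results table)."""
    table = {start + k * step: general_g_z(values, coords, start + k * step) for k in range(runs)}
    return max(table, key=lambda d: table[d][1]), table
```

A positive z-score indicates that high values (here, high-priority calls) are clustered; tabulating G and z for each distance mirrors the table built from the five tool runs.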
Results
In this assignment, I learned how to perform several kinds of cluster analysis. I have previously used a less sophisticated version of this type of analysis to look for clusters of poorly performing public schools. I first added location data to the school performance data provided by NCDPI (North Carolina Department of Public Instruction) and loaded it. I then overlaid the resulting points, symbolized by school report card letter grade, on a poverty map of North Carolina. There was no discernible pattern or correlation, so I switched to a map of racial demographics, where a clear visual pattern suggested clusters of poorly performing schools in communities of color. A cluster analysis then confirmed that it was not just my eyes: there are in fact dense clusters of poorly performing schools in communities of color in this state. This type of analysis could also be used in mineral and resource exploration, in evaluating endangered or rare plant and animal communities, in learning where customers are located, or in deciding where public or private services need to be distributed.