Geostatistics with Sasquatch

The purpose of this lab was to compare and explore different geostatistical methods using sasquatch sighting data. There are two types of spatial patterns: geographic distribution, which uses locations to determine how features are distributed, and spatial autocorrelation, which determines how the attributes of locations are related. These maps were made using 'Hot Spot Analysis' with a 'Fixed Distance Band' of 85 km. This method, along with 'Average Nearest Neighbor' and 'Global Moran's I', is explored in more detail below.

Download full map PDF here

Download regional map PDF here

After the data was added to the map, it was inspected and checked for the proper spatial reference. The attribute table showed a field called 'Class' with values 1-5. These values represent the reliability of the sighting. A value of 1 would be a sighting from a second- or third-hand account, or one made in poor lighting. A value of 5 would be a sighting backed by high quality video or by highly traceable and credible sources. This field gives us a sense of our data and can be used to show the relationships between different features.

Geostatistical Methods

'Average Nearest Neighbor'

This tool calculates the distance from each feature (usually a point) to its nearest neighbor, averages those distances, and then compares that average to the expected mean distance for a random distribution. This comparison provides a ratio that is used to determine whether the data is clustered, random, or dispersed. If the ratio is above 1, the data is considered dispersed, with a roughly even amount of space between each point and its nearest neighbor. If the ratio is below 1, the data is considered clustered, with an uneven amount of space between points and some points sitting close together in clusters. A ratio close to 1 indicates a random distribution.
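The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: the function name, the brute-force search, and the requirement that you supply the study area yourself are all assumptions made for the example. The expected mean distance uses the standard formula for a random pattern, 0.5 / sqrt(n / area).

```python
import math

def average_nearest_neighbor(points, area):
    """Return (observed_mean, expected_mean, ratio) for 2D points.

    points: list of (x, y) tuples in projected units (e.g. meters).
    area:   size of the study area in the same units squared
            (supplied by the caller in this sketch).
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point (brute force).
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        nearest.append(d)
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # mean distance under randomness
    return observed, expected, observed / expected

# Hypothetical data: four points packed into one corner of a 100 x 100 area,
# and four points spread evenly across it.
clustered = [(0, 0), (1, 0), (0, 1), (1, 1)]
dispersed = [(25, 25), (75, 25), (25, 75), (75, 75)]
print(average_nearest_neighbor(clustered, 100 * 100)[2])  # below 1
print(average_nearest_neighbor(dispersed, 100 * 100)[2])  # above 1
```

The brute-force loop compares every pair of points, which is fine for a demonstration but slow for thousands of sightings.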

In ArcGIS Pro and ArcMap you can get a report from the 'Average Nearest Neighbor' tool that provides a p-value and a z-score, which together indicate how confident we can be that the pattern is not due to random chance. A p-value of 0.01 means there is only a 1% chance of seeing a pattern this strong if the points were randomly distributed, which is commonly described as being 99% confident that a pattern is present. A z-score measures how many standard deviations an observed value (in this case the observed mean distance) lies from the value expected under randomness. Z-scores reveal to statisticians and scientists whether a value is typical or atypical for a specified data set. A z-score between -1.96 and +1.96 is typical of a random pattern. A z-score outside that range is more than 1.96 standard deviations from the mean, and there is less than a 5% chance of getting such a value randomly, which is considered significant. This data has a z-score of about -65, far beyond the -2.58 critical value, so there is a less than 1% chance that the clustering is the result of random chance.
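To make the link between z-scores and p-values concrete, here is a small sketch that converts a z-score into a two-tailed p-value under the standard normal distribution. The function name is made up for this example, and the z-scores in the loop are only illustrative, apart from -65, which is the approximate value from the report.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (0.5, 1.96, 2.58, -65.0):
    print(f"z = {z:7.2f}  p = {two_tailed_p(z):.3g}")
```

A z-score of 1.96 lands almost exactly on p = 0.05 and 2.58 on p = 0.01, which is where those critical values come from.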

Figure 4. ArcGIS Pro report using the 'Average Nearest Neighbor' tool. A ratio below 1 shows a clustered dataset, with the observed mean distance being less than the expected distance for randomly distributed points. This ratio was found by dividing the 'Observed Mean Distance', in this case ~19064 meters, by the 'Expected Mean Distance', in this case ~47325 meters.
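As a quick check of the arithmetic in the caption, the ratio can be reproduced from the two reported distances. Both values are approximate, so the result is approximate as well; the variable names are just for this example.

```python
observed_mean_m = 19064  # 'Observed Mean Distance' from the report (approximate)
expected_mean_m = 47325  # 'Expected Mean Distance' from the report (approximate)

nn_ratio = observed_mean_m / expected_mean_m
print(f"Nearest neighbor ratio: {nn_ratio:.2f}")  # about 0.40, well below 1
```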

'Global Moran's I'

This tool measures spatial autocorrelation using both the locations of the points and an attribute field, in our case the Class field described earlier. The mean and variance of the Class attribute are calculated for the dataset as a whole. Then each point's deviation from that mean is compared with the deviations of its neighboring points. This gives us Moran's Index. The data is classified as dispersed, random, or clustered based on the index. The index should be between -1 and +1, where values closer to -1 are considered dispersed, values near 0 are considered random, and values closer to +1 are considered clustered. I set the tool's conceptualization of spatial relationships to a 'Fixed Distance Band' so that every point is evaluated at the same scale. The reason for this is to make sure that every point has at least 8 neighbors, which helps with the skewing of data that can happen when using the 'Global Moran's I' statistic. If the input field is already skewed and some points have very few neighbors, the comparison of deviations between neighboring values will be thrown off.
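The calculation can be sketched as follows. This is a simplified illustration rather than the ArcGIS tool itself: it assumes binary weights (1 inside the distance band, 0 outside) with no row standardization, it skips the z-score and p-value, and the function name and toy data are invented for the example.

```python
import math

def global_morans_i(points, values, band):
    """Global Moran's I with binary fixed-distance-band weights.

    Two features are neighbors (weight 1) when they lie within `band`
    of each other; otherwise the weight is 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviation of each value from the mean
    cross = 0.0  # sum of w_ij * dev_i * dev_j over neighboring pairs
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                cross += dev[i] * dev[j]
                s0 += 1.0
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical example: four points on a line, one unit apart.
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(global_morans_i(pts, [1, 1, 5, 5], band=1))  # similar values adjacent: positive
print(global_morans_i(pts, [1, 5, 1, 5], band=1))  # alternating values: negative
```

The sketch divides by the total weight, so it fails if the band is so small that no point has a neighbor, which is one practical reason to check neighbor counts before picking a distance.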

Figure 5. ArcGIS Pro report from the 'Global Moran's I' tool. The index is computed by taking the mean and variance for the attribute being evaluated. Then, for each feature value, it subtracts the mean, creating a deviation from the mean, and compares the deviations of neighboring features. The index is closer to +1, showing a clustered dataset.

'Hot Spot Analysis'

This tool also uses the location of the points and an attribute field, in our case the Class field described earlier. Instead of producing a single index for the whole dataset like 'Global Moran's I', it calculates the Getis-Ord Gi* statistic for each point, comparing the values of a point and its neighbors to the values of the dataset as a whole. Each point receives its own z-score and p-value. Points with high positive z-scores are hot spots, where high values cluster together, and points with low negative z-scores are cold spots, where low values cluster together. I set the parameters of the tool to a 'Fixed Distance Band'. The reason for this is to make sure that every point had at least 8 neighbors, which helps with skewed data. If the input data is already skewed and some points have very few neighbors, the z-scores for those points will be thrown off.
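The 'Hot Spot Analysis' tool in ArcGIS is based on the Getis-Ord Gi* statistic, which produces one z-score per feature rather than a single index. A rough sketch of that statistic with a fixed distance band is shown here. It assumes binary weights and counts each feature as its own neighbor; the function name and the toy data are invented for illustration, and this is not the tool's actual code.

```python
import math

def getis_ord_gi_star(points, values, band):
    """Return one Gi* z-score per feature using binary fixed-band weights.

    Each feature counts as its own neighbor (distance 0 is inside the band).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for p in points:
        # Values of every feature within the band of p, including p itself.
        nbrs = [v for q, v in zip(points, values) if math.dist(p, q) <= band]
        w_sum = len(nbrs)  # with 0/1 weights, sum of weights = sum of squares
        numer = sum(nbrs) - mean * w_sum
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(numer / denom)
    return scores

# Hypothetical example: six points on a line, high values left, low values right.
pts = [(x, 0) for x in range(6)]
for p, z in zip(pts, getis_ord_gi_star(pts, [5, 5, 5, 1, 1, 1], band=1)):
    print(p, round(z, 2))
```

Points surrounded by high values get positive z-scores and points surrounded by low values get negative ones. The sketch breaks down if the band is large enough that every feature neighbors every other one, because the denominator becomes zero.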

Method Comparison

Table 1. This table showcases some of the differences in p-values and z-scores depending on which statistical test is used on the data. You'll notice, particularly with the Hot Spot Analysis, that at that band distance the z-score and p-value are not particularly significant overall, although some individual points were significant when reviewing the maps. This is because I didn't choose a large enough band distance for the points to be calculated properly. A better choice would have been around the 1000 km range, to provide more neighboring points for each point.
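One way to judge whether a band distance is large enough is to count the neighbors each point gets at that distance, or to find the smallest distance that gives every point a target number of neighbors. The helpers below are a hypothetical sketch of that check, with made-up function names and toy coordinates. They use brute-force distance comparisons and assume projected coordinates in consistent units.

```python
import math

def neighbor_counts(points, band):
    """Number of other points within `band` of each point."""
    return [sum(1 for j, q in enumerate(points)
                if j != i and math.dist(p, q) <= band)
            for i, p in enumerate(points)]

def smallest_band_for_k(points, k):
    """Smallest band that gives every point at least k neighbors."""
    worst = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        worst = max(worst, dists[k - 1])  # distance to this point's k-th neighbor
    return worst

# Hypothetical example: three points close together and one outlier.
pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(neighbor_counts(pts, band=1.5))  # the outlier has no neighbors
print(smallest_band_for_k(pts, k=1))   # band needed so no point is isolated
```

With real sighting data, calling `smallest_band_for_k` with k=8 would give a lower bound on the band distance that satisfies the 8-neighbor guideline mentioned earlier.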
