The purpose of this lab was to compare and explore different geostatistical methods using sasquatch sighting data. There are two types of spatial patterns: geographic distribution, which uses locations to determine how features are distributed, and spatial autocorrelation, which determines how the attributes of locations are related. These maps were made using 'Hot Spot Analysis' with a 'Fixed Distance Band' of 85 km. This method, along with 'Nearest Neighbor' and 'Moran's I', is explored in more detail below.
Download full map PDF here
Download regional map PDF here
After the data was added to the map, it was inspected and checked for the proper spatial reference. The attribute table included a field called 'Class' with values from 1 to 5, which represent the reliability of each sighting. A value of 1 indicates a low-reliability sighting, such as a second- or third-hand account or one made in poor lighting. A value of 5 indicates a sighting backed by high-quality video or by highly traceable and credible sources. This field gives us a sense of our data and can be used to show the relationships between different features.
The 'Average Nearest Neighbor' tool calculates the distance from each feature (usually a point) to its nearest neighbor, averages those distances, and then compares that average to the mean distance expected if the same number of features were randomly distributed over the same area. This comparison provides a ratio that is used to determine whether the data is clustered, random, or dispersed. If the ratio is above 1, the data is considered dispersed: the points are spaced more evenly from their nearest neighbors than a random pattern would produce. If the ratio is below 1, the data is considered clustered: the points sit closer to their nearest neighbors than expected, with some points grouped tightly together. A ratio close to 1 indicates a random distribution.
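The ratio described above can be sketched in a few lines of Python. This is a minimal illustration rather than the ArcGIS implementation: the function name `average_nearest_neighbor`, the sample coordinates, and the study area of 100 square units are all assumptions made for the example. The expected mean distance uses the standard formula 0.5 / sqrt(n / A).

```python
import math

def average_nearest_neighbor(points, area):
    """Return (observed mean, expected mean, ratio) for a list of (x, y) points.

    `area` is the size of the study area in squared coordinate units.
    Hypothetical helper for illustration; not an ArcGIS function.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to the closest *other* point.
        d = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(d)
    observed = sum(nearest) / n
    # Mean nearest neighbor distance expected for a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    return observed, expected, observed / expected

# Four points bunched in one corner of a 10 x 10 study area (made-up data).
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Four points spread widely across the same area (made-up data).
dispersed = [(0, 0), (5, 0), (0, 5), (5, 5)]

print(average_nearest_neighbor(clustered, 100.0))
print(average_nearest_neighbor(dispersed, 100.0))
```

With these made-up points, the first set gives a ratio below 1 (clustered) and the second a ratio above 1 (dispersed). Note that the result depends heavily on the area value, which is one reason the same study area should be used when comparing data sets.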
In ArcGIS (ArcMap), the 'Average Nearest Neighbor' tool can generate a report with a p-value and a z-score, which together indicate how unlikely it is that the observed pattern is the result of random chance. A p-value of 0.01 means there is only a 1% probability that a pattern this strong would arise from a random process, so we can be 99% confident that a real pattern is present in the data. The z-score measures how many standard deviations the observed value (in this case the mean nearest neighbor distance) lies from the value expected under a random distribution, which tells us whether the result is typical or atypical. A z-score between -1.96 and +1.96 is what a random pattern would typically produce. A z-score outside that range has less than a 5% chance of occurring randomly and is considered significant, and one below -2.58 or above +2.58 has less than a 1% chance. This data has a z-score of about -65, far below -2.58, so there is a less than 1% chance that the pattern arose randomly. The negative sign means the observed mean distance is smaller than expected, which points to clustering.
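To make the link between z-scores and p-values concrete, here is a small sketch using only the Python standard library. The function names are hypothetical. The standard error constant 0.26136 is the one used in the usual Average Nearest Neighbor formulation, and the p-value is the two-tailed probability under a standard normal distribution.

```python
import math

def ann_z_score(observed_mean, n, area):
    """z-score of an observed mean nearest neighbor distance.

    Hypothetical helper: n is the number of points, area the study area.
    """
    expected_mean = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    return (observed_mean - expected_mean) / std_error

def two_tailed_p(z):
    """Two-tailed p-value for a z-score under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

# Critical values mentioned in the text, plus the z-score from this lab.
for z in (1.96, 2.58, -65.0):
    print(z, two_tailed_p(z))
```

A z-score of 1.96 corresponds to a p-value of about 0.05 and 2.58 to about 0.01. For a z-score as extreme as -65, the p-value is too small to represent as a floating point number and is reported as 0.0.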
The 'Global Moran's I' tool measures spatial autocorrelation using both the locations of the points and an attribute field, in our case the Class field described earlier. The mean and variance of the Class attribute are calculated for the data set as a whole. Then, for each pair of neighboring points, the deviations of their Class values from that mean are multiplied together, and these products are summed and scaled by the variance. This gives us Moran's Index. The data is interpreted as dispersed, random, or clustered based on the index, which generally falls between -1 and +1: values near -1 indicate dispersion, values near 0 indicate a random pattern, and values near +1 indicate clustering. I set the parameters of the tool to use a 'Fixed Distance Band' so that the same neighborhood distance was applied to every point. The band was chosen so that every point had at least 8 neighbors, which reduces the distortion that skewed data causes in the 'Global Moran's I' statistic. If the input field is already skewed and some points have very few neighbors, the comparison of neighboring values rests on too little information and the statistic becomes unreliable.
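The calculation can be sketched as follows. This is a simplified illustration, not the ArcGIS tool: the function names, the sample points, and the binary weighting (a pair counts as neighbors when the points lie within the band distance, and is ignored otherwise) are assumptions made for the example.

```python
import math

def _is_neighbor(p, q, band):
    # Binary fixed distance band weight: 1 inside the band, 0 outside.
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= band

def neighbor_counts(points, band):
    """Number of neighbors each point has within the band distance."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and _is_neighbor(p, q, band))
        for i, p in enumerate(points)
    ]

def morans_i(points, values, band):
    """Global Moran's I with binary fixed distance band weights."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]      # deviation of each value from the mean
    cross = 0.0                           # sum of products for neighboring pairs
    weight_sum = 0.0                      # total number of neighbor links
    for i in range(n):
        for j in range(n):
            if i != j and _is_neighbor(points[i], points[j], band):
                weight_sum += 1.0
                cross += dev[i] * dev[j]
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")
    return (n / weight_sum) * cross / variance_sum

# Four points in a row, one unit apart (made-up data).
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i(pts, [1, 1, 5, 5], band=1.0))  # similar values side by side
print(morans_i(pts, [1, 5, 1, 5], band=1.0))  # high and low values alternate
print(neighbor_counts(pts, band=1.0))
```

In this toy data, grouping similar values together yields a positive index and alternating them yields a negative one. The `neighbor_counts` helper shows how one might check whether a chosen band distance gives every point enough neighbors before trusting the statistic.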