Four scenarios are investigated for spatial patterns in the data. The first and second scenarios involve Fort Worth Fire Department Emergency Medical Services (EMS) calls for Battalion 2: the first asks whether the calls cluster, and the second asks whether high-value calls cluster. The third scenario investigates the relationship of each feature to other features. The fourth scenario investigates the density of patrons per grid cell and the distance from the Oleander Library at which patrons cluster.
The first scenario uses the Fort Worth Fire Department Feb 2015 calls, the second and third use the Jan 15 calls data, and the fourth uses data from the Oleander Library. ArcMap is used in all scenarios. The first scenario finds the Average Nearest Neighbor, or the average distance for a number of neighbors (say 7), which detects clustering by location. The second scenario uses the G Index to find the distance at the maximum z score, that is, the distance of greatest clustering by value (high-value calls). The third scenario uses Ripley’s K to find the distance at which clustering is significant; like Average Nearest Neighbor, it detects clustering by location. The fourth scenario uses Moran’s I to find clustering using both attributes and locations.
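The Average Nearest Neighbor statistic can be sketched outside ArcMap as follows. This is a minimal NumPy illustration for the single nearest neighbor, not the ArcMap tool itself; the function name `average_nearest_neighbor` and the sample points are assumptions made for the example. It uses the standard expected distance for a random pattern, 0.5 / sqrt(n / A), which shows why the result depends on the study area A.

```python
import numpy as np

def average_nearest_neighbor(points, area):
    """Return (nn_index, z_score) for an (n, 2) array of x/y coordinates.

    `area` is the size of the study area in squared coordinate units.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances; the diagonal is set to infinity so a
    # point is never counted as its own nearest neighbor.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    observed = dist.min(axis=1).mean()
    # Expected mean distance and standard error for a random pattern
    # with the same number of points in the same area.
    expected = 0.5 / np.sqrt(n / area)
    std_error = 0.26136 / np.sqrt(n ** 2 / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical locations: one tight cluster inside a 100 x 100 study area.
rng = np.random.default_rng(0)
clustered = rng.normal(loc=50.0, scale=1.0, size=(50, 2))
nn_index, z_score = average_nearest_neighbor(clustered, area=100.0 * 100.0)
# An index below 1 with a negative z score suggests clustering.
print(f"index={nn_index:.2f}, z={z_score:.1f}")
```

Passing a larger `area` for the same points lowers the index, so the study area should be fixed before the results are interpreted.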
The first scenario used Average Nearest Neighbor to find the Nearest Neighbor Index and z score, with the caution that the area of study must be set first. The z score indicates whether there is clustering in the data. The second scenario used Calculate Distance Band to find the average distance band, which helps in setting the Distance Band field in the High/Low Clustering tool. The High/Low Clustering tool generates the z score. A table was created to document each distance band with its G Index and z score, and the maximum z score was read from the table. The third scenario used the Multi-Distance Spatial Cluster Analysis tool to find the K index with 99 permutations and to output a table of the expected K value, the observed K value, and the lower and upper confidence limits. A graph was created of the difference between the observed K and the upper confidence limit; the greatest difference indicated significant clustering at that distance. The fourth scenario first overlaid grid cells on the data using Spatial Join and then ran the Moran’s I tool to look for clustering of the density values over a range of distances. The most significant clustering is at the distance with the maximum z score.
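The third scenario's logic (observed K, a confidence envelope from 99 permutations, and the distance with the greatest difference from the upper limit) can be illustrated with a rough NumPy sketch. It is not the ArcMap tool: it assumes a rectangular study area, applies no edge correction, and uses the common L(d) transformation of K, under which a random pattern gives L(d) close to d. The names `l_function` and `k_envelope` and the sample points are hypothetical.

```python
import numpy as np

def l_function(points, distances, area):
    """Transformed Ripley's K, L(d), at each distance (no edge correction)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude each point paired with itself
    # Count ordered pairs (i, j), i != j, that lie within each distance.
    counts = np.array([(dist <= d).sum() for d in distances])
    return np.sqrt(area * counts / (np.pi * n * (n - 1)))

def k_envelope(n, distances, width, height, permutations=99, seed=0):
    """Lower and upper L(d) limits from random patterns in a rectangle."""
    rng = np.random.default_rng(seed)
    sims = np.empty((permutations, len(distances)))
    for p in range(permutations):
        random_pts = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
        sims[p] = l_function(random_pts, distances, width * height)
    return sims.min(axis=0), sims.max(axis=0)

# Hypothetical data: two clusters of 30 points in a 100 x 100 study area.
rng = np.random.default_rng(1)
centers = np.array([[30.0, 30.0], [70.0, 60.0]])
points = np.vstack([rng.normal(c, 3.0, size=(30, 2)) for c in centers])

distances = np.arange(2.0, 22.0, 2.0)
observed = l_function(points, distances, area=100.0 * 100.0)
lower, upper = k_envelope(len(points), distances, 100.0, 100.0)

# Distance where the observed value exceeds the upper limit the most.
difference = observed - upper
best_distance = distances[np.argmax(difference)]
print("greatest difference at distance", best_distance)
```

Observed values above the upper limit indicate clustering at that distance, and the largest gap marks where the clustering is most pronounced.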
In data mining at the GIS level, statistical concepts are used to detect patterns in data spread over an area or location. This is especially useful when investigating spatial autocorrelation in collected data.
Problem Description: In my research on Brownfields in Charlotte, I need to determine whether there is spatial autocorrelation in the soil acidity of Brownfields, and to use that information to find the spread of acidity across land parcels and water sources such as creeks, streams and watersheds in the county. I am required to find the relationship between the acidity of Brownfields and the contamination of water sources.
Data Needed: In my previous research, I collected soil samples from Brownfields in the Department of Environmental Quality listing. The samples were tested for pH and geocoded, so the data can now be used for mining.
Analysis Procedures: Spatial Autocorrelation (Moran’s I) is applied to the pH data. The analysis is run from 10,000 feet to 50,000 feet in 1,000-foot increments. The maximum z score identifies the distance at which clustering of the acidity of the land parcels is most significant.
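The distance sweep described above can be sketched in NumPy as a check on the procedure. This is an illustration under stated assumptions, not the ArcMap tool: weights are binary within a fixed distance band, the z score uses the normality assumption (ArcMap may compute its variance differently, so values are not expected to match exactly), and the coordinates and pH values are synthetic stand-ins for the real samples. The function name `morans_i_z` is hypothetical.

```python
import numpy as np

def morans_i_z(coords, values, band):
    """Global Moran's I and z score for a fixed distance band.

    Weights are 1 for pairs of features within `band`, otherwise 0.
    """
    xy = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= band).astype(float)
    np.fill_diagonal(w, 0.0)  # a feature is not its own neighbor
    s0 = w.sum()
    if s0 == 0:
        raise ValueError("no feature has a neighbor within the band")
    dev = x - x.mean()
    i_obs = (n / s0) * (dev @ w @ dev) / (dev @ dev)
    # Expected value and variance of I under the normality assumption.
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=0) + w.sum(axis=1)) ** 2).sum()
    expected = -1.0 / (n - 1)
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    return i_obs, (i_obs - expected) / np.sqrt(variance)

# Synthetic stand-in data: 150 sample sites in a 100,000 x 100,000 ft area,
# with pH following a smooth east-west trend plus noise.
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100_000, size=(150, 2))
ph = 6.5 + np.sin(coords[:, 0] / 20_000) + rng.normal(0, 0.3, size=150)

# Sweep 10,000 ft to 50,000 ft in 1,000 ft increments.
bands = np.arange(10_000, 50_001, 1_000)
z_scores = np.array([morans_i_z(coords, ph, b)[1] for b in bands])
best_band = bands[np.argmax(z_scores)]
print("maximum z score at", best_band, "feet")
```

The band with the maximum z score is the candidate distance at which the clustering of acidity is most significant.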