By Matthew Martin
In order to find areas of cities that have lower accessibility to schools, it is important to understand the spatial distribution of schools in an area. If the goal is to ensure that all areas have equal accessibility, then one might assume that dispersing the schools evenly would be best. For this reason, school choice and distance to school are becoming a larger area of study (Mandic). This study analyzes the spatial distribution of schools within the Greater Los Angeles Area. It uses three methods of point pattern analysis, the quadrat count, the average nearest neighbor, and Ripley's K function, to assess the pattern of the schools. Do the locations of schools in the Los Angeles area follow a dispersed, clustered, or random distribution? Can we accept the null hypothesis, which states that the locations of schools throughout the Greater Los Angeles Area follow a random pattern?
My study area for this analysis is the Greater Los Angeles Area. Located on the southern coast of California, within Los Angeles and Orange Counties, it encompasses a population of about 18 million people according to the 2020 Census. The study area is a rectangular region that includes cities such as Los Angeles, Long Beach, Inglewood, West Covina, and Beverly Hills. Being a large metropolitan area, it contains 2,150 public elementary, middle, and high schools.
My data include the locations of all educational institutions, published by Esri's Federal Data team and the USGS on ArcGIS Online. I clipped this data to a rectangular polygon that I drew over the study area, and then filtered it to use only the "schools" point layer. This gives a set of 2,150 points to use in this analysis.
This analysis uses 2,150 points to represent public schools (elementary, middle, and high schools) in the Los Angeles study area. These points will help determine whether the locations of schools in the study area are dispersed, clustered, or randomly placed. The extent of this analysis is a rectangular polygon created over the city area. I kept it a rectangle to make the process easier and to reduce the effects of bias along the edges of the study area. It also gives me a set of points that spans different cities and towns in the metro area.
In this analysis, I used a local Cartesian projected coordinate system. Since ArcGIS Pro map frames are not created with a Cartesian coordinate system, I created a custom projection that replicates the Cartesian plane. This plane allows the polygons created by the fishnet tool to be equal in size and ensures that all parts of my study area are represented.
The Quadrat Count Method works by dividing the study area into a grid and then counting the frequency of events in each polygon in the grid. Using the frequency counts, the mean, the variance, and the Variance-Mean Ratio (VMR) are calculated. If the VMR is equal to 1, then the set of points has complete spatial randomness. If the VMR is greater than 1, then the set of points leans towards clustering. If the VMR is less than 1, then the set of points leans towards dispersion (equal spacing).
The VMR can be found using the following formula:
Where:
n = the number of total events
x = the number of quadrats
K = the frequency of points in quadrats
X = the number of quadrats for each frequency count
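One common textbook form of the ratio, written with the symbols defined above, is sketched here. It assumes the sample variance (dividing by x - 1); some references divide by x instead. The symbol \bar{k} for the mean count per quadrat, and the subscript on X_K (the number of quadrats containing exactly K points), are notation introduced only for this sketch.

```latex
\[
\bar{k} = \frac{n}{x}, \qquad
s^{2} = \frac{\sum_{K} X_{K}\,\bigl(K - \bar{k}\bigr)^{2}}{x - 1}, \qquad
\mathrm{VMR} = \frac{s^{2}}{\bar{k}}
\]
```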
To find the quadrat count for schools in the LA area, I first divided my study area into quadrats using the Create Fishnet tool. I created a fishnet of 10 rows by 10 columns to ensure equal areas and distances and to keep the calculation simple.
Then, I used the Summarize Within tool to get a count of the points in each quadrat, stored in a field called Sum Weight. I set the output to a graduated colors symbology to easily see the point density of each quadrat.
Finally, I used the Frequency tool to create a table of the frequency of points in each quadrat according to the Sum Weight data from the previous tool.
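The three steps above (grid, per-quadrat counts, frequency table) can also be sketched outside ArcGIS Pro. The following is a minimal illustration in plain Python, not the workflow actually used in this study. The function names and the toy points are hypothetical, the points are assumed to lie inside the rectangle, and the variance is assumed to be the sample variance.

```python
from collections import Counter


def quadrat_counts(points, xmin, ymin, xmax, ymax, rows=10, cols=10):
    """Count the points falling in each cell of a rows x cols grid.

    Assumes every point lies inside the rectangle (xmin, ymin, xmax, ymax).
    """
    counts = [0] * (rows * cols)
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    for px, py in points:
        # Clamp so points on the upper or right edge land in the last cell.
        col = min(int((px - xmin) / cell_w), cols - 1)
        row = min(int((py - ymin) / cell_h), rows - 1)
        counts[row * cols + col] += 1
    return counts


def variance_mean_ratio(counts):
    """VMR computed from the frequency table of quadrat counts."""
    x = len(counts)            # number of quadrats
    n = sum(counts)            # total number of events
    mean = n / x
    freq = Counter(counts)     # K -> number of quadrats holding K points
    # Sample variance (x - 1 in the denominator) is an assumption here.
    variance = sum(X * (K - mean) ** 2 for K, X in freq.items()) / (x - 1)
    return variance / mean


# Hypothetical toy data, not the school points used in this study.
toy_points = [(0.5, 0.5), (0.6, 0.4), (0.4, 0.6), (0.5, 0.6), (1.5, 1.5)]
toy_counts = quadrat_counts(toy_points, 0, 0, 2, 2, rows=2, cols=2)
toy_vmr = variance_mean_ratio(toy_counts)
print(toy_counts, round(toy_vmr, 3))
```

In the toy example four of the five points share one quadrat, so the ratio comes out well above 1, which is the clustering signal described earlier.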
The Average Nearest Neighbor method is a distance-based method that averages the distance from each feature to its nearest neighbor. It then compares that observed average to the average expected for a hypothetical random distribution, expressed as a ratio. If the Average Nearest Neighbor ratio is equal to 1, then the pattern has complete spatial randomness. If the ratio is less than 1, then the points lean towards clustering. If the ratio is greater than 1, then the points lean towards dispersion.
According to the ESRI tool reference on Average Nearest Neighbor (ESRI):
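As a minimal sketch of the calculation described there: the ratio divides the observed mean nearest-neighbor distance by the distance expected for a random pattern, 0.5 / sqrt(n / A), and the z-score divides their difference by a standard error of 0.26136 / sqrt(n^2 / A), where n is the number of features and A is the study area. The function name, the brute-force neighbor search, and the toy points below are illustrative assumptions, not the ArcGIS Pro implementation.

```python
import math


def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) for a list of (x, y) points in a study area."""
    n = len(points)
    # Observed mean distance: brute-force nearest neighbor for each point.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    # Expected mean distance and standard error under a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n ** 2 / area)
    return observed / expected, (observed - expected) / std_error


# Hypothetical toy pattern: two tight pairs in a 10 x 10 study area.
toy = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (8.0, 9.0)]
ratio, z = average_nearest_neighbor(toy, area=100.0)
print(round(ratio, 2), round(z, 2))
```

For the toy pattern the ratio is below 1 and the z-score is negative, which is the direction associated with clustering.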
The Ripley's K method, also called Multi-Distance Spatial Cluster Analysis, summarizes spatial dependence over a range of distances. It steps through a series of distance bands, changing the distance being evaluated at each step, and computes the average number of neighboring features within that distance of each feature. Random permutations of the points are used to build a confidence envelope for comparison. If the average number of neighbors at a specific distance is higher than the average concentration of features over the entire study area, then the points lean towards clustering. If it is lower, then the points lean towards dispersion.
This can be calculated using the following formula from ESRI's tool reference:
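A sketch of the L(d) transformation of the K function as that reference describes it, assuming no edge correction, is given below. Here d is the distance being evaluated, n is the total number of features, A is the total area of the study area, and k_{i,j} is a weight equal to 1 when the distance between features i and j is less than or equal to d, and 0 otherwise.

```latex
\[
L(d) = \sqrt{\frac{A \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} k_{i,j}}{\pi \, n \, (n - 1)}}
\]
```

Under this transformation the expected value for a random pattern is approximately d itself, so observed values above the expected line point towards clustering at that distance.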
For this analysis, I chose to use 999 permutations with 50 distance bands at 5-meter increments in order to get a robust calculation.
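The idea of comparing an observed curve against a permutation envelope can be sketched in plain Python. This is an illustration under stated assumptions, not the ArcGIS Pro tool: it uses no edge correction, builds the envelope from the minimum and maximum L(d) over random patterns in a rectangle, and runs on a hypothetical toy pattern with only 99 permutations and 3 distance bands to keep it fast. The function names are invented for this sketch.

```python
import math
import random


def l_function(points, area, distances):
    """Transformed K, L(d), with no edge correction; one value per distance."""
    n = len(points)
    # Distance for each unordered pair of points.
    pair_d = [
        math.dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n)
    ]
    values = []
    for d in distances:
        # Each unordered pair within d counts twice (i to j and j to i).
        pairs_within = 2 * sum(1 for pd in pair_d if pd <= d)
        values.append(math.sqrt(area * pairs_within / (math.pi * n * (n - 1))))
    return values


def random_envelope(n, width, height, distances, permutations=99, seed=0):
    """Lowest and highest L(d) over random patterns of n points in a box."""
    rng = random.Random(seed)
    lows = [math.inf] * len(distances)
    highs = [-math.inf] * len(distances)
    for _ in range(permutations):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        for k, value in enumerate(l_function(sim, width * height, distances)):
            lows[k] = min(lows[k], value)
            highs[k] = max(highs[k], value)
    return lows, highs


# Hypothetical toy pattern: 20 points packed into one corner of a 100 x 100 box.
toy = [(50 + (i % 5) * 0.5, 50 + (i // 5) * 0.5) for i in range(20)]
bands = [5, 10, 15]  # distance bands, in the same units as the coordinates
observed = l_function(toy, 100 * 100, bands)
lows, highs = random_envelope(len(toy), 100, 100, bands)
clustered = [o > h for o, h in zip(observed, highs)]
print(clustered)
```

When the observed value exceeds the upper envelope at a given distance, the pattern is more clustered at that distance than any of the simulated random patterns, which is the reading applied to the ObservedK line in the results.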
The results of the quadrat count method show clustering of the points. The calculated Variance-Mean Ratio is 2.57. Since this value is greater than 1, the data lean towards clustering.
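The results of the Average Nearest Neighbor method also show clustering of schools in the LA area. The tool reported a p-value of 0 and a z-score of -14.02. A negative z-score means the observed mean nearest-neighbor distance is smaller than expected under randomness, and since -14.02 is far below the critical value of -1.96 (95% confidence), the method supports the claim that there is clustering.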
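The reported p-value of 0 is presumably a rounded value. As a quick check, assuming the z-score is compared against a standard normal distribution with a two-sided test, the p-value can be computed with the standard library. The function name is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


p = two_sided_p_value(-14.02)   # z-score reported in this analysis
p_critical = two_sided_p_value(1.96)
print(p, round(p_critical, 3))
```

The value for -14.02 is positive but many orders of magnitude below any conventional significance level, so it displays as 0 at ordinary precision.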
Finally, the results of the Ripley's K method also show clustering of the schools. Since the ObservedK line rises above the Upper Confidence Envelope, there is significant clustering of the points.
The results of my analysis are consistent across the three methods used for the point pattern analysis. All three methods show that there is clustering of schools within the Los Angeles study area. This means that the locations of the schools cluster together, rather than being placed randomly or dispersed evenly. This may suggest that some parts of the study area have more limited access to nearby schools. The quadrat count method showed quadrats with a much higher density of schools, visible as the darker red areas in the Summarize Within output. This was reflected in the VMR, which indicated clustering. The other two methods agree with the first. The Average Nearest Neighbor method resulted in a z-score of about -14, much lower than -2. The chart from the Ripley's K method showed the ObservedK line well above the Upper Confidence Envelope. Together, the results make evident that we cannot accept the null hypothesis, since the locations of schools are not in a random arrangement.
Limitations of this analysis include the arbitrary rectangular study area that I created, as well as the scale at which I ran the analysis. I chose the rectangle to simplify calculations involving many points and to avoid mountainous regions where possible. At this scale, however, it was impossible not to include some coastal and mountainous areas, so the results could be biased towards clustering. Further study could use a smaller area within the city, with fewer points, to see whether the same clustering appears.
“Educational Structures.” ArcGIS, Esri Federal Data, maps.arcgis.com/home/item.html?id=0f98fc66a9fa4f94953718578cbd77a8. Accessed 16 Aug. 2025.
“How Average Nearest Neighbor Works.” How Average Nearest Neighbor Works-ArcGIS Pro | Documentation, ESRI, pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/h-how-average-nearest-neighbor-distance-spatial-st.htm. Accessed 16 Aug. 2025.
Mandic, Sandra, et al. “School choice, distance to school and travel to school patterns among adolescents.” Journal of Transport & Health, vol. 33, Nov. 2023, p. 101704, https://doi.org/10.1016/j.jth.2023.101704.