Determining Clustering using R

Task of Lab

The objective of this lab was to determine whether maple trees within a study area in Lansing, Michigan are clustered, using Kernel Density Estimation (KDE) and the Ghat and Fhat functions in R.

Methodology

The lab followed the six steps listed here. The code for each step is in the Code section below:

Imported my CSV dataset of maple tree coordinates into R.
Found the Mean Center and plotted the Standard Distance as a circle centered at the Mean Center.
Performed KDE for my maple tree dataset.
Found the mean nearest neighbor distance for the maple tree dataset and presented the distribution of nearest neighbor distances as a histogram.
Determined Ghat for the maple tree dataset to find evidence of clustering.
Determined Fhat for the maple tree dataset to check that the evidence of clustering is robust.

R Packages and Functions Used

Base R Functions:

bw.nrd0() - chooses the bandwidth of a Gaussian kernel density estimator using Silverman's "rule-of-thumb" formula
c() - combines arguments to form a vector
cbind() - takes a sequence of vector, matrix, or dataframe arguments and combines them by column
hist() - computes a histogram of the defined values in a vector
image() - creates a grid of colored or gray-scale rectangles with colors corresponding to the values of the input
lines() - joins coordinate points with line segments
max() - returns the maximum value of the input arguments
mean() - generic function for the arithmetic mean
min() - returns the minimum value of the input arguments
plot() - draws a scatter plot with axes and titles in the active graphics window. A line plot can be set by denoting the "type" as "l" (lowercase L)
read.csv() - reads a csv file and creates a data frame from it
seq() - generates a sequence of numbers from a defined minimum to a defined maximum by a defined interval
summary() - provides a statistical summary for the input vector
symbols() - draws symbols on a plot

sf Package Functions:

st_as_sf() - converts a foreign object to an sf object

sfdep Package Functions:

std_distance() - calculates the standard distance of a point pattern, a measure of how widely the points are dispersed around the mean center

splancs Package Functions:

areapl() - calculates the area of a polygon
as.points() - creates data in spatial point format ([x,y] coordinates)
csr() - generates completely spatially random points on a polygon
Fhat() - calculates an estimate of the F nearest neighbor distribution function
Ghat() - calculates an estimate of the G nearest neighbor distribution function
kernel2d() - performs quartic kernel smoothing on a point pattern
nndistG() - calculates nearest neighbor distances as used by Ghat
npts() - returns the number of points in a data set

Code

Step 1 Code:

> Maple<-read.csv("C:\\Users\\...\\Lab_data\\data\\dataexer\\lansing_maple.csv",header=T)

Step 2 Code:

> mean_x<-mean(Maple$x)

> mean_y<-mean(Maple$y)

> mean_center<-c(mean_x,mean_y)

> mean_center

[1] 0.5512160 0.3804669

> library(sf)

> library(sfdep)

> sf_Maple<-st_as_sf(Maple,coords=c("x","y"))

> std_dist<-std_distance(sf_Maple)

> std_dist

[1] 0.3566504

> symbols(mean_x,mean_y,circles=std_dist)

Step 2 Plot: Plot showing the standard distance as a circle centered at the mean center
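As a cross-check on Step 2, the standard distance can be sketched in base R as the root mean squared distance of the points from the mean center. The coordinates and the function names standard_distance() and draw_standard_distance() below are hypothetical, and the divisor used by sfdep's std_distance() is assumed, not confirmed, to be n. Note that symbols() scales circles in inches by default, so inches = FALSE is used here to draw the radius in the units of the x axis.

```r
# Hypothetical coordinates standing in for the maple data (not the lab data).
x <- c(0, 2, 2, 0)
y <- c(0, 0, 2, 2)

# Standard distance: root mean squared distance from the mean center.
# Assumption: a divisor of n, which may differ from sfdep's std_distance().
standard_distance <- function(x, y) {
  sqrt(mean((x - mean(x))^2 + (y - mean(y))^2))
}

# Plot the points, mark the mean center, and add the circle in data units.
draw_standard_distance <- function(x, y) {
  r <- standard_distance(x, y)
  mx <- mean(x)
  my <- mean(y)
  # Widen the axes so the whole circle is visible; asp = 1 keeps it round.
  plot(x, y, asp = 1,
       xlim = range(x, mx - r, mx + r),
       ylim = range(y, my - r, my + r))
  points(mx, my, pch = 3)
  # inches = FALSE makes the radius follow the x axis units.
  symbols(mx, my, circles = r, inches = FALSE, add = TRUE)
}

standard_distance(x, y)  # sqrt(2) for these four corner points
```

Calling draw_standard_distance(x, y) overlays the circle on the points, which makes it easier to judge how much of the pattern falls within one standard distance of the mean center.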

Step 3 Code:

> library(splancs)

> Maple.pts<-as.points(Maple)

> minx=min(Maple.pts[,1])

> maxx=max(Maple.pts[,1])

> miny=min(Maple.pts[,2])

> maxy=max(Maple.pts[,2])

> Polygon=cbind(c(minx,maxx,maxx,minx),c(miny,miny,maxy,maxy))

> Maple_Band<-bw.nrd0(Maple.pts)

> Maple_kernel<-kernel2d(Maple.pts,Polygon,Maple_Band,35,35)

> image(Maple_kernel)

Step 3 Image: KDE Estimate. Shows where the maple trees are clumped together (darker = more trees)
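The two ingredients of Step 3 can be sketched in base R to show what they compute. In the Step 3 code, bw.nrd0() receives the two-column point matrix; since bw.nrd0() works on a numeric vector, the x and y values are pooled into one sample. The sketch below writes out Silverman's rule for that pooled sample and then evaluates a quartic kernel on a 35 by 35 grid. The random points and the names silverman() and quartic() are hypothetical, and no edge correction is applied, so the surface is only an approximation of what kernel2d() returns.

```r
# Hypothetical stand-in for the maple coordinates (not the lab data).
set.seed(1)
pts <- cbind(x = runif(200), y = runif(200))

# Silverman's rule of thumb, written out for a numeric vector v.
silverman <- function(v) {
  v <- as.numeric(v)  # a matrix becomes one pooled vector of x and y values
  0.9 * min(sd(v), IQR(v) / 1.34) * length(v)^(-1 / 5)
}

h_manual <- silverman(pts)
h_builtin <- bw.nrd0(pts)  # pools the matrix in the same way

# Quartic kernel weight at distance d for bandwidth h (zero beyond h).
quartic <- function(d, h) {
  ifelse(d < h, 3 / (pi * h^2) * (1 - d^2 / h^2)^2, 0)
}

# Evaluate the density on a 35 x 35 grid spanning the bounding box.
gx <- seq(min(pts[, 1]), max(pts[, 1]), length.out = 35)
gy <- seq(min(pts[, 2]), max(pts[, 2]), length.out = 35)
z <- outer(gx, gy, Vectorize(function(a, b) {
  d <- sqrt((pts[, 1] - a)^2 + (pts[, 2] - b)^2)
  sum(quartic(d, h_manual))
}))
```

Passing list(x = gx, y = gy, z = z) to image() would draw this surface in the same way as the kernel2d() result.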

Step 4 Code:

> Maple.nndist=nndistG(Maple.pts)

> mean(Maple.nndist$dists)

[1] 0.01794514

> summary(Maple.nndist$dists)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.001000 0.009447 0.015033 0.017945 0.022444 0.125929

> hist(Maple.nndist$dists)

Step 4 Chart: Histogram showing distribution of nearest neighbor distances
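For readers without splancs, nearest neighbor distances can be sketched in base R from the full pairwise distance matrix. The three points below are hypothetical and chosen so the result is easy to verify by hand; this is an illustration of the idea, not the internals of nndistG().

```r
# Hypothetical points on a line (not the lab data).
pts <- cbind(x = c(0, 1, 5), y = c(0, 0, 0))

# Pairwise Euclidean distances between all points.
d <- as.matrix(dist(pts))

# A point is not its own neighbor, so exclude the zero diagonal.
diag(d) <- Inf

# Nearest neighbor distance for each point: the smallest value in its row.
nn <- apply(d, 1, min)

mean(nn)  # (1 + 1 + 4) / 3 = 2
```

The vector nn plays the role of Maple.nndist$dists in the Step 4 code, so mean(), summary(), and hist() can be applied to it in the same way.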

Step 5 Code:

> Maple_Ghat=Ghat(Maple.pts,seq(0,0.489,0.005))

> plot(Maple_Ghat,type="l")

Step 5 Chart: Ghat Distribution

A fast rise indicates that nearest neighbors are found at short distances, which is evidence of clustering.
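The idea behind the G function can be sketched in base R as the empirical cumulative distribution of nearest neighbor distances: for each distance s, the proportion of points whose nearest neighbor lies within s. The function name ghat_sketch() and the three points are hypothetical, and the sketch ignores any edge handling, so it may not match splancs' Ghat() exactly.

```r
# Proportion of points whose nearest neighbor is within each distance in s.
ghat_sketch <- function(pts, s) {
  d <- as.matrix(dist(pts))
  diag(d) <- Inf                 # exclude each point's distance to itself
  nn <- apply(d, 1, min)         # nearest neighbor distance per point
  sapply(s, function(si) mean(nn <= si))
}

# Hypothetical points on a line: nearest neighbor distances are 1, 1, and 4.
pts <- cbind(c(0, 1, 5), c(0, 0, 0))

ghat_sketch(pts, c(0.5, 1, 4))  # 0, 2/3, 1
```

In a clustered pattern most nearest neighbor distances are small, so this curve climbs toward 1 over short distances.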

Step 6 Code:

> Maple.random=csr(Polygon,npts(Maple.pts))

> Maple_Fhat=Fhat(Maple.pts,Maple.random,seq(0,0.489,0.005))

> r <- seq(0, 0.489, 0.005)

> plot(r, Maple_Fhat,type="l")

> lambda <- npts(Maple.pts)/areapl(Polygon)

> lines(r,1-exp(-lambda*pi*r^2),lty = 2)

Step 6 Chart: Fhat Distribution

A slower rise than the dashed curve (the theoretical F function under complete spatial randomness, CSR) indicates that the observed pattern contains many large voids, which is further evidence of clustering.
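The F function can be sketched in base R as the proportion of reference locations that have an event within distance s. The names fhat_sketch() and csr_f() and the small point sets are hypothetical, and no edge correction is applied, so this illustrates the concept instead of reproducing splancs' Fhat(). The CSR reference curve uses the same expression as the lines() call in Step 6.

```r
# Proportion of reference points with an event within each distance in s.
fhat_sketch <- function(events, ref, s) {
  # Distance from each reference point to its nearest event.
  nearest <- apply(ref, 1, function(p) {
    min(sqrt((events[, 1] - p[1])^2 + (events[, 2] - p[2])^2))
  })
  sapply(s, function(si) mean(nearest <= si))
}

# Theoretical F under CSR for intensity lambda (points per unit area).
csr_f <- function(s, lambda) 1 - exp(-lambda * pi * s^2)

# Hypothetical events and reference points (not the lab data).
events <- cbind(c(0, 1), c(0, 0))
ref <- cbind(c(0.5, 3), c(0, 0))

# Nearest event distances from the reference points are 0.5 and 2.
fhat_sketch(events, ref, c(0.4, 0.5, 2))  # 0, 0.5, 1
```

When events are clustered, many reference points fall in empty space far from any event, so the empirical curve stays below csr_f() over short distances.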

Results

The KDE, Ghat, and Fhat results all show evidence that maple trees are clustered in the studied area of Lansing, Michigan. The largest clusters are located in the central part, the southeast corner, and the southwest corner of the study area.
