R Code for Differential Smoothing

This page walks through the R code corresponding to the illustrative example in Sections 2 and 4 (Last updated 09/16/2016; version 1.0) of Quick, H., Holan, S.H., and Wikle, C.K. “Generating partially synthetic geocoded public use data with decreased disclosure risk using differential smoothing.” Under revision at the Journal of the Royal Statistical Society, Series A.

The .zip file contains the following files:

    • code.R: This is the main file you will run

    • run_update_full_cont_A.R: This file contains the code for updating the MCMC algorithm for the differential smoothing approach. Note that as of now, this code will *not* update the spatial decay parameter, phi.

    • data.rdata: The dataset from the illustrative example. These data were generated by Dr. Quick and are not based on any real dataset.

    • Output: An (initially) empty folder where the MCMC output will be saved. Probably not necessary for this example, but this is the structure I use for all of my code (for better or worse).

The beginning of the code.R file sets up the various pieces necessary to run the MCMC algorithm, including specifying the prior distributions. The key aspect of differential smoothing begins with the chunk of code labeled "Important code":

mdel=rep(NA,N) #find the distance to the closest neighbor ("min distance")

n0=1 #you could modify this to find the average distance to

#the nearest n0 neighbors...

for(i in 1:N){

dist=sqrt( (s[i,1]-s[-i,1])^2 + (s[i,2]-s[-i,2])^2 )

mdel[i]=mean( dist[order(dist)][1:n0])

}

alpha=0.20 #outlier defn from paper

#alpha=0 #No outliers

#alpha=0.5 #a more conservative defn of outliers (i.e., results in more outliers)

M=-log(alpha)/phi0 #distance at which an observation is a "spatial outlier"

Ain=rep(1,N); Ain[mdel>M]=0

We will detail the results when alpha=0.20, and then we will present results using other values of alpha.

We first display trace plots for various model parameters.

This figure resembles the one shown in Figure 2a from the manuscript, where the model does not appear to be adequately protecting the spatial outliers, as evidenced by the "ring" around the oultying observation at location (0.51, 0.01).

As first seen in Figure 2b from the manuscript, the differential smoothing approach works to protect the spatial outliers in the data.

We now re-run the code with alpha = 0.

Here we see that aside from beta_0, there are no apparent convergence issues. Regarding beta_0, there is a identifiability issue (as is often the case in spatial analyses). Essentially, unless we enforce a sum-to-zero constraint on our spatial random effects, w(s), the mean of the random effects and beta_0 will fight to measure the same thing. Fortunately, however, beta_0 + w(s_i) is identifiable for all i, thus, this issue of identifiability is not of concern. We also see how the random effect corresponding to the spatial outlier is clearly not being learned about in the same way as the random effects for non-outliers, as it appears to be centered around 0 with a variance close to 4 (i.e., the value of sig2).

We now look at the prediction plot resulting from alpha = 0.20 as compared to the true response surface.