Post date: Jan 31, 2014 5:20:59 PM
I wanted to compare models where genetic distances are related to geographic distances, whether localities belong to the same population (taxon), or both. I used a Bayesian extension of the maximum likelihood population-effects model proposed by Clarke et al. 2002. This includes population specific residuals (random effects) to account for the non-independence of residuals for distances between population pairs that include the same population. I fit the model using rjags. Here is the model code:
model {
## normal likelihood and linear model: mu = beta0 + beta X + alpha_i + alpha_j
for (i in 1:n) {
Y[i] ~ dnorm(mu[i],taue)
mu[i]<-beta0 + inprod(X[i,],beta) + alpha[pop1[i]] + alpha[pop2[i]]
}
## covariate priors
for (k in 1:p) {
beta[k] ~ dnorm(0,0.001)
}
## intercept prior
beta0 ~ dnorm(0,0.001)
## population effect priors
for (j in 1:npop){
alpha[j] ~ dnorm(0, taua)
}
## residual priors
taua ~ dgamma(1,0.01)
taue ~ dgamma(1,0.01)
}
This code is in the file projects/lycaeides_admixture/spatial_autocorrelation/bayesMMR.R. I used Nei's D (1983) as the measure of genetic distances, which I then logit transformed and centered. I also centered and standardized the covariates (standardized geographic distance only). The R code for this analysis is below (from distAnalysis.R)
##load("autoc.rwsp")
## function Nei's D 1983 from Genetic Distances and Reconstruction of Phylogenetic Trees From Microsatellite DNA, Genetics 1996
neisD<-function(P){ ## rows = pops, cols = loci
N<-dim(P)[1]
L<-dim(P)[2]
D<-matrix(0,nrow=N,ncol=N)
for(i in 1:(N-1)){ for(j in (i+1):N){
D[i,j]<-1 - (sum(sqrt((1-P[i,])*(1-P[j,]))) + sum(sqrt(P[i,]*P[j,])))/L
D[j,i]<-D[i,j]
}}
D<-as.dist(D)
return(D)
}
## calculate nei's distance
neiDcommon<-neisD(t(afcommon))
neiDlowf<-neisD(t(aflowf))
neiDrare<-neisD(t(afrare))
## same taxa
taxa<-read.table("mod_taxa.txt",header=F)
dtaxa<-matrix(1,nrow=66,ncol=66)
for(i in 1:66){for(j in 1:66){
if(taxa[i,2]==taxa[j,2]){
dtaxa[i,j]<-0
}
}}
a<-as.dist(dtaxa)==1
## which taxa
wtaxa<-matrix(-9,nrow=66,ncol=66)
for(i in 1:66){for(j in 1:66){
if(taxa[i,2]==taxa[j,2]){
wtaxa[i,j]<-taxa[i,2]
}
}}
wa<-as.dist(wtaxa)
wd<-wa
wd[wa==-9]<-1
wd[wa!=-9]<-0
X1<-wd ## pop distance
## run bayesian analysis
da<-as.dist(dmatrix)
da<-da-mean(da)
da<-da/sd(da)
X2<-da ## geo distance
outCommonFull<-mmrMcmc(Y=as.vector(neiDcommon),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outCommonPop<-mmrMcmc(Y=as.vector(neiDcommon),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outCommonDist<-mmrMcmc(Y=as.vector(neiDcommon),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outLowfFull<-mmrMcmc(Y=as.vector(neiDlowf),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outLowfPop<-mmrMcmc(Y=as.vector(neiDlowf),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outLowfDist<-mmrMcmc(Y=as.vector(neiDlowf),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outRareFull<-mmrMcmc(Y=as.vector(neiDrare),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outRarePop<-mmrMcmc(Y=as.vector(neiDrare),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
outRareDist<-mmrMcmc(Y=as.vector(neiDrare),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)
The full models were preferred based on DIC in all cases. Parameter estimates and such are below:
common variants, multivariate psrf 1.02 , ess > 5000
Mean SD Naive SE Time-series SE
beta[1] 1.094367 0.022163 0.0002861 0.0002818
beta[2] 0.255774 0.008828 0.0001140 0.0001221
beta0 0.003584 0.102505 0.0013233 0.0092727
taua 6.766642 1.172888 0.0151419 0.0164781
taue 7.870008 0.243082 0.0031382 0.0031031
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta[1] 1.0509 1.07923 1.094769 1.10934 1.1369
beta[2] 0.2384 0.24999 0.255813 0.26167 0.2733
beta0 -0.2008 -0.06377 0.001404 0.06956 0.2082
taua 4.6475 5.95133 6.693516 7.52982 9.2876
taue 7.3971 7.70656 7.864144 8.03920 8.3434
lowf variants, multivariate psrf 1.00, ess > 5000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
beta[1] 0.509414 0.012574 1.623e-04 1.661e-04
beta[2] 0.166253 0.004858 6.272e-05 6.341e-05
beta0 -0.001894 0.049334 6.369e-04 4.032e-03
taua 27.012325 4.831462 6.237e-02 6.419e-02
taue 25.058815 0.771721 9.963e-03 9.978e-03
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta[1] 0.48482 0.50094 0.50936 0.51784 0.53390
beta[2] 0.15679 0.16297 0.16626 0.16960 0.17580
beta0 -0.09577 -0.03692 -0.00147 0.03209 0.09306
taua 18.43672 23.64708 26.70077 30.14503 37.17980
taue 23.56968 24.54444 25.05409 25.55650 26.61875
rare variants, multivariate psrf 1.11 = only beta0, ess > 5000
rare variants populatione effect less important
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
beta[1] 0.243612 0.005917 7.638e-05 7.639e-05
beta[2] 0.076496 0.002356 3.041e-05 3.042e-05
beta0 -0.002184 0.036988 4.775e-04 4.460e-03
taua 51.343479 8.900529 1.149e-01 1.182e-01
taue 111.723814 3.517744 4.541e-02 4.541e-02
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
beta[1] 0.23210 0.23953 0.243599 0.24761 0.25524
beta[2] 0.07184 0.07492 0.076490 0.07809 0.08112
beta0 -0.07610 -0.02785 -0.001799 0.02530 0.06419
taua 35.35681 45.19479 50.856047 56.97082 70.09338
taue 104.81508 109.33988 111.689697 114.10852 118.80338