Bayesian isolation-by-distance and isolation-by-population models

Post date: Jan 31, 2014 5:20:59 PM

I wanted to compare models where genetic distances are related to geographic distances, whether localities belong to the same population (taxon), or both. I used a Bayesian extension of the maximum likelihood population-effects model proposed by Clarke et al. 2002. This includes population specific residuals (random effects) to account for the non-independence of residuals for distances between population pairs that include the same population. I fit the model using rjags. Here is the model code:

model {

## normal likelihood and linear model: mu = beta0 + beta X + alpha_i + alpha_j

for (i in 1:n) {

Y[i] ~ dnorm(mu[i],taue)

mu[i]<-beta0 + inprod(X[i,],beta) + alpha[pop1[i]] + alpha[pop2[i]]

}

## covariate priors

for (k in 1:p) {

beta[k] ~ dnorm(0,0.001)

}

## intercept prior

beta0 ~ dnorm(0,0.001)

## population effect priors

for (j in 1:npop){

alpha[j] ~ dnorm(0, taua)

}

## residual priors

taua ~ dgamma(1,0.01)

taue ~ dgamma(1,0.01)

}

This code is in the file projects/lycaeides_admixture/spatial_autocorrelation/bayesMMR.R. I used Nei's D (1983) as the measure of genetic distances, which I then logit transformed and centered. I also centered and standardized the covariates (standardized geographic distance only). The R code for this analysis is below (from distAnalysis.R)

##load("autoc.rwsp")

## function Nei's D 1983 from Genetic Distances and Reconstruction of Phylogenetic Trees From Microsatellite DNA, Genetics 1996

neisD<-function(P){ ## rows = pops, cols = loci

N<-dim(P)[1]

L<-dim(P)[2]

D<-matrix(0,nrow=N,ncol=N)

for(i in 1:(N-1)){ for(j in (i+1):N){

D[i,j]<-1 - (sum(sqrt((1-P[i,])*(1-P[j,]))) + sum(sqrt(P[i,]*P[j,])))/L

D[j,i]<-D[i,j]

}}

D<-as.dist(D)

return(D)

}

## calculate nei's distance

neiDcommon<-neisD(t(afcommon))

neiDlowf<-neisD(t(aflowf))

neiDrare<-neisD(t(afrare))

## same taxa

taxa<-read.table("mod_taxa.txt",header=F)

dtaxa<-matrix(1,nrow=66,ncol=66)

for(i in 1:66){for(j in 1:66){

if(taxa[i,2]==taxa[j,2]){

dtaxa[i,j]<-0

}

}}

a<-as.dist(dtaxa)==1

## which taxa

wtaxa<-matrix(-9,nrow=66,ncol=66)

for(i in 1:66){for(j in 1:66){

if(taxa[i,2]==taxa[j,2]){

wtaxa[i,j]<-taxa[i,2]

}

}}

wa<-as.dist(wtaxa)

wd<-wa

wd[wa==-9]<-1

wd[wa!=-9]<-0

X1<-wd ## pop distance

## run bayesian analysis

da<-as.dist(dmatrix)

da<-da-mean(da)

da<-da/sd(da)

X2<-da ## geo distance

outCommonFull<-mmrMcmc(Y=as.vector(neiDcommon),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outCommonPop<-mmrMcmc(Y=as.vector(neiDcommon),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outCommonDist<-mmrMcmc(Y=as.vector(neiDcommon),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outLowfFull<-mmrMcmc(Y=as.vector(neiDlowf),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outLowfPop<-mmrMcmc(Y=as.vector(neiDlowf),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outLowfDist<-mmrMcmc(Y=as.vector(neiDlowf),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outRareFull<-mmrMcmc(Y=as.vector(neiDrare),X1=as.vector(X1),X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outRarePop<-mmrMcmc(Y=as.vector(neiDrare),X1=as.vector(X1),X2=NULL,npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

outRareDist<-mmrMcmc(Y=as.vector(neiDrare),X1=NULL,X2=as.vector(X2),npop=66,nmcmc=10000,burnin=2000,thin=5,nchain=3)

The full models were preferred based on DIC in all cases. Parameter estimates and such are below:

common variants, multivariate psrf 1.02 , ess > 5000

Mean SD Naive SE Time-series SE

beta[1] 1.094367 0.022163 0.0002861 0.0002818

beta[2] 0.255774 0.008828 0.0001140 0.0001221

beta0 0.003584 0.102505 0.0013233 0.0092727

taua 6.766642 1.172888 0.0151419 0.0164781

taue 7.870008 0.243082 0.0031382 0.0031031

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

beta[1] 1.0509 1.07923 1.094769 1.10934 1.1369

beta[2] 0.2384 0.24999 0.255813 0.26167 0.2733

beta0 -0.2008 -0.06377 0.001404 0.06956 0.2082

taua 4.6475 5.95133 6.693516 7.52982 9.2876

taue 7.3971 7.70656 7.864144 8.03920 8.3434

lowf variants, multivariate psrf 1.00, ess > 5000

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

beta[1] 0.509414 0.012574 1.623e-04 1.661e-04

beta[2] 0.166253 0.004858 6.272e-05 6.341e-05

beta0 -0.001894 0.049334 6.369e-04 4.032e-03

taua 27.012325 4.831462 6.237e-02 6.419e-02

taue 25.058815 0.771721 9.963e-03 9.978e-03

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

beta[1] 0.48482 0.50094 0.50936 0.51784 0.53390

beta[2] 0.15679 0.16297 0.16626 0.16960 0.17580

beta0 -0.09577 -0.03692 -0.00147 0.03209 0.09306

taua 18.43672 23.64708 26.70077 30.14503 37.17980

taue 23.56968 24.54444 25.05409 25.55650 26.61875

rare variants, multivariate psrf 1.11 = only beta0, ess > 5000

rare variants populatione effect less important

1. Empirical mean and standard deviation for each variable,

plus standard error of the mean:

Mean SD Naive SE Time-series SE

beta[1] 0.243612 0.005917 7.638e-05 7.639e-05

beta[2] 0.076496 0.002356 3.041e-05 3.042e-05

beta0 -0.002184 0.036988 4.775e-04 4.460e-03

taua 51.343479 8.900529 1.149e-01 1.182e-01

taue 111.723814 3.517744 4.541e-02 4.541e-02

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%

beta[1] 0.23210 0.23953 0.243599 0.24761 0.25524

beta[2] 0.07184 0.07492 0.076490 0.07809 0.08112

beta0 -0.07610 -0.02785 -0.001799 0.02530 0.06419

taua 35.35681 45.19479 50.856047 56.97082 70.09338

taue 104.81508 109.33988 111.689697 114.10852 118.80338