High genetic differentiation regions and allele frequency change in experimental populations

Post date: Nov 18, 2013 9:43:20 PM

We wanted to know whether SNPs in high genetic differentiation regions between populations on different host plants are more likely to diverge in experimental populations transplanted to different host plants. We could not test this at the level of individuals SNPs in both data sets because few SNPs were identified in the whole genome re-sequence data and the between-generation experiment data.

We first classified between generation SNPs as being or not being in high genetic differentiation regions based on the 3-state HMM results. A SNP was assigned to a high genetic differentiation region if it was flanked by high genetic differentiation SNPs on the same scaffold for a population pair (or was itself a high genetic differentiation SNP). We classified non-high genetic differentiation SNPs similarly, but coded SNPs as 'NA' if they were flanked by differently classified SNPs or did not have flanking SNPs. This was accomplished with the perl script bgsrStates.pl in projects/timema_wgwild/experiment.

We next quantified the average difference in allele frequency change between experimental populations transplanted to Adenostoma and Ceanothus. These results are from the between generation experiment with allele frequency change estimated using bgsr ('dp' parameter) with the null model (no selection).

We then grabbed the set of SNPs with the highest (99.9th or 99.5th quantile) average difference in allele frequency change (hereafter 'parallel change' SNPs). We asked for each population how many of these parallel change SNPs were in high genetic differentiation regions, and used a permutation test (1000 permutations) to determine whether a greater number of parallel SNPs were in high genetic differentiation regions than expected by chance (i.e., if these were a random sample of SNPs). The results are below,

Number of 99.9th quantile parallel change SNPs in high genetic differentiation regions

Number of 99.5th quantile parallel change SNPs in high genetic differentiation regions

Thus, 99.5th quantile parallel change SNPs are more likely to be in high divergence regions between R12A and R12C than expected by chance, but this is not true for other SNPs or regions. This result indicates that there is at least partial and weak concordance between genetic differentiation and experimental evolution.

We next asked the related question of whether parallel change SNPs showed significant parallal genetic differntiation. We did this by first calculating the number of parallel change SNPs in high genetic differentiation regions in 0, 1, 2, 3, or all 4 population pairs and then asking whether this number is greater than expected by chance (via 1000 permutations). Here are the results,

99.9th quantile parallel change SNPs

9959th quantile parallel change SNPs

In both cases we have marginally significant evidence that the parallel change SNPs are more likely to occur in high differentiation regions in two populations than expected by chance. Note, if you drop down to the 99.0th quantile the signal for two population pairs goes away, but there is a marginally significant enrichment for three population pairs.

Overall I think these results show (as so often seems to be the case) that there is some correspondence between the parallel divergence in the experiment and nature, but that it is not super strong.

Here is the R code (also in enrichmentCommands.R) I used for this analysis,

## analysis to detemine whether high allele frequency change snps are in high differentiation regions

load("exper.rwsp") ## this contains the hmm results, the rest of the objects come from code below

nl<-82060

nlwg<-4391556

npp<-4

## allele frequency divergence

afDiv<-scan("bgsrAlleleFreqDiv.txt",n=nl)

## read bgsr locus ids

bgsrIds<-matrix(scan("bgsr_locids.txt",n=nl*2,sep=" "),nrow=nl,ncol=2,byrow=TRUE)

## read wgrs locus ids, lg snps only

wgrsIds<-matrix(scan("wgrs_locids.txt",n=nlwg*4,sep=" "),nrow=nlwg,ncol=4,byrow=TRUE)

## read hmm states for loci, used perl script to get these

hmmHVAxHVC<-matrix(scan("bgsrStates_HVAxHVC.txt",n=nl*3,sep=" "),nrow=nl,ncol=3,byrow=TRUE)

hmmMR1AxMR1C<-matrix(scan("bgsrStates_MRAxMRC.txt",n=nl*3,sep=" "),nrow=nl,ncol=3,byrow=TRUE)

hmmR12AxR12C<-matrix(scan("bgsrStates_R12AxR12C.txt",n=nl*3,sep=" "),nrow=nl,ncol=3,byrow=TRUE)

hmmLAxPRC<-matrix(scan("bgsrStates_LAxPRC.txt",n=nl*3,sep=" "),nrow=nl,ncol=3,byrow=TRUE)

hmmStates<-as.matrix(cbind(hmmHVAxHVC[,3],hmmMR1AxMR1C[,3],hmmR12AxR12C[,3],hmmLAxPRC[,3]))

## determine whether bgsr SNPs are in high divergence regions

q<-quantile(abs(afDiv),probs=c(0.995)) ## this yields 83 loci >= q

X<-which(abs(afDiv) >= q)

## observed

obs<-numeric(npp)

for(pp in 1:npp){

obs[pp]<-sum(hmmStates[X,pp] == 1,na.rm=TRUE)

}

## permutations for null

nsam<-1000

Nx<-length(X)

null<-matrix(NA,nrow=nsam,ncol=npp)

for(pp in 1:npp){

for(i in 1:nsam){

Y<-sample(1:nl,Nx,replace=FALSE)

null[i,pp]<-sum(hmmStates[Y,pp] == 1,na.rm=TRUE)

}

## combined evidence across all four population pairs

cnt<-apply(hmmStates[X,]==1,1,sum,na.rm=TRUE)

obsCnts<-numeric(5)

for(j in 1:5){

obsCnts[j]<-sum(cnt==(j-1))

}

nsam<-1000

Nx<-length(X)

nullCnts<-matrix(NA,nrow=nsam,ncol=5)

for(i in 1:nsam){

Y<-sample(1:nl,Nx,replace=FALSE)

cnt<-apply(hmmStates[Y,] == 1,1,sum,na.rm=TRUE)

for(j in 1:5){

nullCnts[i,j]<-sum(cnt==(j-1))

}