Post date: Nov 30, 2015 6:38:15 PM
Extracted genotype likelihoods:
We calculated the average ML allele frequency estimates in R to use as priors on genotypes. Note that after filtering, all SNPs had similar allele frequencies in diploids and triploids.
Converted genotype likelihoods to genotype point estimates. We only kept genotypes when the posterior probability of one of the genotypes was > 0.9. If this was not the case, we replaced the genotype estimate with NA.
perl ../Scripts/gl2genestDip.pl sub_filtered2x_combinedDip.gl af_DipTripAverage.txt
5726415 high prob. genotypes; 1431570 genotypes converted to NA
perl ../Scripts/gl2genestTrip.pl sub_filtered2x_combinedTrip.gl af_DipTripAverage.txt
2510161 high prob. genotypes; 910469 genotypes converted to NA
PCA:
## PCA on aspen, dips vs. trips and geography, etc.
nloci<-63345
ndip<-113
ntrip<-54
ntotal<-ndip+ntrip
dip<-matrix(scan("pntest_sub_filtered2x_combinedDip.txt",n=nloci*ndip,sep=" "),nrow=nloci,ncol=ndip,byrow=TRUE)
trip<-matrix(scan("pntest_sub_filtered2x_combinedTrip.txt",n=nloci*ntrip,sep=" "),nrow=nloci,ncol=ntrip,byrow=TRUE)
dip<-dip/2
trip<-trip/3
dmat<-cbind(dip,trip)
## distance matrix
distout<-as.matrix(dist(t(dmat),diag=TRUE,upper=TRUE))
## covariance matrix
covmat<-matrix(NA,nrow=ntotal,ncol=ntotal)
for(i in 1:ntotal){
for(j in i:ntotal){
a<-which(is.na(dmat[,i])==FALSE & is.na(dmat[,j])==FALSE)
covmat[i,j]<-cov(dmat[a,i],dmat[a,j])
covmat[j,i]<-covmat[i,j]
}
}
## pca
pcaout<-prcomp(distout,center=TRUE,scale=FALSE)
pcaout<-prcomp(covmat,center=TRUE,scale=FALSE)
plot(pcaout$x[,1],pcaout$x[,2],type='n')
points(pcaout$x[,1],pcaout$x[,2],col=c(rep("blue",ndip),rep("orange",ntrip)))