aspen vcf to genotypes

Post date: Nov 30, 2015 6:38:15 PM

Extracted genotype likelihoods:

We calculated the average ML allele frequency estimates in R to use as priors on genotypes. Note that after filtering, all SNPs had similar allele frequencies in diploids and triploids.

Converted genotype likelihoods to genotype point estimates. We only kept genotypes when the posterior probability of one of the genotypes was > 0.9. If this was not the case, we replaced the genotype estimate with NA.

perl ../Scripts/gl2genestDip.pl sub_filtered2x_combinedDip.gl af_DipTripAverage.txt

5726415 high prob. genotypes; 1431570 genotypes converted to NA

perl ../Scripts/gl2genestTrip.pl sub_filtered2x_combinedTrip.gl af_DipTripAverage.txt

2510161 high prob. genotypes; 910469 genotypes converted to NA

PCA:

## PCA on aspen, dips vs. trips and geography, etc.

nloci<-63345

ndip<-113

ntrip<-54

ntotal<-ndip+ntrip

dip<-matrix(scan("pntest_sub_filtered2x_combinedDip.txt",n=nloci*ndip,sep=" "),nrow=nloci,ncol=ndip,byrow=TRUE)

trip<-matrix(scan("pntest_sub_filtered2x_combinedTrip.txt",n=nloci*ntrip,sep=" "),nrow=nloci,ncol=ntrip,byrow=TRUE)

dip<-dip/2

trip<-trip/3

dmat<-cbind(dip,trip)

## distance matrix

distout<-as.matrix(dist(t(dmat),diag=TRUE,upper=TRUE))

## covariance matrix

covmat<-matrix(NA,nrow=ntotal,ncol=ntotal)

for(i in 1:ntotal){

for(j in i:ntotal){

a<-which(is.na(dmat[,i])==FALSE & is.na(dmat[,j])==FALSE)

covmat[i,j]<-cov(dmat[a,i],dmat[a,j])

covmat[j,i]<-covmat[i,j]

}

## pca

pcaout<-prcomp(distout,center=TRUE,scale=FALSE)

pcaout<-prcomp(covmat,center=TRUE,scale=FALSE)

plot(pcaout$x[,1],pcaout$x[,2],type='n')

points(pcaout$x[,1],pcaout$x[,2],col=c(rep("blue",ndip),rep("orange",ntrip)))

Page updated

Google Sites

Report abuse