Post date: Nov 26, 2013 6:6:53 PM
I tried again to identify the T. cristinae sex chromosome in the genome assembly. This time I used only high quality genotypes and loci. Specifically, I only called genotypes with genotype quality >= 20 (99% confidence). I then only used loci with maf > 5% and genotype calls for at least 50% of individuals. I also only considered linkage group snps. This data set includes 191,449 snps. I then calculated the mean heterozygostiy for males and females by scaffold and linkage group. These are highly correlated between the sexes and I did not find scaffolds or linkage groups where males had lower and near-zero heterozygosity relative to females. The plots and results are in projects/timema_wgwild/variant and the R code is below. We are waiting on more data and perhaps a better assembly to pursue this issue further.
## search for sex lg or scaffold
nl<-1816965
ni<-160
G<-matrix(scan("hqgen.txt",n=nl*ni,sep=" "),nrow=nl,ncol=ni,byrow=TRUE)
L<-matrix(scan("hqids.txt",n=nl*4,sep=" "),nrow=nl,ncol=4,byrow=TRUE)
sex<-read.table("sexIds.txt",header=F)
## find loci with sufficient data
pctNA<-apply(is.na(G),1,mean)
X<-which(pctNA < 0.5) ## less than 50% missing data
Gx<-G[X,]
Lx<-L[X,]
## males vs. femals
m<-which(sex[,2]=='m')
f<-which(sex[,2]=='f')
GxHetM<-apply(Gx[,m]==1,1,mean,na.rm=TRUE)
GxHetF<-apply(Gx[,f]==1,1,mean,na.rm=TRUE)
NGxHetM<-apply(is.na(Gx[,m])==FALSE,1,mean)
NGxHetF<-apply(is.na(Gx[,f])==FALSE,1,mean)
ScafGxHetM<-tapply(X=GxHetM,INDEX=Lx[,3],mean)
ScafGxHetF<-tapply(X=GxHetF,INDEX=Lx[,3],mean)
LgGxHetM<-tapply(X=GxHetM,INDEX=Lx[,1],mean)
LgGxHetF<-tapply(X=GxHetF,INDEX=Lx[,1],mean)
## plots
pdf("hetMvsF.pdf",height=12,width=6)
par(mfrow=c(2,1))
par(pty='s')
plot(LgGxHetM,LgGxHetF,xlab="male mean heterozygosity",ylab="female mean heterozygosity",cex.lab=1.4,cex.axis=1.1,main="(a) linkage group heterozygosity")
abline(a=0,b=1,col="orange",lwd=2)
plot(ScafGxHetM,ScafGxHetF,xlab="male mean heterozygosity",ylab="female mean heterozygosity",cex.lab=1.4,cex.axis=1.1,main="(b) scaffold heterozygosity")
abline(a=0,b=1,col="orange",lwd=2)
dev.off()