Post date: Nov 21, 2018 12:21:36 PM
R/qtl had trouble working with so many markers and had issues with segregation distortion. Moved to lepmap2 for increased ability to work with many markers.
1. Formatting for lepmap2
## scaffold, position, parent alleles 1, parent alleles 2, # offspring, genotypes for offspring (pairs; 1,2)
loci<-read.table('locusids2.txt', header=FALSE, sep=':')
## Parent 1 == LL Male
## Parent 2 == LM Female
## 0 = LL, 1 = LM/ML, 2 = MM
genos<-matrix(NA,nrow=34284,ncol=5+241)
genos[,1]<-loci[,1];genos[,2]<-loci[,2]
genos[,3]<-1
genos[,4]<-2
genos[,5]<-241
ch1<-as.matrix(read.table("L1_struc_out_popinfo1.txt_ss", sep=' ',nrow=241*34284,
blank.lines.skip=TRUE,colClasses=c(rep("numeric",6),"NULL")))
ch2<-as.matrix(read.table("L1_struc_out_popinfo2.txt_ss", sep=' ',nrow=241*34284,
blank.lines.skip=TRUE,colClasses=c(rep("numeric",6),"NULL")))
ch3<-as.matrix(read.table("L1_struc_out_popinfo3.txt_ss", sep=' ',nrow=241*34284,
blank.lines.skip=TRUE,colClasses=c(rep("numeric",6),"NULL")))
ch4<-as.matrix(read.table("L1_struc_out_popinfo4.txt_ss", sep=' ',nrow=241*34284,
blank.lines.skip=TRUE,colClasses=c(rep("numeric",6),"NULL")))
ch5<-as.matrix(read.table("L1_struc_out_popinfo5.txt_ss", sep=' ',nrow=241*34284,
blank.lines.skip=TRUE,colClasses=c(rep("numeric",6),"NULL")))
ch1<-as.matrix(transform(ch1, het=ch1[,4]+ch1[,5])[-c(4:5)])
ch2<-as.matrix(transform(ch2, het=ch2[,4]+ch2[,5])[-c(4:5)])
ch3<-as.matrix(transform(ch3, het=ch3[,4]+ch3[,5])[-c(4:5)])
ch4<-as.matrix(transform(ch4, het=ch4[,4]+ch4[,5])[-c(4:5)])
ch5<-as.matrix(transform(ch5, het=ch5[,4]+ch5[,5])[-c(4:5)])
#delete BB's and renormalize data
norms<-list(ch1,ch2,ch3,ch4,ch5)
for(i in 1:5){
mat<-norms[[i]]
mat[,4]<-mat[,3]+mat[,5]
mat[,3]<-mat[,3]/mat[,4]
mat[,5]<-mat[,5]/mat[,4]
mat<-mat[,-4]
mat[is.nan(mat)]<-0
norms[[i]]<-mat
}
ch1<-matrix(unlist(norms[[1]],use.names=FALSE),ncol=4)
ch2<-matrix(unlist(norms[[2]],use.names=FALSE),ncol=4)
ch3<-matrix(unlist(norms[[3]],use.names=FALSE),ncol=4)
ch4<-matrix(unlist(norms[[4]],use.names=FALSE),ncol=4)
ch5<-matrix(unlist(norms[[5]],use.names=FALSE),ncol=4)
avg<-matrix(NA,nrow=8262444,ncol=4)
avg[,1:2]<-ch1[,1:2]
avg[,3]<-apply(cbind(ch1[,3],ch2[,3],ch3[,3],ch4[,3],ch5[,3]),1,mean)
avg[,4]<-apply(cbind(ch1[,4],ch2[,4],ch3[,4],ch4[,4],ch5[,4]),1,mean)
avg<-avg[,4]
for(i in 1:241){
#for individual 1, get all loci, then move to ind 2
locus<-seq(0+i,8262444,241)
for(j in 1:34284){
genos[j,i+5]<-avg[locus[j]]
}
}
write.table(genos,'cmacL1_offspringGenotypes_fam1.txt', quote=FALSE,col.names=FALSE,row.names=FALSE)
2. Generate linkage format
perl makeLinkageFormat.pl cmacL1_offspringGenotypes_fam1.txt
# or perl makeMultiFamLinkage.pl cmacL1_offspringGenotypes_fam1.txt
3. Check number of each genotype
lm_fam<-read.table('lm_fam1.txt', header=FALSE, sep='\t')
fam<-lm_fam[-c(1,2),-c(1:6)]
M<-apply(fam,1,function(x) table(x))
apply(M,1,function(x) mean(x))
At 0.9/0.1 (the original) I'm getting ~29k 00, 2.1k 11, and 2.7k 12. Even cranking up missing data tolerance to 70%+, I'm only getting a few markers out the other end, and linkage grouping doesn't work.
At 0.85/0.15 I'm getting ~28k 00, 2.9k 11, and 3k 12. Same issue as above.
At 0.80/0.2 I'm getting ~27k 00, 3.9k 11, and 3.4k 12. Starting to get markers here, but they all map to one large LG. I fiddled around with the LOD limit, but it either maps most markers to one LG or all are singlets. This is the same issue I was having with R/qtl.
At 0.75/0.25 I'm getting ~25k 00, 5.4k 11, and 3.8k 12. Now I'm kind of getting the right number of LGs here (3-14). But there are either too few LGs (3, when we expect ~10), or too few markers per LG (<5 markers/LG).
.7/.3: 22k 00, 8.1k 11, 4.2k 12
0.65/0.35: 16898.461 12698.469 4687.071
It seems like there are more heterozygous (L/M) estimates at high confidence, but more homozygous L/L otherwise.
4. Filter markers.
java -cp ~/../u6000989/source/lepmap2/bin/ Filtering data=lm_fam1.txt epsilon=0.05 dataTolerance=0.01 missingLimit=130 keepAlleles=1 > data_fam1.txt
I only retain markers at ~50% missingLimit when I use 0.75/0.25 or lower confidence previously. Currently using 0.7/0.3
5. Create LGs
java -cp ~/../u6000989/source/lepmap2/bin/ SeparateChromosomes data=data_fam1.txt lodLimit=7 femalePrior=0.05 informativeMask=2 sizeLimit=10 > femLgs_cmaclm.txt
Okay results. Looking for ~10 LG. Trying lepmap3
4.2
java -cp ~/source/lepmap3/bin/ Filtering2 data=lm_fam1.txt dataTolerance=0.01 missingLimit=130 > data_fam1.txt
4.3
java -cp ~/source/lepmap3/bin/ SeparateChromosomes data=data_fam1.txt lodLimit=7 femalePrior=0.05 informativeMask=2 sizeLimit=10 > femLgs_cmaclm.txt