Post date: Jun 30, 2016 8:19:3 PM
Summary: Most of the genome was assigned to one of the 12 autosomes or the X chromosome. Here are the numbers in base pairs:
scaf bps
NA 76360954
1 136309881
10 69904984
11 54716912
12 16118945
13 77400025
2 115664566
3 63572792
4 57895619
5 56735921
6 54836842
7 54047697
8 52283717
9 54727140
total length of LGs = 864215041
total length of all scaffolds = 940575995
This is good (92%). But, the ordering of markers suggests something odd. In particular, LGs tend to be on the order of 1000 cMs (10x bigger than I would expect) and error rates are very high. I want to see how much this changes if I only include the SNPs initially added to LGs (rather than all those on the scaffold).
I made a new version of assignScafFilter.pl (assignScafFilter2.pl) that still drops scaffolds entirely based on conditions (below) but does not add LG info to SNPs without it. I ran this then re-ran scaffold ordering.
perl assignScafFilter2.pl tcrLgs.txt
##########################
## decide on LG for scaf
## rules:
## $nsnp SNPs supporting LG with $prct of SNPs in agreement
## $min SNPs total
## $alt is maximum % for an alternative LG
$prct = 0.1;
$nsnp = 2;
$min = $nsnp;
$alt = 0.5;
##########################
tail -n +2 mod2_tcrLgs.txt | cut -f 1 -d " " > mod2_tcrLgsSnp.txt
## ALL
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/timema_mappingfams/mapdata/lm2map
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=12 numThreads=4 initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order_LG12tcr.txt
## famA brown
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/timema_mappingfams/mapdata/lm2map
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=12 numThreads=4 informativeMask=2 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGbrwn12tcr.txt
## famA green
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/timema_mappingfams/mapdata/lm2map
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=12 numThreads=4 informativeMask=1 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGgrn12tcr.txt
LGs were compared between the old and new genomes/linkage maps (attached plot based on the code below). Comparison with the old genome and LGs looks damn good (at least to me). For each new LG (Dovetail ver. 2 and new LGs) I calculated the amount of sequence that aligned to DNA sequence from each old LG (original genome as things stood for the paper in review). The attached figure has one panel per new LG and shows that amount of sequence aligned to each old LG. Note that old LG 0 means that it wasn't on an LG. Note that each new LG essentially corresponds to an old LG and just pulls in extra bits that were not previously on LGs (there are a few very minor peaks on the plots suggesting a bit of moving around, but these are really minor). The only exceptions is that we mostly lose the old LGs 5 and 12 (both get distributed a bit among other LGs), and two new LGs come mostly from stuff that wasn't on LGs before, this includes the sex chromosome LG 13, which gets a bit from various old LGs (and a mixture of these) but mostly comes from things not previously on LGs. So, it looks like things mostly agree and thus were not that bad before, and that things are likely quite good now (at least at the LG level, I suspect the ordering on LGs is pretty good too, though some scaffolds are certainly a bit out of order).
X<-read.table("alnSizeTable.txt",header=F)
X<-X[,-1]
colnames(X)<-0:13
rownames(X)<-0:13
pdf("lgComp.pdf",width=10,height=10)
par(mfrow=c(3,3))
for(i in 1:13){
plot(0:13,X[,i+1],type='l',xlab="old LG #",ylab="aligned seq. length")
title(main=paste("new LG",i))
}
dev.off()
I then obtained median cM positions for scaffolds (based on all families and parents or just family A and each parent) and cross-classified LGs by comparing old and new numbers (and using the old ones mostly; see below).
perl calcSexOrder.pl combinedSnpList.txt order*
perl reorderLgs.pl lookuptable.txt tcrLinkageMap.txt
Finally, I added the LG information to the genome and moved a copy to diogenes.
perl addLgInfo.pl ordered_tcrLinkageMap.txt timema_06Jun2016_RvNkF.fasta
/stash/timema/zgompert/map_timema_06Jun2016_RvNkF.fasta