Post date: Sep 25, 2016 3:54:23 AM
We determined that scaffold 702 was likely over-assembled based on LG assignments (a large scaffold with half of its SNPs assigned to one LG and the other half to another) and GWA patterns (a shift in the signal of association with stripe/color in the same region where the LG assignment shifts). I split it into three chunks: one on LG8, one on LG3, and one small chunk with no LG assignment (LG NA).
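A minimal sketch of the kind of LG-assignment check described above, written in Python rather than the Perl actually used (splitSc702.pl is not shown here, so this is not its implementation). The SNP positions and the function name lg_runs are hypothetical; only the LG labels (8, 3 and NA) come from the notes. The sketch collapses position-sorted SNPs into runs that share an LG assignment, so a scaffold whose SNPs fall into two long runs on different LGs stands out as a candidate for splitting.

```python
def lg_runs(snps):
    """Collapse SNPs into runs of consecutive, identical LG assignments.

    snps: list of (position, lg) tuples; lg is None for unassigned SNPs.
    Returns a list of (start, end, lg, n_snps) tuples in position order.
    """
    runs = []
    for pos, lg in sorted(snps, key=lambda s: s[0]):
        if runs and runs[-1][2] == lg:
            # Same LG as the previous SNP: extend the current run.
            start, _, _, n = runs[-1]
            runs[-1] = (start, pos, lg, n + 1)
        else:
            # LG assignment changed: start a new run at this SNP.
            runs.append((pos, pos, lg, 1))
    return runs


# Hypothetical SNPs on one scaffold: (position in bp, assigned LG).
snps = [
    (1000, 8), (5000, 8), (9000, 8),
    (12000, None),
    (15000, 3), (20000, 3), (26000, 3),
]

runs = lg_runs(snps)
for start, end, lg, n in runs:
    print(start, end, "NA" if lg is None else lg, n)
```

Real assignments are noisier than this toy input, so in practice short runs would need to be merged or ignored before choosing break points.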
In /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/ I ran,
perl splitSc702.pl timema_06Jun2016_RvNkF702.fasta
to produce timema_06Jun2016_RvNkF702.fasta.
I then re-ordered scaffolds 8 and 3 (note that the 'X's in the file names were just temporary; I dropped them after verifying that all was well).
perl assignScafFilter2.pl tcrLgs.txt
tail -n +2 mod2_tcrLgs.txt | cut -f 1 -d " " > mod2_tcrLgsSnp.txt
#SBATCH --time=72:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert
#SBATCH --partition=kingspeak
#SBATCH --job-name=l2map
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo -n 'Job is running on node '; echo $SLURM_JOB_NODELIST
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load gcc
module load gsl
module load hdf5
module load bwa
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/timema_mappingfams/mapdata/lm2map
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=4 numThreads=4 informativeMask=2 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGbrwn4tcrX.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=4 numThreads=4 informativeMask=1 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGgrn4tcrX.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=2 numThreads=4 informativeMask=1 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGgrn2tcrX.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=2 numThreads=4 informativeMask=2 families=famA initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order2_LGbrwn2tcrX.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=mod2_tcrLgsSnp.txt data=data_tcristlm.txt minError=0.01 chromosome=2 numThreads=4 initRecombination=0.05 0.05 learnRecombinationParameters=1 1 > order_LG2tcrX.txt
I then obtained median cM positions for scaffolds (based on all families and parents or just family A and each parent) and cross-classified LGs by comparing old and new numbers (and using the old ones mostly; see below).
perl calcSexOrder.pl combinedSnpList.txt order*
perl reorderLgs.pl lookuptable.txt tcrLinkageMap.txt
sort -g mod_tcrLinkageMap.txt > ordered_tcrLinkageMap.txt
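For illustration, the median-cM step can be sketched as follows. This is a Python sketch under stated assumptions, not the logic of calcSexOrder.pl or reorderLgs.pl (neither script is shown here). The function name median_cm_by_scaffold, the cM values and the scaffold label "702.2" are hypothetical; scaffolds 702.1 and 128 on LG8 are taken from the notes.

```python
from collections import defaultdict
from statistics import median


def median_cm_by_scaffold(markers):
    """Return {(lg, scaffold): median cM} for (lg, scaffold, cm) records."""
    grouped = defaultdict(list)
    for lg, scaffold, cm in markers:
        grouped[(lg, scaffold)].append(cm)
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical marker positions: (LG, scaffold, cM).
markers = [
    (8, "702.1", 10.0), (8, "702.1", 12.5), (8, "702.1", 11.0),
    (8, "128", 14.0), (8, "128", 16.0),
    (3, "702.2", 40.0),
]

medians = median_cm_by_scaffold(markers)

# Order scaffolds by LG, then by median cM within each LG.
ordered = sorted(medians.items(), key=lambda kv: (kv[0][0], kv[1]))
for (lg, scaffold), cm in ordered:
    print(lg, scaffold, cm)
```

Using the median rather than the mean keeps a scaffold's position from being dragged by a few markers with badly estimated positions.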
Finally, I added the LG information to the genome (tcrDovetail/version3/) and moved a copy to diogenes.
perl addLgInfo.pl ordered_tcrLinkageMap.txt timema_06Jun2016_RvNkF702.fasta
/stash/timema/zgompert/map_timema_06Jun2016_RvNkF702.fasta
A few thoughts:
I checked for consistency in scaffold position between the male and female parents in family A. I did this by calculating the absolute value of the Pearson correlation coefficient between the positions for each LG. Taking the absolute value accounts for the fact that there isn't really a right or left end. However, this could actually be screwy for LG8, where there are major inversions (see additional notes on this below).
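A minimal Python sketch of this consistency check, assuming each scaffold on an LG has one cM position per parent. The positions and the helper name pearson are hypothetical, and this is not the code that produced the numbers in these notes. The example uses two maps with the same scaffold order but opposite orientation, which gives a strongly negative raw correlation and a high absolute correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical cM positions of the same five scaffolds in each parent's map.
female = [0.0, 5.0, 12.0, 20.0, 31.0]
male = [30.0, 26.0, 17.0, 9.0, 0.0]  # same order, map oriented the other way

r = pearson(female, male)
print("raw r:", round(r, 3))
print("|r|:", round(abs(r), 3))
```

The absolute value hides the sign, so it cannot distinguish a whole LG that is simply flipped from a real inversion within the LG; the raw sign needs to be checked separately for that.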
The correlations in cM position between the two parents range from 0.19 to 0.94, with an average of 0.58:
      chr correlation
 [1,]   1   0.5202054
 [2,]   2   0.8498863
 [3,]   3   0.8088848
 [4,]   4   0.8687553
 [5,]   5   0.1873797
 [6,]   6   0.1965959
 [7,]   7   0.8574650
 [8,]   8   0.3976039
 [9,]   9   0.3471758
[10,]  10   0.3740990
[11,]  11   0.9435215
[12,]  12   0.2625546
[13,]  13   0.9411871
So, really not that horrible, and the average is probably the best choice for the most part. It is just annoying that it might be misleading for LG8, where scaffolds 702.1 and 128 are adjacent for the mom and dad of family 8, but not in the combined map. With that said, the raw correlation for LG8 is negative, but these two scaffolds are in the same order (so, they could actually be reversed between parents).
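The summary of the per-LG correlations listed above can be reproduced with a few lines of Python. The values are copied from the table; the variable names are just for illustration.

```python
from statistics import mean

# Absolute Pearson correlations by LG, copied from the table above.
correlations = {
    1: 0.5202054, 2: 0.8498863, 3: 0.8088848, 4: 0.8687553,
    5: 0.1873797, 6: 0.1965959, 7: 0.8574650, 8: 0.3976039,
    9: 0.3471758, 10: 0.3740990, 11: 0.9435215, 12: 0.2625546,
    13: 0.9411871,
}

lowest_lg = min(correlations, key=correlations.get)
highest_lg = max(correlations, key=correlations.get)
average = mean(correlations.values())

print(f"min  {correlations[lowest_lg]:.2f} (LG{lowest_lg})")
print(f"max  {correlations[highest_lg]:.2f} (LG{highest_lg})")
print(f"mean {average:.2f}")
```

The lowest value is on LG5, the highest on LG11, and the mean rounds to 0.58.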