Post date: Jun 16, 2014 7:11:32 PM
There was a major error with the within generation whole genome sequence alignments. In particular, none of the linkage groups scaffolds were included. I think this happened because they also were not in one of the genome index files (*.ann), perhaps because I used the is algorithm instead of bwts to index the genome. I have re-indexed the genome and I am re-running the alignments.
Here is the command for plate 0 (to be repeated for plates 1-4):
perl scripts/wrap_qsub_rc_runbwa.pl plate0/*1.fastq.gz
and here is one example of what it does:
bwa mem -t 20 -k 20 -w 100 -r 1.3 -T 30 -R \'@RG\\tID:timema0_296\' /home/A01963476/data/timema/draft_genome/draft0.3/mod_lg_timemaGenome.fasta /home/A01963476/data/timema/timema_wgrs/plate0/WTCHG_53469_296_1.fastq.gz /home/A01963476/data/timema/timema_wgrs/plate0/WTCHG_53469_296_2.fastq.gz > /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/aln_0_296_53469.sam 2> /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/error_0_296_53469.log
The alignments now have LG scaffolds!
Now I am compressing, sorting and indexing the alignments, e.g.:
cd /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/
samtools view -b -S -o aln_4_295_64550.bam aln_4_295_64550.sam
samtools sort aln_4_295_64550.bam aln_4_295_64550.sorted
samtools index aln_4_295_64550.sorted.bam
Now I am merging the set of alignments for each individual. The information for this is in a pair of indIds*txt files (1 and 2), here is an example:
cd /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/
samtools merge -nr timemaTC_3C_24863.bam aln_0_296*sorted.bam
samtools sort timemaTC_3C_24863.bam timemaTC_3C_24863.sorted
samtools index timemaTC_3C_24863.sorted.bam