Post date: Jun 24, 2014 11:2:18 PM
GATK uses more of the information in the read group and is therefore stricter about these headers. This is particularly important for recalibrating base qualities. Adding read groups after the fact would be really difficult now that I have merged the different runs for each individual, so I am going back to the alignment step. I modified the Perl submission script to write more complete headers that include the sample name and a read group ID specific to each lane x sample combination (more information on headers for GATK can be found here). Here is an example of the new command:
bwa mem -t 20 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:66562_296\tPL:ILLUMINA\tLB:66562_296\tSM:timemaTC_2C_24787' /home/A01963476/data/timema/draft_genome/draft0.3/mod_lg_timemaGenome.fasta /home/A01963476/data/timema/timema_wgrs/plate4/WTCHG_66562_296_1.fastq.gz /home/A01963476/data/timema/timema_wgrs/plate4/WTCHG_66562_296_2.fastq.gz > /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/aln_4_296_66562.sam 2> /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/error_4_296_66562.log
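For reference, here is a minimal sketch of how the -R string in the command above could be derived from a FASTQ file name. This is not the actual Perl submission script; the function name make_rg and its arguments are hypothetical, and it assumes file names of the form WTCHG_<run>_<index>_<1|2>.fastq.gz with the sample name supplied separately. As in the example, the ID and LB fields both get the run_index string.

```shell
# Hypothetical helper: build a bwa mem -R read-group string from a FASTQ name.
# Assumes names like WTCHG_66562_296_1.fastq.gz; the sample name is passed in.
make_rg() {
    fastq="$1"
    sample="$2"
    base=$(basename "$fastq")      # WTCHG_66562_296_1.fastq.gz
    id=${base#WTCHG_}              # 66562_296_1.fastq.gz
    id=${id%_[12].fastq.gz}        # 66562_296
    # Emit the two-character sequence \t (not a real tab), as in the command above.
    printf '@RG\\tID:%s\\tPL:ILLUMINA\\tLB:%s\\tSM:%s\n' "$id" "$id" "$sample"
}

rg=$(make_rg /path/to/WTCHG_66562_296_1.fastq.gz timemaTC_2C_24787)
echo "$rg"
```

The resulting string can be passed as -R "$rg" to bwa mem. After alignment, samtools view -H on the output file followed by grep '^@RG' should show the header line that was written.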