Post date: Sep 13, 2013 3:35:24 PM
I am mapping the whole-genome resequencing data for all of the individuals from plates 5 and 6 to the Timema draft reference genome (version 0.2). These are paired-end reads, and each individual's data is split over four lanes of sequencing (96 individuals per lane). I am using the bwa mem algorithm (bwa version 0.7.5a-r405) for the read alignments, with a 20 bp minimum seed length (-k 20, default 19), a re-seeding factor of 1.3 (-r 1.3; lower values increase accuracy but take longer, the default is 1.5), and a minimum score of 30 for an alignment to be output (-T 30, which is also the default; this is the alignment score computed from the match, mismatch and gap penalties, not a phred-scaled mapping quality). All other options are left at their defaults (see the option list below). I am running the alignments in groups of 96 with 20 threads (job numbers 11670-11677).
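As a rough illustration of the settings described above, here is a minimal Python sketch that assembles the bwa mem command line for one paired-end run. The function name, the index base name and the FASTQ file names are hypothetical placeholders, not the actual files or scripts used for these jobs; only the flags and their values come from the text.

```python
import shlex


def bwa_mem_command(index_base, fastq1, fastq2, threads=20):
    """Build the argument list for one paired-end bwa mem run.

    Only the flags mentioned in the post are set; everything else
    is left at the bwa defaults.
    """
    return [
        "bwa", "mem",
        "-t", str(threads),  # number of threads
        "-k", "20",          # minimum seed length (default 19)
        "-r", "1.3",         # re-seeding factor (default 1.5)
        "-T", "30",          # minimum alignment score to output (default 30)
        index_base, fastq1, fastq2,
    ]


# Hypothetical file names, for illustration only.
cmd = bwa_mem_command("timema_ref", "ind001_lane1_R1.fq", "ind001_lane1_R2.fq")
print(shlex.join(cmd))
```

bwa mem writes SAM to standard output, so in practice a command like this would be redirected to a file, or run with `subprocess.run(cmd, stdout=handle, check=True)` where `handle` is an open output file.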
BWA options and default values:
Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:
    -t INT     number of threads [1]
    -k INT     minimum seed length [19]
    -w INT     band width for banded alignment [100]
    -d INT     off-diagonal X-dropoff [100]
    -r FLOAT   look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
    -c INT     skip seeds with more than INT occurrences [10000]
    -S         skip mate rescue
    -P         skip pairing; mate rescue performed unless -S also in use
    -A INT     score for a sequence match [1]
    -B INT     penalty for a mismatch [4]
    -O INT     gap open penalty [6]
    -E INT     gap extension penalty; a gap of size k cost {-O} + {-E}*k [1]
    -L INT     penalty for clipping [5]
    -U INT     penalty for an unpaired read pair [17]

Input/output options:
    -p         first query file consists of interleaved paired-end sequences
    -R STR     read group header line such as '@RG\tID:foo\tSM:bar' [null]
    -v INT     verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
    -T INT     minimum score to output [30]
    -a         output all alignments for SE or unpaired PE
    -C         append FASTA/FASTQ comment to SAM output
    -M         mark shorter split hits as secondary (for Picard/GATK compatibility)
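
To make the arithmetic behind a few of these options concrete, the sketch below works through the re-seeding threshold ({-k} * {-r}), the gap cost ({-O} + {-E}*k) and a simplified alignment score that can be compared against -T. It is an assumption-laden illustration, not bwa's own code: the function names are made up, the example read is hypothetical, and the score ignores soft clipping and the fact that bwa performs local alignment.

```python
def reseed_threshold(min_seed_len, reseed_factor):
    """Seeds longer than this length trigger a search for internal seeds."""
    return min_seed_len * reseed_factor


def gap_cost(length, gap_open=6, gap_extend=1):
    """Cost of one gap of the given length: {-O} + {-E} * k."""
    return gap_open + gap_extend * length


def alignment_score(matches, mismatches, gap_lengths=(),
                    match=1, mismatch=4, gap_open=6, gap_extend=1):
    """Simplified end-to-end score; ignores clipping and local alignment."""
    score = matches * match - mismatches * mismatch
    for length in gap_lengths:
        score -= gap_cost(length, gap_open, gap_extend)
    return score


# Settings used here (-k 20, -r 1.3) versus the defaults (-k 19, -r 1.5).
used = reseed_threshold(20, 1.3)
default = reseed_threshold(19, 1.5)
print(f"re-seed above {used:.1f} bp (used) vs {default:.1f} bp (default)")

# Hypothetical 100 bp read: 97 matches, 3 mismatches, no gaps.
score = alignment_score(matches=97, mismatches=3)
print(score, "passes -T 30" if score >= 30 else "fails -T 30")
```

With the settings used here, re-seeding starts for seeds longer than about 26 bp rather than 28.5 bp under the defaults, so the -r 1.3 setting examines somewhat more candidate seeds per read.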