reference assemblies, plates 5 and 6

Post date: Sep 13, 2013 3:35:24 PM

I am mapping the whole-genome resequencing data for all of the individuals from plates 5 and 6 to the Timema draft reference genome (version 0.2). These are paired-end reads, and each individual's data are split over four lanes of sequencing (96 individuals per lane). I am using the bwa mem algorithm (bwa version 0.7.5a-r405) for the read alignments, with a 20 bp minimum seed length (-k 20), a re-seeding factor of 1.3 (-r 1.3; lower values increase accuracy but take longer, and the default is 1.5), and a minimum score to output of 30 (-T 30; this is a threshold on the alignment score rather than a phred-scaled mapping quality, and 30 is also the default). Other values are defaults (see the default options below). I am running the alignments in groups of 96 with 20 threads (job numbers 11670-11677).
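As a rough illustration of the settings described above, the command line for one individual could be assembled as in the sketch below. The function name, the index base name, and the FASTQ file names are hypothetical placeholders, not the actual files or scripts used for these jobs.

```python
import shlex


def build_bwa_mem_cmd(idxbase, fq1, fq2=None, threads=20):
    """Return a bwa mem argument list using the non-default settings in this post.

    idxbase, fq1 and fq2 are supplied by the caller; the names used in the
    example call further down are hypothetical.
    """
    cmd = [
        "bwa", "mem",
        "-t", str(threads),  # number of threads
        "-k", "20",          # minimum seed length (default 19)
        "-r", "1.3",         # re-seeding factor (default 1.5)
        "-T", "30",          # minimum alignment score to output (default 30)
        idxbase,
        fq1,
    ]
    if fq2 is not None:
        cmd.append(fq2)      # second file of a paired-end run
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_bwa_mem_cmd("timema_ref", "ind001_R1.fq", "ind001_R2.fq")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

bwa mem writes SAM to standard output, so in practice the printed command would be followed by a redirect to a per-individual output file.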

BWA options and default values:

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:

-t INT number of threads [1]

-k INT minimum seed length [19]

-w INT band width for banded alignment [100]

-d INT off-diagonal X-dropoff [100]

-r FLOAT look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]

-c INT skip seeds with more than INT occurrences [10000]

-S skip mate rescue

-P skip pairing; mate rescue performed unless -S also in use

-A INT score for a sequence match [1]

-B INT penalty for a mismatch [4]

-O INT gap open penalty [6]

-E INT gap extension penalty; a gap of size k cost {-O} + {-E}*k [1]

-L INT penalty for clipping [5]

-U INT penalty for an unpaired read pair [17]

Input/output options:

-p first query file consists of interleaved paired-end sequences

-R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]

-v INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]

-T INT minimum score to output [30]

-a output all alignments for SE or unpaired PE

-C append FASTA/FASTQ comment to SAM output

-M mark shorter split hits as secondary (for Picard/GATK compatibility)
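Two of the options above are defined by simple arithmetic: -r sets a seed length threshold of {-k} * FLOAT, and a gap of size k costs {-O} + {-E}*k. The sketch below works through both, comparing the settings used here (-k 20, -r 1.3) with the defaults. The helper names are made up for illustration and are not part of bwa.

```python
def reseed_threshold(min_seed_len, factor):
    """Seed length above which bwa mem looks for internal seeds: {-k} * {-r}."""
    return min_seed_len * factor


def gap_cost(k, gap_open=6, gap_ext=1):
    """Cost of a gap of size k: {-O} + {-E} * k, using the default penalties."""
    return gap_open + gap_ext * k


# Settings used in this post (-k 20, -r 1.3) versus the defaults (-k 19, -r 1.5).
print("re-seed threshold, this run:", reseed_threshold(20, 1.3))
print("re-seed threshold, defaults:", reseed_threshold(19, 1.5))

# A 3 bp gap under the default -O 6 and -E 1.
print("cost of a 3 bp gap:", gap_cost(3))
```

With these values the threshold drops from 28.5 bp under the defaults to about 26 bp, so somewhat shorter seeds are re-examined for internal seeds.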