I have to do alignments for the fastq files I downloaded from the cluster.
The fastq files are here : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles
The samfiles are here : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/
The whole genome alignment is here :
/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/re_mod_map_timema_06Jun2016_RvNkF702.fasta
I modifies Zach's wrapper perl script to run the bwa alignments.
This script is saved as: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/scripts/wrap_qsub_slurm_bwa.pl.
Usage: perl wrap_qsub_slurm_bwa.pl *.fastq
This script will create sam files from the fastq files. These samfiles will be used for conversion to bam files and then variant calling.
Here are commands and parameters I used for alignment:
aln bwa aln -n 4 -l 20 -k 2 -t 8 -q 10 -f aln"."$file".".sai $genome $file\n";
OPTIONS:
-n Maximum edit distance if the value is INT, or the fraction of missing alignments given 2% uniform base error rate if FLOAT. In the latter case, the maximum edit distance is automatically chosen for different read lengths. [0.04]
-l Take the first INT subsequence as seed. If INT is larger than the query sequence, seeding will be disabled. For long reads, this option is typically ranged from 25 to 35 for ‘-k 2’. [inf]
-k Maximum edit distance in the seed [2]
-t Number of threads (multi-threading mode) [1]
-q Parameter for read trimming. BWA trims a read down to argmax_x{\sum_{i=x+1}^l(INT-q_i)} if q_l<INT where l is the original read length. [0]
-f
samse bwa samse -n 1 -r '\'\@RG\\tID:SRR\\tSM:SRR' -f aln"."$file".".sam $genome aln"."$file".".sai $file\n";
OPTIONS:
-n INT
-r STR
Maximum number of alignments to output in the XA tag for reads paired properly. If a read has more than INT hits, the XA tag will not be written. [3]
Specify the read group in a format like ‘@RG\tID:foo\tSM:bar’. [null]