Here are the steps I followed to redo the alignments using bwa mem and then finally doing variant calling.
STEP 1 bwa alignment
In this step we are aligning the fastqfiles to the fasta genome and the end result is sam files. I used the bwa mem command to do align these files to the dovetail whole genome of timema. To do this I used the perl wrapper script wrap_qsub_slurm_bwa_mem.pl saved in the same folder.
I did these alignments for the fastq files I downloaded from the cluster.
The fastq files are here : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles
The samfiles are here : /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/samsaifiles
The whole genome alignment is here :
/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/re_mod_map_timema_06Jun2016_RvNkF702.fasta
I modified Zach's wrapper perl script to run the bwa alignments (the mains cript uses bwa aln to do alignments).
This script is saved as: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/scripts/wrap_qsub_slurm_bwa_mem.pl.
Usage: perl wrap_qsub_slurm_bwa_mem.pl *.fastq
This script will create sam files from the fastq files. These samfiles will be used for conversion to bam files and then variant calling.
Here are commands and parameters I used for alignment:
bwa mem -t 12 -k 15 -r 1.3 -T 30 $genome $file > mem"."$file".".sam \n
OPTIONS:
-t = Number of threads [1]
-k = Minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates 20. [19]
-r = Trigger re-seeding for a MEM longer than minSeedLen*FLOAT. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy. [1.5]
-T = Don’t output alignment with score lower than INT. This option only affects output. [30]
$ genome = genome file
$file = fastq input file
> = output file.sam
STEP 2 index and sort
In this step I am using the samfiles generated in previous steps and converting them to bam format (basically a simpler format).
I have samfiles for each species saved in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/samsaifiles/mem/. I used samtools to do this and here is the samtools command I used:
samtools view -b -S -o $out"."bam $file
samtools sort $out"."bam -o $out"."sorted.bam
samtools index $out"."sorted.bam
To do this I used the perl wrapper script wrap_qsub_slurm_sam2bam.pl saved in the same folder.