Post date: Apr 14, 2016 2:43:0 AM
As these data were old, I first determined the quality encoding for each data set using DetermineFastqQualityEncoding.pl (a handy script I found online). The answer was Illumina 1.5+ (Phred+64). I used seqtk to convert the qualities to Illumina 1.8+ (Phred+33, i.e., Sanger encoding); the converted files are named sanger_*:
seqtk seq -Q64 -V original.fastq > sanger.fastq
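As a quick cross-check on the encoding, the range of ASCII codes on the quality lines can be inspected directly. The sketch below is not DetermineFastqQualityEncoding.pl; the function names fastq_qual_range and guess_quality_offset are hypothetical, and it assumes strict four-line FASTQ records. Codes below 59 only occur with a +33 offset, whereas Illumina 1.3+/1.5+ data should start at 64 or higher.

```shell
#!/bin/sh
# Hypothetical helper (not DetermineFastqQualityEncoding.pl): print the lowest
# and highest ASCII codes found on the quality lines of a FASTQ file.
# Assumes strict 4-line records (no wrapped sequence or quality lines).
fastq_qual_range() {
    LC_ALL=C awk '
        BEGIN {
            # lookup table from printable character to ASCII code
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127; max = 0
        }
        NR % 4 == 0 {
            n = length($0)
            for (i = 1; i <= n; i++) {
                ch = substr($0, i, 1)
                if (!(ch in ord)) continue
                if (ord[ch] < min) min = ord[ch]
                if (ord[ch] > max) max = ord[ch]
            }
        }
        END { print min, max }
    ' "$1"
}

# Rough guess at the offset: any code below 59 implies +33; otherwise assume
# +64 (this does not separate Solexa from Illumina 1.3+/1.5+).
guess_quality_offset() {
    min=$(fastq_qual_range "$1" | cut -d' ' -f1)
    if [ "$min" -lt 59 ]; then echo 33; else echo 64; fi
}
```

Running guess_quality_offset on original.fastq and again on sanger.fastq should report 64 and then 33 if the conversion worked as intended.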
I then used splitFastq.pl to combine the data sets and generate one fastq file per individual.
Next I used bwa (version 0.7.10-r789) to index the reference genome (original L. melissa genome):
bwa index final.assembly.fasta
Alignments were then conducted with bwa (aln and samse) using a perl wrapper. Here is an example command:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_mappingfams/parsed/
bwa aln -n 4 -l 20 -k 2 -t 8 -q 10 -f alnm54p.sai /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_genome/final.assembly.fasta m54p.fastq
bwa samse -n 1 -r '@RG\tID:lyc-m54p\tPL:ILLUMINA\tLB:lyc-m54p\tSM:lyc-m54p' -f alnm54p.sam /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_genome/final.assembly.fasta alnm54p.sai m54p.fastq
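The perl wrapper itself is not shown here. As a rough stand-in, the shell sketch below prints (without running) the same two commands for any individual. The function name bwa_cmds is hypothetical, and it assumes each input file is named <individual>.fastq, as in the example above.

```shell
#!/bin/sh
# Hypothetical stand-in for the perl wrapper: print (do not run) the bwa aln
# and bwa samse commands for one individual, with the flags used above.
# $1 = reference FASTA, $2 = FASTQ file named <individual>.fastq
bwa_cmds() {
    ref=$1
    fq=$2
    id=$(basename "$fq" .fastq)
    # read group string with literal \t separators, as bwa expects
    rg=$(printf '@RG\\tID:lyc-%s\\tPL:ILLUMINA\\tLB:lyc-%s\\tSM:lyc-%s' "$id" "$id" "$id")
    printf 'bwa aln -n 4 -l 20 -k 2 -t 8 -q 10 -f aln%s.sai %s %s\n' "$id" "$ref" "$fq"
    printf "bwa samse -n 1 -r '%s' -f aln%s.sam %s aln%s.sai %s\n" "$rg" "$id" "$ref" "$id" "$fq"
}
```

Looping over the parsed files, e.g. for fq in *.fastq; do bwa_cmds final.assembly.fasta "$fq"; done, gives a list of commands that can be inspected before being submitted.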
Finally, I used samtools to compress, sort and index the alignments (example below). The results are in /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_mappingfams/alignments/:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_mappingfams/alignments/
samtools view -b -S -o alnm54p.bam alnm54p.sam
samtools sort alnm54p.bam alnm54p.sorted
samtools index alnm54p.sorted.bam
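The sort command above uses the older "samtools sort in.bam out_prefix" form. For anyone rerunning this with samtools 1.3 or later (an assumption; this is not the version used here), a sketch of the equivalent steps follows. The function name sort_and_index is hypothetical. Newer samtools sort reads SAM directly, so the separate view step is not needed.

```shell
#!/bin/sh
# Sketch for samtools >= 1.3 (assumed; not the version used in this post).
# Sorts and indexes the alignment for one individual.
# $1 = individual ID, e.g. m54p; expects aln<ID>.sam in the current directory.
sort_and_index() {
    id=$1
    # sort takes SAM input and writes compressed BAM to the -o file
    samtools sort -O bam -o "aln$id.sorted.bam" "aln$id.sam" &&
    samtools index "aln$id.sorted.bam"
}
```

Calling sort_and_index m54p should leave alnm54p.sorted.bam and its index in the working directory.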