bwa aln-samse was not mapping things well. Therefore, I am redoing the alignments using bwa-mem. First I tried testing with a single individual from bart as these had lower number of SNPs compared to cris.
Here is a comparison of number of mapped vs unmapped reads based on different methods
1). bwa aln edit distance 4
using samtools view F4, f4 to get mapped and unmapped reads
Example sam file : alnSRR5094939_1.sam
mapped reads = 75400
unmapped reads = 356672
2) bwa aln increasing the edit distance to 5
bwa aln -n 5 -l 20 -k 2 -t 8 -q 10 -f alnSRR5094939_1.sai ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq
bwa samse -n 1 ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta alnSRR5094939_1.sai SRR5094939_1.fastq
mapped reads = 95445
unmapped reads = 336627
3). trying bwa mem (defaults here)
bwa mem ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq > memSRR5094939_1.sam
mapped = 228133
unmapped = 206707
4). trying bwa aln with options set from Zach's example
bwa mem -t 12 -k 15 -r 1.3 -T 30 ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq
mapped = 246947
unmapped = 188103
So now I am redoing alignment using bwa mem...therefore all downstream stuff will have to change.
The script is /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/scripts/wrap_qsub_slurm_bwa_mem.pl
I copied the script to the fastqfile folder and ran the script from there.
Usage: perl wrap_qsub_slurm_bwa_mem.pl *.fastq
Here is the summary of the script:
my $dir = '/uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles/';
my @jobarray;
my $aln;
#my $samse;
my $ind;
my $genome = "/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_adaptation/fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta";
my $cd = "cd $dir\n";
#myregex = (^SRR)+\d+\_1.fastq
foreach my $file (@ARGV){
if ($file =~ m/(^SRR)+\d+\_1.fastq/g){
$ind = $1;
print "$file";
}
else {
die "Failed to match $file\n";
}
$aln ="bwa mem -t 12 -k 15 -r 1.3 -T 30 $genome $file > mem"."$file".".sam \n";
push (@jobarray, "$cd"."$aln");
print "$cd"."$aln";
}
printf "Ready to submit %d jobs to slurm (y/n): ", scalar @jobarray;
my $response = <STDIN>;
chomp $response;
if($response eq 'n'){
print "Exiting without any slurm submissions.\n";
exit;
}
else{
print "Proceeding with slurm submission\n";
}
This script will create output sam files memSRR*_1.fastq in the folder /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles/. I will have to move the files to a new samfiles folder. Right now all the sam files for all the species will be in this folder. I will need to put them into separate folders for each species for further analysis.
On December 14th
I made a mistake in the script so I am rerunning the alignments. They are now submitted to the cluster. Once it is done, I will have to move the files to different folders for each species.