redoingvariantcalling_bwamem

bwa aln-samse was not mapping things well. Therefore, I am redoing the alignments using bwa-mem. First I tried testing with a single individual from bart as these had lower number of SNPs compared to cris.

Here is a comparison of number of mapped vs unmapped reads based on different methods

1). bwa aln edit distance 4

using samtools view F4, f4 to get mapped and unmapped reads

Example sam file : alnSRR5094939_1.sam

mapped reads = 75400

unmapped reads = 356672

2) bwa aln increasing the edit distance to 5

bwa aln -n 5 -l 20 -k 2 -t 8 -q 10 -f alnSRR5094939_1.sai ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq

bwa samse -n 1 ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta alnSRR5094939_1.sai SRR5094939_1.fastq

mapped reads = 95445

unmapped reads = 336627

3). trying bwa mem (defaults here)

bwa mem ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq > memSRR5094939_1.sam

mapped = 228133

unmapped = 206707

4). trying bwa aln with options set from Zach's example

bwa mem -t 12 -k 15 -r 1.3 -T 30 ../../fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta SRR5094939_1.fastq

mapped = 246947

unmapped = 188103

So now I am redoing alignment using bwa mem...therefore all downstream stuff will have to change.

The script is /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/scripts/wrap_qsub_slurm_bwa_mem.pl

I copied the script to the fastqfile folder and ran the script from there.

Usage: perl wrap_qsub_slurm_bwa_mem.pl *.fastq

Here is the summary of the script:

my $dir = '/uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles/';

my @jobarray;

my $aln;

#my $samse;

my $ind;

my $genome = "/uufs/chpc.utah.edu/common/home/u6000989/projects/timema_adaptation/fasta_genome/re_mod_map_timema_06Jun2016_RvNkF702.fasta";

my $cd = "cd $dir\n";

#myregex = (^SRR)+\d+\_1.fastq

foreach my $file (@ARGV){

if ($file =~ m/(^SRR)+\d+\_1.fastq/g){

$ind = $1;

print "$file";

}

else {

die "Failed to match $file\n";

}

$aln ="bwa mem -t 12 -k 15 -r 1.3 -T 30 $genome $file > mem"."$file".".sam \n";

push (@jobarray, "$cd"."$aln");

print "$cd"."$aln";

}

printf "Ready to submit %d jobs to slurm (y/n): ", scalar @jobarray;

my $response = <STDIN>;

chomp $response;

if($response eq 'n'){

print "Exiting without any slurm submissions.\n";

exit;

}

else{

print "Proceeding with slurm submission\n";

}

This script will create output sam files memSRR*_1.fastq in the folder /uufs/chpc.utah.edu/common/home/u6007910/projects/timema_adaptation/alignments/fastqfiles/. I will have to move the files to a new samfiles folder. Right now all the sam files for all the species will be in this folder. I will need to put them into separate folders for each species for further analysis.

On December 14th

I made a mistake in the script so I am rerunning the alignments. They are now submitted to the cluster. Once it is done, I will have to move the files to different folders for each species.

Page updated

Google Sites

Report abuse