Post date: Sep 21, 2015 6:32:40 PM
We first ran the script removeIndels.pl on all of the aln*sam files to drop all reads that were not part of one of our haplotype loci or that contained an indel. This generated a noindel*sam alignment file for each individual. Note that we still want to know the indel count per haplotype locus, and a script findIndelsSam.pl is running now to count these. But for now, we are moving forward with the nonindel*sam files.
We used samtools to convert these sam files to bam files (here is an example command for one individual, note that we are re-headering the samfiles as the header was not retained from before):
cd /home/A01963476/data/aspen/gbs/Assemblies/
samtools view -H /labs/evolution/data/aspen/gbs/Assemblies/alnUSF_4701_D.sorted.bam > headerUSF_4701_D
cat headerUSF_4701_D /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sam > tempUSF_4701_D.sam
samtools view -b -S -o /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.bam tempUSF_4701_D.sam
samtools sort /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.bam /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sorted
samtools index /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sorted.bam
The file indelCnts.txt now contains the number of reads with indels (column 3) and number of reads total (column 4) for each locus, but I still need to summarize it.