removing indels, recreating bam files, and extracting haplotype data

Post date: Sep 21, 2015 6:32:40 PM

We first ran the script removeIndels.pl on all of the aln*sam files to drop all reads that were not part of one of our haplotype loci or that contained an indel. This generated a noindel*sam alignment file for each individual. Note that we still want to know the indel count per haplotype locus, and a script findIndelsSam.pl is running now to count these. But for now, we are moving forward with the nonindel*sam files.

We used samtools to convert these sam files to bam files (here is an example command for one individual, note that we are re-headering the samfiles as the header was not retained from before):

cd /home/A01963476/data/aspen/gbs/Assemblies/

samtools view -H /labs/evolution/data/aspen/gbs/Assemblies/alnUSF_4701_D.sorted.bam > headerUSF_4701_D

cat headerUSF_4701_D /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sam > tempUSF_4701_D.sam

samtools view -b -S -o /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.bam tempUSF_4701_D.sam

samtools sort /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.bam /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sorted

samtools index /labs/evolution/data/aspen/gbs/Assemblies/noindel_alnUSF_4701_D.sorted.sorted.bam

The file indelCnts.txt now contains the number of reads with indels (column 3) and number of reads total (column 4) for each locus, but I still need to summarize it.

Page updated

Google Sites

Report abuse