On October 16th, 2017
To determine the total number of reads in the study and number of mapped reads in the study, I used samtools flagstat to get the count of total and mapped reads for each bam file in the folder.
/uufs/chpc.utah.edu/common/home/u6007910/projects/lmelissa_hostAdaptation/variantcalling/alnBamfiles
Here is the unix for loop to save the output for each bam file in a file
for f in *sorted.bam; do samtools flagstat $f; done >> out.txt
The python script to determine the total number of reads across all bam files is called findreadcounts.py and is saved in the same folder. The script prints out the sum of all read counts (total and mapped) by adding each of the numbers in a sequence. The FINAL number printed out is the total number of reads across all bam files.
***************************************************************************************************************************************
I redid variant calling for the treemix analysis as the previous files had errors with the L. anna data. Moving on...
Here is the path to the directory on UofU cluster with all the files:
/uufs/chpc.utah.edu/common/home/u6007910/projects/lmelissa_hostAdaptation/variantcalling
The folder in the directory (and their descriptions) is as follows:
alnBamfiles = contains bam files for L. melissa and L. anna. These are symbolically linked to the files in this folder: source the Lmelissa bam files :/uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesHost .........source the L anna bam files :/uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesAnna
fastafiles = contains the fasta files and assemblies
locuslist.txt = list with scaffold and position for all the 206028 SNPs
melVariantCalling.sh = bash script to do variant calling
To do this I first made a symbolic link of the bam files from a folder like this:
cd variancalling
#for L. melissa files
ln -s /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesHost
#for L. anna files
ln -s /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesAnna
Once I had these files I needed the older version of samtools and bcftools installed (version 0.1.19) installed.
I used samtools and bcftools version 0.1.19 for variant calling. Refer to variantcalling_zg'snotes to see specific descriptions of all the options variant calling.
Here are the options I used:
samtools0.1.19 mpileup -C 50 -d 50 -l locuslist.txt -f /uufs/chpc.utah.edu/common/home/u6007910/projects/lmelissa_hostAdaptation/variantcalling/fastafiles/final.assembly.fasta -q 10 -Q 15 -D -g -I aln*.bam > variantsMelissaAnnaWild.bcf
bcftools0.1.19 view -c -d 0.8 -g -p 0.01 -P full -l locuslist.txt -v variantsMelissaAnnaWild.bcf > variantsMelissaAnnaWild.vcf
Remember that I used locuslist here because I wanted to call specific set of SNPs which were called in the experiment. Otherwise I do not need a locuslist for samtools mpileup.
Number of SNPs:
variantsMelissaAnnaWild.vcf = 206,028
#after converting to gl
LmelissaVariants_tm.gl = 186,937 (similar to baypass vcf to gl script, removing sites which are non-variant).
I did not use the selection experiment bam files in this variant calling. Therefore, there is a difference in the number of SNPs in this file and the variant calling file from Zach used for baypass in which he used the selection experiment data too.