Post date: Sep 17, 2017 7:24:46 PM
There was an error with the barcodes for Freddie's Hwy 154 cline data. Thus, I am re-processing everything as described here. I have the new fastq files and have deleted all of the relevant old files.
Notes on a few of the steps are below, particularly the number of SNPs retained at each stage (for comparison with the linked post, which has more details):
## Filtering of variants, with numbers retained
```bash
perl ../scripts/vcfFilter.pl tcrHwy154Vars.vcf
# Finished filtering tcrHwy154Vars.vcf
# Retained 92106 variable loci
```
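```bash
# depth (DP) of each SNP: takes the first key of the INFO column (column 8), assumed to be DP
grep ^Sc filtered2x_tcrHwy154Vars.vcf | cut -f 8 | cut -f 1 -d ";" | perl -pe 's/DP=//' > depth.txt
```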
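The depth extraction can also be sketched in Python, for checking it without the shell tools. This is a minimal sketch, not the pipeline that was actually run: `read_depths` is a hypothetical name, data lines are assumed to be the ones starting with `Sc` (as in the `grep`), and DP is looked up by key instead of being assumed to be the first INFO entry.

```python
"""Sketch: pull the DP (read depth) value out of each SNP's INFO column."""
from typing import Iterable, List


def read_depths(vcf_lines: Iterable[str]) -> List[int]:
    """Return DP for every data line whose CHROM field starts with 'Sc'."""
    depths = []
    for line in vcf_lines:
        if not line.startswith("Sc"):  # same idea as `grep ^Sc`: skips header lines
            continue
        info = line.rstrip("\n").split("\t")[7]  # column 8 = INFO
        # INFO is a ';'-separated list of KEY=VALUE entries or bare flags
        fields = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in info.split(";")
        )
        depths.append(int(fields["DP"]))  # look DP up by key, whatever its position
    return depths


if __name__ == "__main__":
    # Made-up records for illustration only
    example = [
        "##fileformat=VCFv4.2\n",
        "Scaffold_1\t100\t.\tA\tG\t50\t.\tDP=1200;VDB=0.1\n",
        "Scaffold_1\t250\t.\tC\tT\t60\t.\tVDB=0.3;DP=845\n",
    ]
    print(read_depths(example))  # [1200, 845]
```

Looking DP up by key means the result doesn't depend on the order of INFO entries, whereas the `cut -f 1 -d ";"` version relies on DP being listed first.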
```r
a <- scan("depth.txt")
# Read 92106 items
mean(a)
# 1756.358
sd(a)
# 1377.528
mean(a) + 3 * sd(a)
# 5888.943
```
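The cutoff computed in R is mean + 3 SD of the per-SNP depths; with the rounded values printed there, 1756.358 + 3 × 1377.528 = 5888.942, which agrees with the R output up to rounding. A Python sketch of the same arithmetic follows. `load_depths`, `depth_cutoff` and `over_cutoff` are hypothetical names, and flagging loci above the cutoff is only an illustration: these notes don't say how the cutoff is applied downstream. `statistics.stdev` uses the n - 1 denominator, the same as R's `sd()`.

```python
"""Sketch: mean + 3 SD depth cutoff, mirroring the R session."""
import statistics
from typing import List, Sequence


def load_depths(path: str) -> List[float]:
    """Read whitespace-separated depths, like R's scan() on a numeric file."""
    with open(path) as fh:
        return [float(x) for x in fh.read().split()]


def depth_cutoff(depths: Sequence[float], k: float = 3.0) -> float:
    """mean + k * sample SD (n - 1 denominator, matching R's sd())."""
    return statistics.mean(depths) + k * statistics.stdev(depths)


def over_cutoff(depths: Sequence[float], cutoff: float) -> List[int]:
    """Indices of loci whose depth exceeds the cutoff (illustrative only)."""
    return [i for i, d in enumerate(depths) if d > cutoff]


if __name__ == "__main__":
    toy = [10, 12, 14, 16, 18]  # toy depths, not real data
    cut = depth_cutoff(toy)
    print(f"cutoff={cut:.3f}, loci above: {over_cutoff(toy, cut)}")  # cutoff=23.487, loci above: []
```

On the real file, `depth_cutoff(load_depths("depth.txt"))` should give the same value as `mean(a) + 3 * sd(a)` in R, up to floating-point rounding.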
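```bash
# scaffold ID and position of each SNP: drop a ':' and what follows it in the name, then the leading 'S' + letters/underscores
grep ^Sc filtered2x_tcrHwy154Vars.vcf | cut -f 1,2 | perl -pe 's/:\S+//' | perl -pe 's/^S[A-Za-z_]+//' > snpids1.txt
```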
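The two Perl substitutions in the `snpids1.txt` command can be mirrored in Python to see what ends up in the file: the first removes a `:` and the run of non-whitespace after it, the second removes a leading `S` plus letters/underscores from the scaffold name, leaving the scaffold number and the position. The scaffold name in the example (`Scaffold_1234:HRSCAF=5678`) is made up, since these notes don't show the real CHROM values, and `snp_id` is a hypothetical helper.

```python
"""Sketch: reproduce the two Perl substitutions that build snpids1.txt."""
import re

COLON_SUFFIX = re.compile(r":\S+")         # Perl: s/:\S+//
NAME_PREFIX = re.compile(r"^S[A-Za-z_]+")  # Perl: s/^S[A-Za-z_]+//


def snp_id(chrom: str, pos: str) -> str:
    """Return 'scaffoldID<TAB>position' as the shell pipeline would write it."""
    line = f"{chrom}\t{pos}"  # what `cut -f 1,2` emits
    # Perl s/// without /g replaces only the first match, hence count=1
    line = COLON_SUFFIX.sub("", line, count=1)
    line = NAME_PREFIX.sub("", line, count=1)
    return line


if __name__ == "__main__":
    # Hypothetical CHROM value; the real scaffold names are not shown in these notes
    print(repr(snp_id("Scaffold_1234:HRSCAF=5678", "91011")))  # '1234\t91011'
```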
```bash
perl ../scripts/filterSomeMore.pl filtered2x_tcrHwy154Vars.vcf
# checking neighbors for 92106 SNPs
# Finished filtering filtered2x_tcrHwy154Vars.vcf
# Retained 72199 variable loci
```
```bash
perl ../scripts/vcf2gl.pl 0.05 morefilter_filtered2x_tcrHwy154Vars.vcf
# Number of loci: 28838; number of individuals 472
```
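If the `0.05` argument to `vcf2gl.pl` is a minor allele frequency (MAF) threshold (an assumption, since the script isn't shown here), the filter keeps a locus only when its rarer allele has a frequency of at least 0.05. A minimal sketch of that rule, with hypothetical names (`maf_filter`, `minor_allele_freq`) and made-up frequencies; how `vcf2gl.pl` actually estimates allele frequencies, and whether its bound is inclusive, are not covered by these notes.

```python
"""Sketch of a minor-allele-frequency (MAF) filter.

ASSUMPTION: the 0.05 argument to vcf2gl.pl is a MAF threshold; the real
script is not shown here and may estimate frequencies or apply the bound
differently.
"""
from typing import Dict


def minor_allele_freq(p: float) -> float:
    """Frequency of the rarer allele, given the frequency p of one allele."""
    return min(p, 1.0 - p)


def maf_filter(alt_freqs: Dict[str, float], threshold: float = 0.05) -> Dict[str, float]:
    """Keep loci whose minor allele frequency is at least `threshold`."""
    return {locus: p for locus, p in alt_freqs.items()
            if minor_allele_freq(p) >= threshold}


if __name__ == "__main__":
    # Hypothetical loci and frequencies, for illustration only
    freqs = {"1:100": 0.50, "1:250": 0.02, "2:75": 0.97, "3:10": 0.20}
    print(sorted(maf_filter(freqs)))  # ['1:100', '3:10']
```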