Post date: Feb 03, 2020 5:52:19 PM
script:
/uufs/chpc.utah.edu/common/home/u6028866/data/Pando/variants/pando_only_variants/scripts/compare_two_vcf.py
Takes two VCF files, file 1 and file 2, and outputs a VCF file containing the lines of file 1 that are not found in file 2.
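A minimal sketch of this kind of comparison, not the actual contents of compare_two_vcf.py. It assumes that two lines count as "the same" when CHROM, POS, REF and ALT match; the real script may compare whole lines or other columns. The function names are hypothetical.

```python
def variant_key(line):
    """Return (CHROM, POS, REF, ALT) for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    return (fields[0], fields[1], fields[3], fields[4])


def lines_only_in_first(vcf1_lines, vcf2_lines):
    """Keep the header of file 1 plus its data lines whose key is absent from file 2.

    Assumption: identity is decided on (CHROM, POS, REF, ALT), not on the full line.
    """
    seen = {
        variant_key(line)
        for line in vcf2_lines
        if line.strip() and not line.startswith("#")
    }
    kept = []
    for line in vcf1_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or variant_key(line) not in seen:
            kept.append(line)
    return kept
```

With real files, the two arguments could be the results of readlines() on each VCF, and the returned list written back out with writelines().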
Stringent filter:
I used this script to remove the SNPs found in both Pando and PON (9745 - 1448 = 8297 SNPs),
and then those found in both Pando and friends (8297 - (1003 - 421) = 7715, where 421 is the intersection between PON and friends).
file called: pando_50_stringent_filter.vcf
Non-stringent filter:
Here I remove from the Pando-only VCF only the SNPs shared by all three of Pando, PON and friends (9745 - 421 = 9324).
file called: pando_50_non_stringent_filter.vcf
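The counts for the two filters can be re-derived with a few lines of arithmetic. The variable names below are made up for illustration; the numbers are the ones quoted above.

```python
# SNP counts quoted in the notes (variable names are illustrative only)
pando_snps = 9745           # SNPs in the starting Pando VCF
shared_with_pon = 1448      # of those, also found in PON
shared_with_friends = 1003  # of those, also found in friends
shared_with_both = 421      # found in both PON and friends

# Stringent filter: drop everything seen in PON or in friends
after_pon = pando_snps - shared_with_pon
stringent = after_pon - (shared_with_friends - shared_with_both)

# Same number via inclusion-exclusion on the union of PON and friends
union = shared_with_pon + shared_with_friends - shared_with_both
assert stringent == pando_snps - union

# Non-stringent filter: drop only the SNPs seen in both PON and friends
non_stringent = pando_snps - shared_with_both

print(after_pon, stringent, non_stringent)  # 8297 7715 9324
```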
Allele frequency estimation from AC/AN:
script:
/uufs/chpc.utah.edu/common/home/u6028866/data/Pando/variants/pando_only_variants/scripts/get_allele_freq.py
file called: AC_AN_Pando_non_stringent_filter.txt
file called: AC_AN_Pando_stringent_filter.txt
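A rough sketch of the AC/AN calculation, not the actual get_allele_freq.py. It assumes the INFO column carries the standard AC (alternate allele count) and AN (total called alleles) tags; the function name is hypothetical.

```python
def allele_frequencies(info):
    """Return one ALT allele frequency (AC / AN) per ALT allele from a VCF INFO string."""
    # Flag-style tags without "=" (e.g. "DB") are skipped
    tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    an = int(tags["AN"])
    if an == 0:
        return []  # no called alleles at this site
    # AC is comma-separated when the site has several ALT alleles
    return [int(ac) / an for ac in tags["AC"].split(",")]
```

For example, an INFO string of "AC=3;AN=12;DP=100" gives a single frequency of 3/12 = 0.25.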
I now have the set of SNPs with less than 50% heterozygosity that are found only in Pando!
Compare observed and expected heterozygosity:
script:
/uufs/chpc.utah.edu/common/home/u6028866/data/Pando/variants/pando_only_variants/scripts/count_hets.py
Counts the observed number of heterozygotes (hets) in the VCF files:
file names: Hets_non_stringent_filter.txt and tot_non_stringent_filter.txt
file names: Hets_stringent_filter.txt and tot_stringent_filter.txt
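One way such a count could be done, as a sketch rather than the real count_hets.py. Assumptions: each data line has a FORMAT column containing GT, a genotype is a het when it carries more than one distinct allele, and the "tot" value is the number of non-missing genotypes. The function name is hypothetical.

```python
import re


def count_hets(vcf_lines):
    """Return (heterozygous genotypes, non-missing genotypes) over all samples and sites."""
    hets = 0
    total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            alleles = re.split(r"[/|]", gt)  # handles unphased and phased calls
            if "." in alleles:
                continue  # missing genotype
            total += 1
            if len(set(alleles)) > 1:
                hets += 1
    return hets, total
```

Because the test is "more than one distinct allele", a three-allele call such as 0/0/1 would also be counted as a het.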
We see a slightly higher observed number of heterozygotes than expected under Hardy-Weinberg equilibrium for diploids.
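For reference, the diploid Hardy-Weinberg expectation is a heterozygote frequency of 2pq, with p the ALT allele frequency (for example AC/AN) and q = 1 - p. A small sketch of the expected count, with hypothetical function and parameter names:

```python
def expected_het_count(p, n_genotypes):
    """Expected number of heterozygotes under diploid Hardy-Weinberg equilibrium.

    p           -- frequency of one allele at a biallelic site (e.g. AC / AN)
    n_genotypes -- number of genotypes the expectation is scaled to
    """
    q = 1.0 - p
    return 2.0 * p * q * n_genotypes
```

For example, p = 0.25 over 100 genotypes gives 2 * 0.25 * 0.75 * 100 = 37.5 expected hets, which can then be set against the observed count.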
If we adjust the calculation taking into account the triploidy:
p becomes: AN + 1/2*AN (three copies of each chromosome instead of two)
(work in progress)