Post date: Jun 06, 2020 10:42:23 PM
# first easy dirty way
Get rid of the pairs with P(Z=1)=1 and P(Z=2)=0 in the IBD file from PLINK (it seems that their default for missing values is to give a 0.5 probability of being IBD).
Well, no: this would just mean choosing which data to get rid of while I am still trying to find out why I have this data in the first place.
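Before dropping anything, it may help to simply count how many pairs show this pattern. In PLINK's `--genome` output, PI_HAT is P(Z=2) + 0.5*P(Z=1), so a pair with P(Z=1)=1 and P(Z=2)=0 lands exactly on 0.5. The sketch below is only an illustration: it assumes the IBD file is the whitespace-delimited `.genome` table with `Z1` and `Z2` header columns, the function name `split_suspect_pairs` is hypothetical, and the rows are made-up toy values.

```python
def split_suspect_pairs(lines):
    """Split .genome rows into (suspect, other), where suspect means Z1 == 1 and Z2 == 0."""
    header = lines[0].split()
    z1_col, z2_col = header.index("Z1"), header.index("Z2")
    suspect, other = [], []
    for line in lines[1:]:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        z1, z2 = float(fields[z1_col]), float(fields[z2_col])
        if z1 == 1.0 and z2 == 0.0:
            suspect.append(fields)
        else:
            other.append(fields)
    return suspect, other


# Toy rows with invented values, only to show the expected layout (assumption).
example = [
    "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
    "f1 a f2 b UN NA 0.0000 1.0000 0.0000 0.5000 -1 0.80 1.0 NA",
    "f1 a f3 c UN NA 0.7500 0.2500 0.0000 0.1250 -1 0.70 0.5 2.0",
]
suspect, other = split_suspect_pairs(example)
print(len(suspect), "suspect pairs;", len(other), "other pairs")
```

Listing which individuals appear in the suspect pairs would be a natural next step, since that keeps the data and points at where it comes from.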
# second not easy way
Filter the VCF to keep SNPs that have at least X called genotypes.
There are at most 295*2=590 alleles to be found.
I start off with 39164 SNPs.
If I filter for SNPs with at least 450 called alleles, how many are left?
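A minimal sketch of this kind of filter, written in plain Python for illustration. It is not the command used for the files below: it recounts called alleles from the GT fields instead of reading an AN value from INFO, it assumes GT is the first FORMAT entry, and the names `called_alleles` and `filter_by_called_alleles` are hypothetical.

```python
def called_alleles(sample_field):
    """Count non-missing alleles in one sample field such as '0/1:12:30'."""
    gt = sample_field.split(":")[0]  # assumes GT comes first in FORMAT
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele != ".")


def filter_by_called_alleles(vcf_lines, min_called):
    """Keep header lines plus records with at least min_called called alleles."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines are always kept
            continue
        fields = line.rstrip("\n").split("\t")
        samples = fields[9:]  # sample columns follow the FORMAT column
        if sum(called_alleles(s) for s in samples) >= min_called:
            kept.append(line)
    return kept


# Toy VCF with two samples, so at most 4 alleles per site (invented data).
toy = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.",
]
kept = filter_by_called_alleles(toy, min_called=4)
print(len(kept), "lines kept, header included")
```

With 295 diploid individuals the same function would be called with `min_called` set to 450, 500, 550 or 590.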
203981 ../7-IBD_Pando_Friends/filtered2xHiCov_pando_variants.vcf (with header): 203981 lines in total
203981 pando_friends_AN_450.vcf (with header) --> 203981 - 203981 = 0 removed
164834 pando_friends_AN_590.vcf (with header) --> 203981 - 164834 = 39147 removed, which is nearly all of the 39164 SNPs!
Try a threshold in between:
199474 pando_friends_AN_500.vcf (with header) --> 203981 - 199474 = 4507 removed (about 12% of the total SNPs)
183375 pando_friends_AN_550.vcf (with header) --> 203981 - 183375 = 20606 removed (about half of the total SNPs)
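The subtraction works because every file carries the same header, so the difference in line counts is the number of SNPs removed. A small sketch that redoes the arithmetic from the counts noted above and expresses each result as a fraction of the 39164 starting SNPs (the function name `removed_summary` is hypothetical):

```python
TOTAL_LINES = 203981  # lines in the unfiltered VCF, header included
TOTAL_SNPS = 39164    # SNPs in the unfiltered VCF

# Line counts (header included) of each filtered file, keyed by threshold.
line_counts = {450: 203981, 500: 199474, 550: 183375, 590: 164834}


def removed_summary(total_lines, total_snps, counts):
    """Return {threshold: (snps_removed, fraction_of_all_snps)}."""
    summary = {}
    for threshold, n_lines in sorted(counts.items()):
        removed = total_lines - n_lines  # identical headers cancel out
        summary[threshold] = (removed, removed / total_snps)
    return summary


summary = removed_summary(TOTAL_LINES, TOTAL_SNPS, line_counts)
for threshold, (removed, fraction) in summary.items():
    print(f"threshold {threshold}: {removed} SNPs removed ({fraction:.1%})")
```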
Calculate IBD based on these files and plot to see if the 0.5 bar decreases.
The 0.5 bar did not decrease; if anything, it did the opposite.
I think that to see this bar decrease I would need to go back to the friend-only VCF file and remove individuals that have very low coverage.
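One way to pick candidates for removal could be to rank individuals by how many genotypes they are missing. The sketch below makes a loud assumption: it uses the fraction of missing genotypes per sample as a stand-in for low coverage, which is not the same thing as read depth. The function name `missing_fraction_per_sample` and the toy data are hypothetical.

```python
def missing_fraction_per_sample(vcf_lines):
    """Return {sample_name: fraction of records with a missing genotype}."""
    names, missing, n_records = [], [], 0
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            names = fields[9:]  # sample names follow the FORMAT column
            missing = [0] * len(names)
            continue
        n_records += 1
        for i, sample_field in enumerate(fields[9:]):
            gt = sample_field.split(":")[0]  # assumes GT comes first in FORMAT
            if "." in gt:  # any missing allele counts as a missing genotype
                missing[i] += 1
    if n_records == 0:
        return {name: 0.0 for name in names}
    return {name: m / n_records for name, m in zip(names, missing)}


# Toy VCF with invented data: s2 is missing at one of the two sites.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t./.",
]
fractions = missing_fraction_per_sample(toy_vcf)
worst_first = sorted(fractions, key=fractions.get, reverse=True)
print(worst_first, fractions)
```

Individuals at the top of `worst_first` would be the first ones to consider dropping before recomputing IBD.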