Post date: Jan 30, 2020 2:22:2 PM
First, a point of terminology. I was a bit confused about when we use "somatic" versus "germline" mutations. Will offered to shift the language to "ancestral" and "de novo" mutations. Is that something you would agree to?
This is AC/AN for the SNPs I chose as de novo mutations (the unfolded graph). I thought we needed to keep both the >0.5 and <0.5 values, since we do not know which allele is the ancestral allele. Is that correct? Or do we treat the reference genome allele as ancestral and therefore ignore the >0.5 values (this seems wrong to me)?
AC/AN: we know which allele is the reference, so the alternate-allele frequency can take values such as 0.9 or 0.1.
# of hets: we count the number of 0/1 genotypes per line, without regard to which allele is ancestral and which is the mutation. Values are all < 0.5.
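To make the difference between the two measures concrete, here is a minimal sketch of how they could be computed for one site from a list of genotype strings. The function names (`parse_gt`, `site_stats`) are hypothetical and not from our pipeline; I am assuming diploid GT strings as they appear in a VCF, and that the "# of hets" value is expressed as a fraction of called samples. The "folded" value is simply min(f, 1 - f), which is what we would use if we cannot say which allele is ancestral.

```python
def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '1|1' into allele codes."""
    return gt.replace("|", "/").split("/")


def site_stats(genotypes):
    """Summarise one site (one VCF line) from its per-sample GT strings.

    Assumptions (not taken from the original notes): diploid calls,
    and the het measure is a fraction of fully called samples.
    """
    # AN = number of called alleles, AC = number of non-reference alleles
    alleles = [a for gt in genotypes for a in parse_gt(gt) if a != "."]
    an = len(alleles)
    ac = sum(1 for a in alleles if a != "0")

    # Heterozygotes: fully called genotypes carrying two different alleles
    called = [gt for gt in genotypes if "." not in gt]
    hets = sum(1 for gt in called if len(set(parse_gt(gt))) > 1)

    freq = ac / an if an else float("nan")
    return {
        "AC": ac,
        "AN": an,
        "unfolded": freq,               # depends on which allele is the reference
        "folded": min(freq, 1 - freq),  # ignores which allele is the reference
        "het_fraction": hets / len(called) if called else float("nan"),
    }
```

For example, five samples with genotypes 0/1, 1/1, 1/1, 1/1 and ./. give AC/AN = 7/8 = 0.875, a folded frequency of 0.125, and a het fraction of 1/4 = 0.25.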
- pando_lowhets_50.vcf
VCF file containing the mutations with a frequency below 0.5
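For reference, a filter of this kind could look like the sketch below. This is an illustration only and not necessarily how pando_lowhets_50.vcf was produced: it assumes the frequency is AC/AN taken from the INFO column, that the sites are biallelic (only the first AC value is used), and the helper names `info_field` and `keep_low_frequency` are made up for this example.

```python
def info_field(info, key):
    """Return the value of `key` from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def keep_low_frequency(vcf_lines, threshold=0.5):
    """Yield header lines and records whose AC/AN is below `threshold`."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        ac = info_field(fields[7], "AC")  # INFO is the 8th VCF column
        an = info_field(fields[7], "AN")
        if ac is None or an is None or int(an) == 0:
            continue  # frequency cannot be computed, so skip the record
        # Assumption: biallelic site, so only the first AC value matters
        if int(ac.split(",")[0]) / int(an) < threshold:
            yield line
```

Applied to the lines of an input VCF, this keeps the header plus only the records with AC/AN < 0.5; records at 0.5 or above, which may carry the reference allele as the mutation, are dropped.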