Post date: Jun 03, 2020 3:26:45 PM
Our goal is to separate the somatic mutations (mutations acquired during an individual's lifetime) from the germline mutations (mutations shared by every individual).
First, we need to understand that the data are not perfect.
That means that if mutation A is detected in only 15% of the individuals, this may be (a) because it is an "acquired mutation", the kind we want to keep, or (b) because it is actually present in every tree but was detected only in these trees, since the corresponding segments in the other trees were not well amplified.
Thus, if we keep mutations that we think are somatic but that are actually the result of non-homogeneous amplification, we are making a mistake.
This is why we use a "control group". The group surrounding Pando (the "Friends") may have been derived from a closely related seed (they are in the same canyon). If we see a mutation in this group, it may also be present in Pando as a germline mutation. By dropping the mutations common to Pando and Friends, we may drop some "real" somatic mutations, but we also hope to drop germline mutations that were not well amplified. There is a trade-off, but there is less harm in dropping "real" somatic mutations than in keeping "false" ones.
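
The filtering idea above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual pipeline: it assumes each sample's calls are already available as a set of mutation identifiers, and the function name, sample names (P1, F1, ...) and mutation labels are all made up for the example.

```python
def filter_putative_somatic(pando_calls, friends_calls):
    """Drop from every Pando sample any mutation also seen in the control group.

    pando_calls / friends_calls: dict mapping sample name -> set of mutation IDs
    (hypothetical input format, assumed for this sketch).
    """
    # Any mutation seen in at least one control tree is treated as possibly germline.
    control_mutations = set().union(*friends_calls.values())
    # Keep only the mutations never observed in the control group.
    return {sample: calls - control_mutations for sample, calls in pando_calls.items()}


# Toy data: "A" and "C" also appear in the control group, so they are dropped,
# even though "C" was missed in P3 (for example, because of poor amplification).
pando = {"P1": {"A", "B", "C"}, "P2": {"A", "C"}, "P3": {"A", "D"}}
friends = {"F1": {"A", "E"}, "F2": {"C"}}

putative_somatic = filter_putative_somatic(pando, friends)
print(putative_somatic)
```

The filter is deliberately conservative: a mutation is removed as soon as it appears in a single control tree, which matches the choice of losing some "real" somatic mutations rather than keeping "false" ones.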