Post date: Jul 08, 2020 1:11:16 PM
BEAST
software that outputs time-calibrated phylogenies
We do not know yet what kind of data the software accepts. We only have the variable sites (e.g. 010011001 per individual, where a 1 indicates the presence of the mutation). If the software is designed for full sequences, we will need to add a bunch of 0s for the invariant sites.
The first task for me is to determine what data format BEAST accepts. TASK 1
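If it turns out that BEAST needs full-length sequences rather than variable sites only, the "add a bunch of 0s" step could look like the minimal Python sketch below. The helper names (`pad_with_invariant_sites`, `to_fasta`), the sample names and the number of invariant columns are all hypothetical. The real number of invariant sites would have to come from the effective-site count, and whether BEAST wants binary characters or nucleotides is exactly what TASK 1 should settle.

```python
def pad_with_invariant_sites(seqs: dict[str, str], n_invariant: int) -> dict[str, str]:
    """Append n_invariant all-'0' (mutation absent) columns to every sequence.

    Hypothetical helper: assumes 0/1 strings of equal length, one per individual.
    """
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("all sequences must have the same number of variable sites")
    if any(set(s) - {"0", "1"} for s in seqs.values()):
        raise ValueError("sequences must contain only 0 and 1")
    pad = "0" * n_invariant
    return {name: s + pad for name, s in seqs.items()}


def to_fasta(seqs: dict[str, str]) -> str:
    """Write the alignment as plain FASTA text (one record per individual)."""
    return "".join(f">{name}\n{s}\n" for name, s in seqs.items())


variable_sites = {"ramet_A": "010011001", "ramet_B": "010001001"}  # hypothetical data
padded = pad_with_invariant_sites(variable_sites, n_invariant=5)
print(to_fasta(padded))
```

Padding only with 0s encodes the assumption that every added site is invariant and carries the ancestral (non-mutated) state in all individuals.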
The model we choose will depend on the kind of data we give it.
There are two major kinds of models: fixed (strict) versus relaxed molecular clocks. A fixed clock assumes a constant mutation rate across the whole tree and over the whole time span. A relaxed clock allows the mutation rate to vary over time and among branches.
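To make the difference between the two clock models concrete, here is a small Python simulation sketch. It assumes branch durations in years and a made-up mean rate. For the relaxed case it uses one common form of relaxed clock, in which each branch rate is drawn independently from a lognormal distribution (an uncorrelated lognormal clock); this is an illustration of the idea, not a description of how BEAST parameterizes its clock models internally.

```python
import math
import random


def strict_clock_mutations(branch_years, rate):
    """Expected mutations per site on each branch when one rate applies everywhere."""
    return [rate * t for t in branch_years]


def relaxed_clock_mutations(branch_years, mean_rate, sigma, rng):
    """Expected mutations per site when every branch draws its own rate.

    Branch rates are lognormal with mean `mean_rate`; `sigma` (> 0) controls
    how much the rates differ from one branch to another.
    """
    # Choose mu so that E[rate] = exp(mu + sigma**2 / 2) equals mean_rate.
    mu = math.log(mean_rate) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) * t for t in branch_years]


branch_years = [1_000, 5_000, 20_000]  # hypothetical branch durations (years)
rate = 1e-9                            # hypothetical mutations per site per year
print(strict_clock_mutations(branch_years, rate))
print(relaxed_clock_mutations(branch_years, rate, sigma=0.5, rng=random.Random(42)))
```

Under the strict clock, mutations are exactly proportional to branch duration; under the relaxed clock, two branches of equal duration can accumulate different numbers of mutations.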
We can add constraints (a maximum age, for example) to increase accuracy.
We do not care about indels because the time scale (approximately 20,000 years) is too short. We do not care about possible reticulations either, because there are no recombination events (reproduction is clonal).
The second task is to download the software, learn about it, and try some examples. TASK 2
The output of BEAST is a time-calibrated phylogeny. It gives us the mutations accumulated per branch length, but we need the total number of sites (the effective sites) to scale this rate so that it becomes a "rate of mutation per base". We therefore need to find the number of effective sites.
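The scaling step is simple arithmetic, sketched below in Python. The function names and all numbers are hypothetical. The second function assumes the clock rate was estimated per alignment column (substitutions per site per year on an alignment of variable sites only) and converts it to a rate per effective base by going through the total number of mutations per year on the alignment.

```python
def per_base_rate(n_mutations, effective_sites, years):
    """Mutations per base per year, given mutations counted on a lineage of known age."""
    return n_mutations / (effective_sites * years)


def rescale_rate(rate_per_alignment_site, n_alignment_sites, effective_sites):
    """Convert a rate per alignment column into a rate per effective base.

    rate_per_alignment_site * n_alignment_sites = mutations per year on the
    whole alignment; dividing by effective_sites spreads them over every base
    that could have been observed.
    """
    return rate_per_alignment_site * n_alignment_sites / effective_sites


# Hypothetical numbers: 30 mutations over 10,000 years, 1e8 effective sites.
print(per_base_rate(30, 1e8, 10_000))
# Hypothetical: 1e-4 per alignment column per year on 1,000 variable columns.
print(rescale_rate(1e-4, 1_000, 1e8))
```

The effective-site count sits in the denominator of both formulas, so any error in it propagates directly into the per-base rate.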
We also need to keep in mind that we do not have 100% confidence in every site we read: some sites are covered by only 2 reads, so we are not really sure of what we read there.
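A quick way to see why 2 reads are not enough: if each read independently carries the mutant allele with some probability (the allele fraction), the chance that no read shows the mutation is (1 - fraction)^depth. The Python sketch below assumes an allele fraction of 0.5 (a heterozygous site in a diploid; it would be about 1/3 for a single mutant copy in a triploid), no sequencing errors and independent reads. The function name is hypothetical.

```python
def prob_mutation_unseen(depth, allele_fraction=0.5):
    """Probability that none of `depth` reads carries the mutant allele.

    Assumes each read independently samples the mutant allele with
    probability `allele_fraction` and that there are no sequencing errors.
    """
    return (1 - allele_fraction) ** depth


for depth in (2, 5, 20):
    print(depth, prob_mutation_unseen(depth))
```

With 2 reads the mutation is missed 1 time in 4; with 20 reads the probability drops below one in a million.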
We can use two approaches for this:
1 - hard thresholding: choose a threshold and keep only the sites that have more than 20 reads, for example.
2 - take the uncertainty in the data into account and use ANGSD to calculate an index that gives us the number of effective sites as part of its output. This technique works from the BAM files and uses the genotype likelihoods (GL), which incorporate the uncertainty.
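Approach 1 can be sketched in a few lines of Python. This assumes a per-site depth table laid out like the output of `samtools depth` run on several BAM files (chromosome, position, then one depth column per BAM), and it keeps a site only if every individual has more than `threshold` reads. The function name and the example lines are hypothetical, and the sketch does not reproduce the GL-based calculation of approach 2, which ANGSD would handle.

```python
def count_effective_sites(depth_lines, threshold=20):
    """Count sites where every individual has more than `threshold` reads.

    Each line: chrom<TAB>pos<TAB>depth_1<TAB>depth_2 ... (one depth per BAM).
    """
    n_sites = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        depths = [int(x) for x in fields[2:]]
        if all(d > threshold for d in depths):
            n_sites += 1
    return n_sites


example = [
    "chr1\t100\t25\t30",  # kept: both individuals above 20
    "chr1\t101\t2\t40",   # dropped: one individual has only 2 reads
    "chr1\t102\t21\t21",  # kept
    "chr1\t103\t20\t50",  # dropped: 20 is not more than 20
]
print(count_effective_sites(example))
```

The choice of threshold directly trades the number of effective sites against confidence in each retained site.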
TASK 3 - learn about ANGSD and run the calculations for Pando to obtain the number of effective sites.
Finally, we can also think about the tree in terms of lineage diversification over time. We can see the reproduction of ramets as speciation events (a branch appears): speciation event = birth of a ramet; extinction event = death of a ramet. This analysis takes the tree we obtain from BEAST as input and estimates the rate of diversification (?). I am not sure I fully understand this last point. TASK 4 - read the paper on diversification Zach sent me.
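The simplest version of this idea is a pure-birth (Yule) model, where ramets are born but never die. This is not necessarily the model used in the paper Zach sent; it ignores ramet death, which a birth-death model would include. Under that assumption, the birth rate can be estimated as the number of births divided by the total time lineages were present. The Python sketch below takes the ages of the internal nodes of an ultrametric tree (for example, from BEAST) and conditions on the root split, so a tree with n tips contributes n - 2 births. The function name and example ages are hypothetical.

```python
def yule_rate_mle(node_ages):
    """Pure-birth rate estimate from the internal node ages of an ultrametric tree.

    node_ages: ages (time before present) of the n - 1 internal nodes of a tree
    with n tips. Conditions on the root split, so n - 2 births are counted.
    """
    n_tips = len(node_ages) + 1
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    ages = sorted(node_ages, reverse=True) + [0.0]  # root first, then the present
    total_length = 0.0
    for k in range(len(node_ages)):
        lineages = k + 2  # lineages alive between the k-th and (k+1)-th node
        total_length += lineages * (ages[k] - ages[k + 1])
    return (n_tips - 2) / total_length


# Hypothetical 3-tip tree: root at 2.0 time units, second split at 1.0.
print(yule_rate_mle([2.0, 1.0]))
```

For this example, total branch length is 2 x 1 + 3 x 1 = 5, giving one birth per 5 lineage-time units.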