I'm pleased to announce that our study on a body mass-corrected molecular rate for birds has just been published in Molecular Ecology.
In this study, we wanted to provide an alternative to the classical rate of 2% divergence per million years, which is widely used but may be a poor approximation of the actual rate of evolution in some bird lineages.
The idea is simply to use the relationship between body mass and molecular rate, inferred from a genome-wide dataset of more than 450 species, to predict the molecular rate of any bird taxon from its body mass. As you can see on the left, the relationship is pretty strong.
Here, I would like to provide quick guidelines for applying our simple method.
1) The linear regression was estimated using 3rd codon positions. This limits the confounding effect of natural selection. So you need to extract the 3rd codon positions from your alignment. To do that, you can use a simple R script:
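library(ape)
# Read the alignment as a character matrix (rows = sequences, columns = sites)
alignment <- read.dna("yourAlignment.fasta", format = "fasta", as.character = TRUE)
# Keep every third column; this assumes the alignment is in frame and starts at codon position 1
Thirdpos <- alignment[, seq_len(ncol(alignment)) %% 3 == 0]
write.dna(Thirdpos, "3rdCodon.fasta", format = "fasta")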
The species analysed should have a limited molecular divergence. An arbitrary threshold could be a pairwise molecular divergence below 1.0 subst/site at the third codon position, which corresponds to a divergence of 0.3-0.4 subst/site (that is, 30 to 40% divergence) across all codon positions. When divergence is higher, we recommend splitting the dataset into smaller clades and analysing each clade independently.
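As a rough way to check this threshold, the sketch below computes the largest pairwise distance in the third-codon alignment with `dist.dna` from ape. The helper name `max_pairwise_divergence` and the choice of the K80 model are assumptions made for illustration; they are not necessarily how divergence was measured in the study, and another substitution model may be more appropriate for your data.

```r
library(ape)

# Hypothetical helper: largest pairwise distance (subst/site) in a DNAbin alignment.
# The substitution model is an assumption; it is passed straight to ape::dist.dna().
max_pairwise_divergence <- function(aln, model = "K80") {
  d <- dist.dna(aln, model = model, pairwise.deletion = TRUE)
  # Saturated or non-overlapping pairs give undefined distances;
  # treat them as exceeding any threshold rather than silently dropping them.
  if (any(!is.finite(d))) return(Inf)
  max(d)
}

# Example use on the file written in step 1:
# aln <- read.dna("3rdCodon.fasta", format = "fasta")
# max_pairwise_divergence(aln) < 1.0   # TRUE if the clade is below the threshold
```

If the result is above 1.0, or infinite, that suggests splitting the dataset into smaller clades before going further.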
2) Compute the average body mass of your species. Body mass is available for many bird species here or here. Body mass should be rather homogeneous among the species of a clade (this is likely when analysing clades of limited molecular divergence). This helps to ensure that applying a single correction factor to the molecular branch lengths is warranted. If body mass varies widely within a clade, the substitution rate is unlikely to be accurately approximated by a single molecular clock estimated from the average body mass of that clade. In this case, we also recommend splitting the dataset into monophyletic clades of species with relatively homogeneous body mass, and analysing each clade independently.
3) Compute the molecular rate using the linear regression parameters. We estimated two models reflecting the current uncertainty surrounding the bird fossil record. We recommend using both rates, which together provide a range of divergence dates.
The molecular rate for the 3rd codon position is obtained as follows:
Substitution rate 1 = 10^(-0.145*log10(body mass in grams)+0.459)/100 (unit is subst/site/Myr)
Substitution rate 2 = 10^(-0.247*log10(body mass in grams)+0.813)/100 (unit is subst/site/Myr)
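For convenience, the two formulas can be wrapped in small R functions. This is only a sketch: the function names (`predict_rate`, `rate_model1`, `rate_model2`) are made up for this post, while the slopes and intercepts are the ones given above.

```r
# Predicted 3rd codon position rate (subst/site/Myr) from body mass in grams,
# following rate = 10^(slope * log10(mass) + intercept) / 100
predict_rate <- function(body_mass_g, slope, intercept) {
  10^(slope * log10(body_mass_g) + intercept) / 100
}

# The two regression models given above
rate_model1 <- function(body_mass_g) predict_rate(body_mass_g, -0.145, 0.459)
rate_model2 <- function(body_mass_g) predict_rate(body_mass_g, -0.247, 0.813)

# Example: a clade with an average body mass of 100 g
rates <- c(model1 = rate_model1(100), model2 = rate_model2(100))
print(rates)
```

For a 100 g bird, this gives roughly 0.0148 and 0.0208 subst/site/Myr, that is, about 1.5% and 2.1% per million years at 3rd codon positions.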
Note also that in Table 2 of the manuscript, we provide confidence intervals derived from the 95% C.I. of the molecular dating analyses.
4) Convert branch lengths into divergence dates. There are several ways to do that. First, you can estimate the branch lengths of your dataset under a strict molecular clock model, for example with baseml. This is what we did for the working examples provided in the manuscript (available in Dryad, including the baseml control file). Once you have estimated the branch lengths under a molecular clock, you can use this R script (available in Dryad) to convert them into divergence dates.
Alternatively, you can input the substitution rate using the ‘clock.rate’ parameter of the strict molecular clock model implemented in BEAST (see this tutorial for details).
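To illustrate the principle of the conversion, here is a minimal sketch in R. It is not the Dryad script. It assumes a strict-clock (ultrametric) tree whose branch lengths are in subst/site for 3rd codon positions, and it simply divides each node height by the rate. The function name `node_ages_myr`, the toy tree and the rate value are hypothetical.

```r
library(ape)

# Hypothetical helper: node ages (Myr) from a strict-clock tree.
# age = node height (subst/site) / rate (subst/site/Myr)
node_ages_myr <- function(tree, rate, tol = 1e-6) {
  # The division only makes sense if the tree is ultrametric (clock-like)
  stopifnot(is.ultrametric(tree, tol = tol))
  branching.times(tree) / rate
}

# Toy clock tree with branch lengths in subst/site (made-up values)
tree <- read.tree(text = "((A:0.1,B:0.1):0.2,C:0.3);")
ages <- node_ages_myr(tree, rate = 0.015)
print(ages)
```

Running it once with each of the two rates from step 3 gives a range of dates for every node. If the branch lengths printed by your clock analysis are rounded, the ultrametric check may need a larger `tol`.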
Please e-mail me if you have questions!