Post date: Feb 04, 2014 10:46:36 PM
Pickrell and Pritchard recomend using the proportion of the variance in population covariances explained by the population graph (PVE) to compare models with different number of migration events. Here is the text from their paper describing how this is done:
Measuring model fit.
Finally, we wanted to define measures for how well the model fits the data. First, we define the matrix of residuals in this model, . These quantities are useful for visualization and fitting:
(29)
Positive residuals indicate pairs of populations where the model underestimates the observed covariance, and thus populations where the fit might be improved by adding additional edges. Negative residuals indicate pairs of populations where the model overestimates the observed covariance; these are a necessary outcome of having positive residuals, but can also sometimes be interpreted as populations that are forced too close together due to unmodeled migration elsewhere in the graph. These residuals can be used to define a measure of the fraction of the variance in
that is explained by . Let us call this fraction :(30)
where
and . This quantity approximates the fraction of the variance in relatedness between populations that is accounted for by the model.
And here is how this can be done with R using their plotting functions:
source("~/Downloads/treemix-1.12/src/plotting_funcs.R")
First read the matrix of residuals:
R<-plot_resid("trmxout_mle_rootanri","poplist.txt")
Then read the estimated covariance matrix
W<-plot_cov("trmxout_mle_rootanri","poplist.txt")
Calculate Rbar and Wbar:
n<-66
Rbar<-sum(R[upper.tri(R)])/(n*(n-1)/2)
Wbar<-sum(W[upper.tri(W)])/(n*(n-1)/2)
Calculate PVE:
num<-sum((R[upper.tri(R)]-Rbar)^2)
dnom<-sum((W[upper.tri(W)]-Wbar)^2)
pve<-1-(num/dnom)