Post date: Sep 09, 2013 11:23:47 PM
I need a quantitative, objective way to identify linkage groups from pairwise recombination rates. One idea is to identify the main axes of variation in the pairwise recombination rates with PCA. The PC scores can then be used as input for k-means clustering, and the resulting clusters can be verified with LDA (similar to generating initial values of q for entropy). I tried this (see code below), and it seems to work reasonably well: most scaffolds are assigned to a cluster (linkage group) with high probability, and most clusters have low average recombination rates. However, some groups identified with this procedure appear to be grab-bags of scaffolds with high recombination rates (i.e., scaffolds with no good linkage group). I have more to do, but this is promising.
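library(MASS)  # provides lda()
# PCA on the pairwise recombination-rate matrix (rows/columns selected by 'long')
outpca<-prcomp(dmatf2[long,long],center=TRUE)
# k-means on the first 26 PCs with k = 26 clusters (candidate linkage groups)
outk<-kmeans(x=outpca$x[,1:26],centers=26,iter.max=1000,nstart=100,algorithm="Hartigan-Wong")
# LDA with leave-one-out cross-validation to check the cluster assignments
outlda<-lda(x=outpca$x[,1:26],grouping=outk$cluster,CV=TRUE)
lg<-as.numeric(as.character(outlda$class))
# leave a scaffold unassigned if its maximum posterior probability is below 0.95
maxlg<-apply(outlda$posterior,1,max)
lg[maxlg<0.95]<-NA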
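
One possible way to spot the grab-bag groups is to compute the mean pairwise recombination rate among the scaffolds assigned to each group and flag groups where it is high. The following is only a sketch under assumptions: rmat is a small toy matrix standing in for dmatf2[long,long], within_mean is a hypothetical helper, and the 0.3 cutoff is arbitrary rather than a recommended value.

```r
# Toy stand-in for dmatf2[long,long] (made-up values, for illustration only):
# groups 1 and 2 are tightly linked, group 3 is a grab-bag of unlinked scaffolds.
truth <- rep(1:3, each = 4)
n <- length(truth)
rmat <- matrix(0.5, n, n)                      # unlinked pairs: r = 0.5
linked <- outer(truth, truth, "==") & outer(truth < 3, truth < 3, "&")
rmat[linked] <- 0.05                           # linked pairs: r = 0.05
diag(rmat) <- 0

# Group assignments in the same form as lg above (NA = not confidently assigned).
lg <- truth
lg[1] <- NA

# Hypothetical helper: mean pairwise recombination rate within each group.
within_mean <- function(rmat, lg) {
  groups <- sort(unique(lg[!is.na(lg)]))
  m <- sapply(groups, function(g) {
    idx <- which(lg == g)                      # which() drops the NA entries
    if (length(idx) < 2) return(NA_real_)      # no pairs to average
    sub <- rmat[idx, idx]
    mean(sub[upper.tri(sub)])                  # each pair counted once
  })
  setNames(m, groups)
}

lgmean <- within_mean(rmat, lg)
# Arbitrary cutoff (an assumption): groups above it are candidate grab-bags.
grabbag <- names(lgmean)[!is.na(lgmean) & lgmean > 0.3]
```

With the real data, within_mean(dmatf2[long,long], lg) would give one value per group, and the groups listed in grabbag would be the ones to inspect by hand or to re-cluster.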