Post date: Sep 03, 2013 8:45:55 PM
Engelhardt and Stephens (2010; doi:10.1371/journal.pgen.1001117) propose sparse factor analysis (SFA) as an alternative to admixture proportions or PCA for summarizing population structure. It is another way to decompose and summarize a genotype matrix (here the decomposition is applied to the genotype matrix itself, not the genotype covariance matrix). They provide software for this analysis (sfa_linux, in /usr/local/bin/). There are a few options I don't really understand, so I simply ran the program with k = 3 factors on the common genotype data, as follows:
sfa_linux -gen gmat.txt -g 1521 -n 15076 -k 3 -iter 50 -rand 433 -o out
I then used R to plot the contents of out_lambda.out, which I think is the loading matrix, as follows:
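## sfa: read the sfa_linux output (indexed below as one row per individual, one column per factor)
nloci<-15076
nind<-1521
sfa<-read.table("out_lambda.out",header=FALSE)
## plot sfa results (common genotype data)
## a and b are the columns (factors) shown on the x and y axes
a<-2
b<-1
## one color per group
mycolors<-c("orange","orangered","forestgreen","rosybrown","gold","darkblue","lightblue","salmon","black","gray","brown","violet","darkred")
## legend file: column 2 = group index (1-13), column 3 = text label drawn for each individual
leginfo<-read.table("../admixprops/results/legend.txt",header=FALSE)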
## two panels side by side: factor 2 vs. factor 1, then factor 3 vs. factor 1
pdf("sfaplot.pdf",width=16,height=8)
par(mfrow=c(1,2))
## empty plot (type='n'); individuals are drawn as colored text labels, one group at a time
plot(sfa[,a],sfa[,b],pch=20,cex=0.5,type='n',xlab="factor 2",ylab="factor 1",cex.lab=1.3)
for(i in 1:13){
A<-which(leginfo[,2]==i)
text(sfa[A,a],sfa[A,b],leginfo[A,3],cex=0.6,col=mycolors[i])
}
## second panel: switch the x axis to factor 3
a<-3
plot(sfa[,a],sfa[,b],pch=20,cex=0.5,type='n',xlab="factor 3",ylab="factor 1",cex.lab=1.3)
for(i in 1:13){
A<-which(leginfo[,2]==i)
text(sfa[A,a],sfa[A,b],leginfo[A,3],cex=0.6,col=mycolors[i])
}
## legend position is in data coordinates of the second panel
legend(1.7,0.15,legend=c("anna","ricei","idas","long","sublv","mel-gb","mel-rm","mel-an","sr-nev","white","warner","jhole","dubs"),fill=mycolors,cex=0.7)
dev.off()
The sparse factors do not quite correspond to the PCs (PC 1 is similar to factor 2, and factors 2 and 3 are highly correlated), but the overall pattern is very similar (sfa plot). I am not sure whether this is worth pursuing further with the current data set (given the similarity to PCA, we don't really learn anything new), but if so I need to read more.
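One way to put numbers on these comparisons is a pair of correlation matrices: one among the factors, and one between factors and PCs. The sketch below is only an illustration and does not use the real results. The sfa matrix here is simulated as a stand-in for out_lambda.out, and pcs is a hypothetical matrix of PC scores (one row per individual) that is not a file from this analysis.

```r
## minimal sketch: correlations among factors, and between factors and PCs
## all data below are simulated stand-ins (assumption), not the real results
set.seed(433)
nind <- 200
f1 <- rnorm(nind)
f2 <- rnorm(nind)

## stand-in for the matrix read from out_lambda.out;
## factor 3 is built to be correlated with factor 2
sfa <- cbind(f1, f2, f2 + rnorm(nind, sd = 0.3))
colnames(sfa) <- paste0("factor", 1:3)

## hypothetical PC scores: PC1 tracks factor 2, PC2 tracks factor 1
pcs <- cbind(PC1 = f2 + rnorm(nind, sd = 0.3),
             PC2 = f1 + rnorm(nind, sd = 0.3))

## correlations among factors (off-diagonals near +/-1 suggest redundancy)
factor_cor <- cor(sfa)

## correlations between each factor (rows) and each PC (columns)
factor_pc_cor <- cor(sfa, pcs)

print(round(factor_cor, 2))
print(round(factor_pc_cor, 2))
```

With the real data, sfa would come from read.table("out_lambda.out") as above, and pcs from whatever PCA was run on the same individuals in the same order.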