Post date: Sep 17, 2013 8:38:34 PM
I want to estimate admixture-class proportions (i.e., Q matrix elements) for subsets of the Lycaeides data where there is evidence of admixture. The idea is to simplify the population structure by analyzing these subsets separately, considering only k = 2 (or possibly 3) ancestral classes and only common variants. There are six groups of populations I want to analyze:
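Estimating Q rather than only q matters for the admixture question. Below is a minimal sketch that assumes Q is stored as a k x k matrix of ordered ancestry pairs whose entries sum to 1. entropy's actual output layout may differ; for example, it may store only unordered pairs. Each admixture proportion q is recovered by splitting every locus's ancestry evenly between its two classes. The off-diagonal mass gives interclass heterozygosity. The example shows why Q is informative: an F1 and an F2 both have q = (0.5, 0.5), but their Q matrices differ. The function names are illustrative, not part of any existing tool.

```python
import numpy as np

def q_from_Q(Q):
    """Collapse a k x k interclass-ancestry matrix Q into admixture proportions q.

    Assumes Q[i, j] is the proportion of loci whose two allele copies come from
    ancestral classes i and j (ordered pairs), so all entries sum to 1.
    """
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        raise ValueError("Q must be a square k x k matrix")
    if not np.isclose(Q.sum(), 1.0):
        raise ValueError("Q entries must sum to 1")
    # Row sums count class i as the first copy, column sums as the second copy;
    # averaging them gives Q[i, i] + half of the off-diagonal mass involving i.
    return (Q.sum(axis=1) + Q.sum(axis=0)) / 2.0

def interclass_heterozygosity(Q):
    """Proportion of loci with ancestry from two different classes."""
    Q = np.asarray(Q, dtype=float)
    return Q.sum() - np.trace(Q)

# An F1 hybrid and an F2-like individual share q but differ in Q.
f1 = [[0.0, 0.5], [0.5, 0.0]]
f2 = [[0.25, 0.25], [0.25, 0.25]]
print(q_from_Q(f1), interclass_heterozygosity(f1))  # [0.5 0.5] 1.0
print(q_from_Q(f2), interclass_heterozygosity(f2))  # [0.5 0.5] 0.5
```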
1. anna; melw; alp
2. anna; mele; idas
3. mele; melr; idas; long; dubs
4. anna; mele; warn
5. anna; idas; mele; warn
6. long; melr; dubs
Group descriptions are given in the attached file, codes.txt.
The infiles and scripts for this analysis are in projects/lycaeides_admixture/Qmatrix/. As with previous entropy analyses, I generated initial values for q (which are also used to initialize Q) using PCA, k-means clustering, and LDA. I used the first 3 PCs and ran all steps separately for each set of populations (i.e., groups 1-6). The R code can be found here. The starting values are in the files g#k#.txt, where the first # is the group number (as above) and the second is k. I then wrote a script, subsetInfile.pl, to extract the subset of individuals in each group and write new genotype likelihood infiles. The sample sizes (number of individuals) for each group are: g1 = 505, g2 = 435, g3 = 762, g4 = 393, g5 = 532, g6 = 420. Each infile includes 15069 common variants.
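The three initialization steps can be sketched in NumPy. This is not the R code referenced above. It is an illustrative sketch under several assumptions: the input is a matrix of genotype point estimates (individuals x loci, coded 0-2), k-means and LDA are written out by hand, and the data are simulated toy data. The LDA posterior class probabilities serve as each individual's starting q. All function names, and the file layout in the commented-out save line, are hypothetical.

```python
import numpy as np

def pca_scores(G, n_pc=3):
    """Scores on the first n_pc principal components of centred genotypes."""
    X = G - G.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_pc] * S[:n_pc]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster happens to empty out.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def lda_posteriors(X, labels, k):
    """Posterior class probabilities from linear discriminant analysis
    (Gaussian classes with a shared covariance). Assumes no empty cluster."""
    n, _ = X.shape
    means = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    priors = np.array([np.mean(labels == c) for c in range(k)])
    resid = X - means[labels]
    inv = np.linalg.pinv(resid.T @ resid / (n - k))  # pooled within-class covariance
    # Linear discriminant score per class, then a numerically stable softmax.
    scores = X @ inv @ means.T - 0.5 * np.sum(means @ inv * means, axis=1) + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)
    post = np.exp(scores)
    return post / post.sum(axis=1, keepdims=True)

# Toy data: two simulated source populations, 30 individuals each.
rng = np.random.default_rng(1)
n_loci = 200
p1 = rng.uniform(0.05, 0.95, n_loci)
p2 = rng.uniform(0.05, 0.95, n_loci)
G = np.vstack([rng.binomial(2, p1, (30, n_loci)),
               rng.binomial(2, p2, (30, n_loci))]).astype(float)
pcs = pca_scores(G, n_pc=3)
labels = kmeans(pcs, k=2)
q0 = lda_posteriors(pcs, labels, k=2)
# np.savetxt("g1k2.txt", q0)  # hypothetical layout: one row per individual
```

Running LDA on the k-means labels, rather than using the hard cluster assignments directly, gives soft starting values. Individuals near a cluster boundary then start with intermediate q.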
The entropy analyses are running on the dorc cluster with the following options: 15000 MCMC steps, a 5000-step burn-in, and a thinning interval of 5. The scalar for starting values was 50. The job numbers are 14527-14550, and all jobs are currently running. The results will be written to scratch/admixQ.
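As a quick check on these settings: if the sampler saves every 5th step after the burn-in, each chain retains (15000 - 5000) / 5 = 2000 samples. The second function is a hypothetical illustration of what a starting-value scalar can do. It treats the scalar as a multiplier on q0 to form Dirichlet concentration parameters. Whether entropy uses the scalar exactly this way is an assumption to check against its documentation. Both function names are made up for this sketch.

```python
import numpy as np

def retained_steps(n_steps, burnin, thin):
    """Step indices saved by an MCMC run, assuming every `thin`-th step
    after the burn-in is kept."""
    return range(burnin, n_steps, thin)

def dirichlet_starts(q0, scalar, seed=0, eps=1e-3):
    """Hypothetical illustration: draw starting ancestry vectors from
    Dirichlet(scalar * q0 + eps). A larger scalar keeps draws closer to q0;
    eps keeps every concentration parameter strictly positive."""
    rng = np.random.default_rng(seed)
    q0 = np.asarray(q0, dtype=float)
    return np.array([rng.dirichlet(scalar * row + eps) for row in q0])

kept = retained_steps(15000, 5000, 5)
print(len(kept))  # 2000 saved samples per chain

q0 = np.array([[0.9, 0.1], [0.5, 0.5]])
print(dirichlet_starts(q0, scalar=50).round(2))
```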