Post date: Apr 21, 2017 2:31:11 PM
I am running some analyses for Maria on a hybrid zone between two frog species in Central America. The project is here: /uufs/chpc.utah.edu/common/home/u6000989/projects/frogs/
1. She sent me the variants.gl file in the variants sub-directory. I used splitPops.pl to split this into 15 population files.
2. I ran estpEM via runEstp.pl from the variants sub-directory to obtain ML allele frequency estimates for each population and for the full data set. I used the default options (a tolerance of 0.001 and a maximum of 20 iterations). The estimates for all 103,825 SNPs are in the freqs sub-directory, in files whose names start with p_*.
3. I used the orderFreqs.R script to generate a single combined allele frequency file, frog_pop_frequencies.txt. It has one row per SNP, in their original order, and one column per population (ordered by Maria's Geo column; see sortPops.txt). I gave this file to Maria.
4. I generated a set of common variants for entropy by running getCommon.pl on variants.gl, which yielded common_variants.gl. This file includes 39,018 common SNPs (MAF > 5% based on all samples).
5. I then subset this to retain one random SNP per contig (i.e., "unlinked" SNPs). The first step was to choose one SNP at random from each contig in R:
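x<-as.matrix(read.table("commonSnps.txt",header=FALSE))
## column 1 holds the contig ID of each common SNP
uni<-unique(x[,1])
nc<-length(uni)
## one kept SNP (row index) per contig
keep<-rep(NA,nc)
for(i in 1:nc){
    a<-which(x[,1]==uni[i])
    if(length(a) > 1){
        ## pick one of this contig's SNPs at random
        keep[i]<-sample(a,1)
    } else {
        ## sample(a,1) on a single value would draw from 1:a, so keep it directly
        keep[i]<-a
    }
}
write.table(keep,"commonSubToKeep.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)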
I then ran getKeep.pl on common_variants.gl to generate sub_common_variants.gl, which contains 15,444 SNPs (one per contig). This is the file I will use for entropy.
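Step 2 produces maximum-likelihood allele frequency estimates, presumably from the genotype likelihoods in the .gl files. The function below is a minimal sketch of the standard EM estimator for a single SNP under Hardy-Weinberg proportions. It is not estpEM's actual code. The input format is a hypothetical N x 3 matrix of genotype likelihoods (not phred-scaled), and the name emAlleleFreq and the starting value of 0.5 are my own choices. The tolerance and iteration cap mirror the defaults noted in step 2.

```r
## Hypothetical EM sketch (not estpEM): ML allele frequency for one SNP.
## gl: N x 3 matrix of genotype likelihoods P(data | g) for g = 0, 1, 2
##     copies of the counted allele; assumes Hardy-Weinberg proportions.
emAlleleFreq <- function(gl, tol = 0.001, maxIter = 20, p0 = 0.5) {
  p <- p0
  for (iter in seq_len(maxIter)) {
    ## genotype prior implied by the current allele frequency
    prior <- c((1 - p)^2, 2 * p * (1 - p), p^2)
    ## E-step: posterior genotype probabilities for each individual
    post <- sweep(gl, 2, prior, `*`)
    post <- post / rowSums(post)
    ## M-step: expected count of the counted allele / total allele copies
    pNew <- sum(post[, 2] + 2 * post[, 3]) / (2 * nrow(gl))
    converged <- abs(pNew - p) < tol
    p <- pNew
    if (converged) break
  }
  p
}
```

For example, emAlleleFreq(diag(3)[c(1, 2, 3, 3), ]) treats four individuals as having known genotypes 0, 1, 2 and 2, and it returns the simple allele count 5/8 = 0.625.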
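Step 3 amounts to a column bind followed by a reorder. The sketch below is not the contents of orderFreqs.R, and the real layout of sortPops.txt may differ. It assumes, hypothetically, two inputs. The first is the per-population estimates, already read into a named list of numeric vectors (one per population, all in the original SNP order). The second is each population's Geo value, stored as a named numeric vector. combineFreqs is a hypothetical name.

```r
## Hypothetical sketch: SNP x population frequency table ordered by Geo.
## freqs: named list of allele frequency vectors, one per population
## geo:   named numeric vector of Geo values, one per population
combineFreqs <- function(freqs, geo) {
  stopifnot(all(names(geo) %in% names(freqs)))
  ## population names sorted by their Geo value
  popOrder <- names(sort(geo))
  ## one row per SNP, one column per population, named by population
  do.call(cbind, freqs[popOrder])
}
```

Writing the result with write.table(..., quote = FALSE) would then give a table shaped like the one described in step 3.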
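Step 4 keeps SNPs whose minor allele frequency exceeds 5% across all samples. Below is a minimal sketch of that criterion. It assumes a vector of allele frequency estimates for the full data set, such as the one produced in step 2. commonSnpIndex is a hypothetical name, and getCommon.pl may compute MAF differently.

```r
## Hypothetical sketch of the MAF > 5% filter (not getCommon.pl itself).
## p: allele frequency estimate for each SNP in the full data set
commonSnpIndex <- function(p, minMaf = 0.05) {
  ## the minor allele frequency is the smaller of p and 1 - p
  maf <- pmin(p, 1 - p)
  ## indices of SNPs above the threshold; NA estimates are dropped by which()
  which(maf > minMaf)
}
```

With this rule, a SNP with p = 0.97 (MAF 0.03) is dropped, while one with p = 0.06 is kept.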
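The loop in step 5 could also be written without explicit indexing. This sketch (pickOnePerContig is a hypothetical name) returns the same kind of index vector: one row index per contig, in order of first appearance. It uses sample.int(), so contigs with a single SNP need no special case. Calling set.seed() beforehand would make the selection reproducible.

```r
## One random SNP (row index) per contig, without an explicit loop.
## contig: contig ID of each common SNP, e.g. column 1 of commonSnps.txt
pickOnePerContig <- function(contig) {
  ## row indices grouped by contig, in order of first appearance
  byContig <- split(seq_along(contig), factor(contig, levels = unique(contig)))
  ## sample.int() avoids sample()'s special case for length-1 input
  unname(vapply(byContig, function(idx) idx[sample.int(length(idx), 1)],
                integer(1)))
}
```

For instance, keep <- pickOnePerContig(x[, 1]) would take the place of the loop, with the same write.table() call afterwards.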