Post date: Nov 05, 2019 3:13:55 AM
As a first look at population structure and diversity patterns, we inferred allele frequencies and genotypes for the 39,164 SNPs described here. All steps below are recorded in EstAfreqGen.sh, and everything was run within /uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/gbs_pando_plus/Variants_mem_bcftools/.
1. Allele frequencies were estimated with Li's expectation-maximization (EM) algorithm, as implemented in estpEM:
estpEM -i filtered2xHiCov_pando_variants.gl -o p_pando_plus.txt -h 1
Number of loci: 39164
Number of individuals: 296
Using EM algorithm to estimate allele frequencies
Writing results to p_pando_plus.txt
Runtime: 0 hr 0 min 8 sec
## extract column 3, which holds the MLE of the allele frequency
cut -f 3 -d " " p_pando_plus.txt > mle_p_pando_plus.txt
2. Genotype point estimates (posterior means) were obtained using priors based on the estimated allele frequencies:
perl gl2genest.pl filtered2xHiCov_pando_variants.gl mle_p_pando_plus.txt
## output: genotype point estimates in
## pntest_filtered2xHiCov_pando_variants.txt
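The two steps above can be illustrated with a minimal Python sketch for a single biallelic SNP. It follows the general shape of an EM allele-frequency estimate from genotype likelihoods (E-step: posterior genotype probabilities under Hardy-Weinberg proportions given the current frequency; M-step: the new frequency is the expected alternate-allele count divided by 2n), then reuses the same posterior to get mean genotypes. This is not the estpEM or gl2genest.pl code. The function names are hypothetical, it assumes natural-scale genotype likelihoods for 0, 1, or 2 alternate alleles rather than the actual .gl file format, and it assumes the "allele-frequency based prior" is the Hardy-Weinberg genotype distribution.

```python
"""Minimal sketch (not the estpEM / gl2genest.pl source) for one biallelic SNP.

Assumption: each individual's genotype likelihoods are given on the natural
scale as P(reads | g) for g = 0, 1, 2 copies of the alternate allele.
"""
from typing import List, Sequence, Tuple

GL = Tuple[float, float, float]


def hwe_prior(p: float) -> Tuple[float, float, float]:
    """Hardy-Weinberg genotype probabilities given alternate-allele frequency p."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def posterior_mean_genotypes(gls: Sequence[GL], p: float) -> List[float]:
    """Posterior mean genotype (0-2 scale) per individual, HWE prior from p."""
    prior = hwe_prior(p)
    means = []
    for gl in gls:
        w = [gl[g] * prior[g] for g in range(3)]  # unnormalized posterior
        means.append((w[1] + 2 * w[2]) / sum(w))
    return means


def em_allele_freq(gls: Sequence[GL], p0: float = 0.5,
                   tol: float = 1e-8, max_iter: int = 1000) -> float:
    """EM estimate of the alternate-allele frequency from genotype likelihoods."""
    p = p0
    for _ in range(max_iter):
        # E-step: expected alternate-allele count of each individual given p.
        # M-step: new p = total expected alternate-allele count / (2n).
        p_new = sum(posterior_mean_genotypes(gls, p)) / (2 * len(gls))
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


if __name__ == "__main__":
    # Hypothetical likelihoods for four individuals at one locus.
    gls = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.3, 0.7), (0.6, 0.4, 0.0)]
    p_hat = em_allele_freq(gls)
    print(round(p_hat, 4), [round(g, 3) for g in posterior_mean_genotypes(gls, p_hat)])
```

In this sketch the E-step is literally the posterior-mean genotype calculation, so at convergence the frequency estimate equals the sum of the posterior-mean genotypes divided by 2n, which makes a handy consistency check between the two steps.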
A quick peek at a PCA of these genotype estimates looks promising.
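To show what a PCA of genotype point estimates involves, here is a dependency-free Python sketch: it centers each locus on its mean genotype, builds the individual-by-individual covariance matrix, and extracts PC1 by power iteration. The loci-by-individuals matrix layout and the function names are assumptions for illustration, and this is not necessarily how the PCA mentioned above was run.

```python
"""Minimal PCA sketch on a genotype point-estimate matrix (pure Python).

Assumption: the matrix is a list of rows, one row per locus and one column
per individual, holding posterior-mean genotypes on a 0-2 scale.
"""
import math
from typing import List

Matrix = List[List[float]]


def center_loci(geno: Matrix) -> Matrix:
    """Subtract each locus's mean genotype across individuals."""
    centered = []
    for row in geno:
        mu = sum(row) / len(row)
        centered.append([x - mu for x in row])
    return centered


def individual_covariance(x: Matrix) -> Matrix:
    """n x n covariance among individuals from a centered loci x n matrix."""
    n_loci, n_ind = len(x), len(x[0])
    cov = [[0.0] * n_ind for _ in range(n_ind)]
    for row in x:
        for i in range(n_ind):
            for j in range(n_ind):
                cov[i][j] += row[i] * row[j]
    return [[c / (n_loci - 1) for c in r] for r in cov]


def leading_pc(geno: Matrix, iters: int = 500, tol: float = 1e-10) -> List[float]:
    """Unit-length PC1 coordinates of individuals via power iteration (sign arbitrary)."""
    cov = individual_covariance(center_loci(geno))
    n = len(cov)
    # Non-constant start: after centering, the all-ones vector is in the null space.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variation among individuals
            return [0.0] * n
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v
```

The triple loop is far too slow for 39,164 loci by 296 individuals; for real data a compiled routine such as R's prcomp or an SVD from a numerical library is the practical choice, and further PCs would need deflation or a full eigendecomposition.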