Post date: Jun 04, 2016 3:41:34 AM
I am using a modified version of estpEM (estpEMsex) to estimate population allele frequencies as a precursor to estimating genotypes and ancestry frequencies. This version assumes females are haploid.
1. Split sex_lychybanc_notvryrare.gl by population (results are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/variants/bypop/). There are 852 Z chromosome SNPs.
perl splitPopsSex.pl ../sex_lychybanc_notvryrare.gl
2. Run estpEMsex for all populations, uses a perl wrapper.
perl runEstpEm.pl lychybSex_*gl
Here is an example,
estpEMsex -i lychybSex_YBG.gl -o p_lychybSex_YBG.txt -e 0.001 -m 20 -h 2
I then moved the allele frequency files to /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/popfreqs/
3. Genotype point estimates (posterior means) were then inferred using the gl files with HWE priors based on the allele frequencies. I wrote a perl wrapper script for this around gl2genestSex.pl. Note that this version of gl2genest.pl also does some formatting for popanc. Note also that a simpler prior (p) is used for females (haploids). Also pairs of females are combined into single individuals and sample sizes are adjusted accordingly. This might seem weird, but I think it is actually fine. I am basically "mating" two female Z chromosomes.
perl runGenest.pl lychybSex_*gl
Which runs this (as an example),
perl gl2genestSex.pl lychybSex_YBG.gl p_lychybSex_YBG.txt
The pntest* files are here: /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/popfreqs.
4. Initial analyses of ancestry frequencies for Z chromsome with popanc. Results and files are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/ancfreqs/. First I combined sets of populations as I did for the autosomes:
perl combinePops.pl lychybSex_MelWest.txt pntest_lychybSex_BHP.txt pntest_lychybSex_GVL.txt pntest_lychybSex_REW.txt pntest_lychybSex_SLA.txt pntest_lychybSex_VCP.txt pntest_lychybSex_WAL.txt
perl combinePops.pl lychybSex_MelNorth.txt pntest_lychybSex_DCR.txt pntest_lychybSex_GLA.txt pntest_lychybSex_LCA.txt pntest_lychybSex_OCY.txt pntest_lychybSex_SSC.txt pntest_lychybSex_SUV.txt
perl combinePops.pl lychybSex_Anna.txt pntest_lychybSex_DOP.txt pntest_lychybSex_FCR.txt pntest_lychybSex_LKS.txt pntest_lychybSex_YBG.txt pntest_lychybSex_CPE.txt
perl combinePops.pl lychybSex_Idas.txt pntest_lychybSex_STB.txt pntest_lychybSex_KHL.txt pntest_lychybSex_SDC.txt pntest_lychybSex_SYC.txt
Next, I ran popanc for the following sets of populations. The call to the perl wrapper and one example are given below (these are the same conditions I used for the autosomes):
my @hpops = ("CLH","CRP","LAE","SWM","SOF","REF","EGP","STM","MTR","SOP","TIC","CSP","BKM");
perl ../../scripts/wrap_qsub_slurm_popanc.pl pntest_lychybSex_*
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/ancfreqs/
popanc -o /scratch/general/lustre/u6000989/outSexBKM4.hdf5 -n 5 -d 15 -m 50000 -b 25000 -t 20 -s 1 -q 1 -z 0 infiles/comb_lychybSex_Anna.txt infiles/comb_lychybSex_MelWest.txt infiles/pntest_lychybSex_BKM.txt
mv /scratch/general/lustre/u6000989/outSexBKM4.hdf5 ./