I ran the first trial run with popanc. This was just to test if popanc runs properly and see the time it takes to run the program. The program ran pretty quickly. In this run, I used SYC as the ids parent (infile 0), BST as the Melissa parent (infile 1) and used DBS as the admixed population. The infiles 0 and 1 are the parental populations that we give for the admixed population. Eventually I will have to switch this and make it more real in the sense that I might end up combining idas and Melissa populations based on the PCA and even divide idas into new and old hybrids.
Here is the shell script I used to run popanc. The files are saved in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles/
To run this program I will use the lyc_genoest_*.txt files which genotype estimate averages across Ks (and across chains)for each population. Notice the arguments given in the shell script for further information on mcmc chains, burnin etc.
#!/bin/sh -f
#SBATCH --time=36:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert
#SBATCH --partition=kingspeak
#SBATCH --job-name=popanc
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=samridhi.chaturvedi@gmail.com
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $SLURM_JOB_NODELIST
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load gcc
module load gsl
module load hdf5
cd /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/pre_popanc/
/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ./outPopanc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./lyc_genoest_BST.txt ./lyc_genoest_SYC.txt ./lyc_genoest_DBS.txt
early()
{
echo ' '
echo ' ############ WARNING: EARLY TERMINATION ############# '
echo ' '
}
exit
Convert hdf5 file to txt file with posterior estimates:
/uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o outPopanc_trial1.txt -p popQ -s 0 -I outPopanc.hdf5
Here are some more initial runs which I am running right now:
I created combined population files using the combinePops.pl script(/uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles)
Outfiles(hdf5): /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/outfiles
#Note the populations below which have been used to create the files:
perl combinePops.pl old.txt lyc_genoest_SDC.txt lyc_genoest_SYC.txt lyc_genoest_KHL.txt
perl combinePops.pl new.txt lyc_genoest_RDL.txt lyc_genoest_BLD.txt lyc_genoest_RNV.txt
perl combinePops.pl melissa.txt lyc_genoest_SIN.txt lyc_genoest_BST.txt lyc_genoest_CDY.txt lyc_genoest_CKV.txt
Trial 1: In the script above-- parents BST and SYC; admixed DBS; Script: runPopanc.sh
Trial 2 (running now): parents BLD and CDY; admixed DBS; Script: runPopanc.sh
Trial 3 (running now): parents old, Melissa; admixed DBS; Script: runPopanc_comb.sh
Trial 4 (running now): parents new, Melissa; admixed DBS; Script: runPopanc_comb.sh
ZG suggested:
Trial 5: parents GNP,BNP,KHL, Melissa, admixed DBS; Script runPopanc_comb.sh
Trial 6: parents BLD, FRC, Melissa, admixed DBS; Script runPopanc_comb.sh
cd /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles/
/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ../outfiles/outPopanc_bld-frc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./comb_bld-frc.txt ./comb_melissa.txt ./lyc_genoest_DBS.txt
/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ../outfiles/outPopanc_gnp-bnp-frc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./comb_gnp-bnp-khl.txt ./comb_melissa.txt ./lyc_genoest_DBS.txt
for $filename in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$filename.txt" -p popQ -s 0 -I $filename; done
for f in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$f.txt" -p popQ -s 0 $f; done
/uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o *.txt -p popQ -s 0 *.hdf5
for file in *.hdf5.txt; do mv "$file" "${file/.hdf5.txt/.txt}"; done
for f in *.hdf5.txt; do mv -- "$f" "${f%.hdf5.txt}.txt"; done
sed -i "1s/^/$(head -n1 ../../lyc_genoest_BCR.txt)\n/" *lyc_genoest_BCR.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_GNP.txt)\n/" *lyc_genoest_GNP.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_PSP.txt)\n/" *lyc_genoest_PSP.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_BIC.txt)\n/" *lyc_genoest_BIC.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_RDL.txt)\n/" *lyc_genoest_RDL.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_BTB.txt)\n/" *lyc_genoest_BTB.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_RNV.txt)\n/" *lyc_genoest_RNV.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_BNP.txt)\n/" *lyc_genoest_BNP.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_DBS.txt)\n/" *lyc_genoest_DBS.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_FRC.txt)\n/" *lyc_genoest_FRC.txt
sed -i "1s/^/$(head -n1 ../../lyc_genoest_PIN.txt)\n/" *lyc_genoest_PIN.txt
perl -p -i -e 's/39193/4434/g' aims1-lyc_genoest_*
perl -p -i -e 's/39193/2126/g' aims2-lyc_genoest_*
perl -p -i -e 's/39193/1164/g' aims3-lyc_genoest_*
#corelations in means for different windows for bld intial run
cor.test(bld3$mean, bld4$mean) #0.9889983
cor.test(bld3$mean, bld5$mean) #0.9998478
cor.test(bld3$mean, bld6$mean) #0.9904566
cor.test(bld5$mean, bld6$mean) #0.9904197
cor.test(bld4$mean, bld6$mean) #0.9938575
cor.test(bld4$mean, bld5$mean) #0.9889532
I also plotted all of them and they completely overlap. So I am chosing the window 3 for the remaining admixed pops.
POPANC
for f in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$f.txt" -p popQ -s 0 $f; done
for f in *.hdf5.txt; do mv -- "$f" "${f%.hdf5.txt}.txt"; done