initial runs with popanc

I ran the first trial run with popanc. This was just to test if popanc runs properly and see the time it takes to run the program. The program ran pretty quickly. In this run, I used SYC as the ids parent (infile 0), BST as the Melissa parent (infile 1) and used DBS as the admixed population. The infiles 0 and 1 are the parental populations that we give for the admixed population. Eventually I will have to switch this and make it more real in the sense that I might end up combining idas and Melissa populations based on the PCA and even divide idas into new and old hybrids.

Here is the shell script I used to run popanc. The files are saved in the folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles/

To run this program I will use the lyc_genoest_*.txt files which genotype estimate averages across Ks (and across chains)for each population. Notice the arguments given in the shell script for further information on mcmc chains, burnin etc.

#!/bin/sh -f

#SBATCH --time=36:00:00

#SBATCH --nodes=1

#SBATCH --ntasks=8

#SBATCH --account=gompert

#SBATCH --partition=kingspeak

#SBATCH --job-name=popanc

#SBATCH --mail-type=FAIL

#SBATCH --mail-user=samridhi.chaturvedi@gmail.com

echo ------------------------------------------------------

echo -n 'Job is running on node '; cat $SLURM_JOB_NODELIST

echo ------------------------------------------------------

echo SLURM: job identifier is $SLURM_JOBID

echo SLURM: job name is $SLURM_JOB_NAME

echo ------------------------------------------------------

module load gcc

module load gsl

module load hdf5

cd /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/pre_popanc/

/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ./outPopanc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./lyc_genoest_BST.txt ./lyc_genoest_SYC.txt ./lyc_genoest_DBS.txt

early()

{

echo ' '

echo ' ############ WARNING: EARLY TERMINATION ############# '

echo ' '

}

exit

Convert hdf5 file to txt file with posterior estimates:

/uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o outPopanc_trial1.txt -p popQ -s 0 -I outPopanc.hdf5

Here are some more initial runs which I am running right now:

I created combined population files using the combinePops.pl script(/uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles)

Outfiles(hdf5): /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/outfiles

#Note the populations below which have been used to create the files:

perl combinePops.pl old.txt lyc_genoest_SDC.txt lyc_genoest_SYC.txt lyc_genoest_KHL.txt

perl combinePops.pl new.txt lyc_genoest_RDL.txt lyc_genoest_BLD.txt lyc_genoest_RNV.txt

perl combinePops.pl melissa.txt lyc_genoest_SIN.txt lyc_genoest_BST.txt lyc_genoest_CDY.txt lyc_genoest_CKV.txt

Trial 1: In the script above-- parents BST and SYC; admixed DBS; Script: runPopanc.sh

Trial 2 (running now): parents BLD and CDY; admixed DBS; Script: runPopanc.sh

Trial 3 (running now): parents old, Melissa; admixed DBS; Script: runPopanc_comb.sh

Trial 4 (running now): parents new, Melissa; admixed DBS; Script: runPopanc_comb.sh

ZG suggested:

Trial 5: parents GNP,BNP,KHL, Melissa, admixed DBS; Script runPopanc_comb.sh

Trial 6: parents BLD, FRC, Melissa, admixed DBS; Script runPopanc_comb.sh

cd /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/complete/popanc/infiles/

/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ../outfiles/outPopanc_bld-frc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./comb_bld-frc.txt ./comb_melissa.txt ./lyc_genoest_DBS.txt

/uufs/chpc.utah.edu/common/home/u6000989/bin/popanc -o ../outfiles/outPopanc_gnp-bnp-frc.hdf5 -n 5 -d 15 -m 10000 -b 5000 -t 10 -s 1 -q 1 -z 1 ./comb_gnp-bnp-khl.txt ./comb_melissa.txt ./lyc_genoest_DBS.txt

for $filename in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$filename.txt" -p popQ -s 0 -I $filename; done

for f in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$f.txt" -p popQ -s 0 $f; done

/uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o *.txt -p popQ -s 0 *.hdf5

for file in *.hdf5.txt; do mv "$file" "${file/.hdf5.txt/.txt}"; done

for f in *.hdf5.txt; do mv -- "$f" "${f%.hdf5.txt}.txt"; done

sed -i "1s/^/$(head -n1 ../../lyc_genoest_BCR.txt)\n/" *lyc_genoest_BCR.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_GNP.txt)\n/" *lyc_genoest_GNP.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_PSP.txt)\n/" *lyc_genoest_PSP.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_BIC.txt)\n/" *lyc_genoest_BIC.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_RDL.txt)\n/" *lyc_genoest_RDL.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_BTB.txt)\n/" *lyc_genoest_BTB.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_RNV.txt)\n/" *lyc_genoest_RNV.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_BNP.txt)\n/" *lyc_genoest_BNP.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_DBS.txt)\n/" *lyc_genoest_DBS.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_FRC.txt)\n/" *lyc_genoest_FRC.txt

sed -i "1s/^/$(head -n1 ../../lyc_genoest_PIN.txt)\n/" *lyc_genoest_PIN.txt

perl -p -i -e 's/39193/4434/g' aims1-lyc_genoest_*

perl -p -i -e 's/39193/2126/g' aims2-lyc_genoest_*

perl -p -i -e 's/39193/1164/g' aims3-lyc_genoest_*

#corelations in means for different windows for bld intial run

cor.test(bld3$mean, bld4$mean) #0.9889983

cor.test(bld3$mean, bld5$mean) #0.9998478

cor.test(bld3$mean, bld6$mean) #0.9904566

cor.test(bld5$mean, bld6$mean) #0.9904197

cor.test(bld4$mean, bld6$mean) #0.9938575

cor.test(bld4$mean, bld5$mean) #0.9889532

I also plotted all of them and they completely overlap. So I am chosing the window 3 for the remaining admixed pops.

POPANC

for f in *.hdf5; do /uufs/chpc.utah.edu/common/home/u6000989/bin/estpost_popanc -o "$f.txt" -p popQ -s 0 $f; done

for f in *.hdf5.txt; do mv -- "$f" "${f%.hdf5.txt}.txt"; done