Post date: May 03, 2016 8:32:53 PM
I used vcf2gl.pl to convert the filtered VCF file to genotype likelihood format. I did this three times: once for all variants, once for common variants only, and once for all except very rare variants:
perl vcf2gl.pl 0.0 morefilter_filtered2x_varsLycHybAncestry.vcf
39,318 SNPs
perl vcf2gl.pl 0.05 morefilter_filtered2x_varsLycHybAncestry.vcf
4,448 SNPs
perl vcf2gl.pl 0.005 morefilter_filtered2x_varsLycHybAncestry.vcf
24,850 SNPs
The resulting gl files are lychybanc.gl, lychybanc_common.gl, and lychybanc_notvryrare.gl respectively.
I want to try entropy first with the common variants. This is to make sure things are working and to get a better feel for how long the runs will take. Also, I will probably show overall patterns of structure based on the common variants.
To do this I first generated starting values from point estimates of genotypes (files are here: /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/entropy/startingvals):
perl gl2genest.pl lychybanc_common.gl
R CMD BATCH initq.R
Next, I ran entropy. I am using srun's --multi-prog option to run multiple jobs on one node, and I want to experiment with this more generally. To submit the job:
sbatch runentropy.sh
This runs one chain for each k. The file runentropy.sh contains this:
#!/bin/bash
#SBATCH -n 7
#SBATCH -N 1
#SBATCH -t 72:00:00
#SBATCH -p kingspeak
#SBATCH -A gompert
#SBATCH -J entropy
module load gsl
module load hdf5
srun --multi-prog my.conf
It calls my.conf, which contains this (%t is replaced by the task number, here an integer from 0 to 6):
0-6 /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/entropy/entropy.sh %t
That in turn calls entropy.sh, which has the actual job commands:
#!/bin/bash
module load gsl
module load hdf5
K=$(($1+2))
/uufs/chpc.utah.edu/common/home/u6000989/bin/entropy -i /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/variants/lychybanc_common.gl -l 15000 -b 5000 -t 5 -k $K -Q 0 -s 50 -q /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/entropy/startingvals/ldak$K.txt -o /scratch/general/lustre/ento_lhaCommonCh0K$K.hdf5 -w 0 -m 1
mv /scratch/general/lustre/ento_lhaCommonCh0K$K.hdf5 /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/entropy
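The mapping from task number to k can be checked without submitting anything. This is a minimal sketch that mirrors the K=$(($1+2)) line in entropy.sh; the function name k_for_task is made up for illustration and is not part of the scripts above.

```shell
#!/bin/bash
# Hypothetical helper: same arithmetic as K=$(($1+2)) in entropy.sh.
k_for_task() {
    echo $(( $1 + 2 ))
}

# my.conf passes task numbers 0-6 as %t, so list the k each task would use.
for task in 0 1 2 3 4 5 6; do
    echo "task $task -> K=$(k_for_task "$task")"
done
```

With seven tasks (matching #SBATCH -n 7) this covers k = 2 through 8, one value per task.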
Note that the entropy command in entropy.sh runs 15,000 iterations (-l 15000), with the first 5,000 dropped as burn-in (-b 5000) and a thinning interval of 5 (-t 5). I am running k = {2, ..., 8} and plan to run three chains for each k.
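As a quick check on how many samples each chain should keep, here is the arithmetic as a small sketch. It assumes entropy keeps every 5th iteration after the burn-in; the variable names are mine, and the exact count could differ by one depending on how the program counts.

```shell
#!/bin/bash
# Values taken from the -l, -b and -t options in entropy.sh.
iterations=15000
burnin=5000
thin=5

# Assumption: one sample is kept per thinning interval after burn-in.
retained=$(( (iterations - burnin) / thin ))
echo "retained samples per chain: $retained"
```

Under that assumption each chain keeps (15000 - 5000) / 5 = 2000 samples.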
The common-variant runs took less than 12 hours. I am now running the full data set (lychybanc.gl, 9-day walltime) and the data set with only very rare variants removed (lychybanc_notvryrare.gl, 5-day walltime) using the same settings.