Update on Aug 22 2018: I have created two folders in the entropy/common folder: allsex and males. The allsex folder holds the files for analyses run on common variants with all individuals; the males folder holds the files for analyses run on common variants with males only. The scripts below need their directories and filenames modified accordingly.
I am running entropy for a first trial run. This is to make sure things are working and to get a better feel for how long the runs will take. The trial may also show overall patterns of structure based on common variants.
To do this, I first generated starting values from point estimates of genotypes (files are here: /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_dubois/entropy/complete/):
I created files for each population and saved respective output files in the population folders.
1. Run gl2genest.pl for each population to generate the genotype point estimates used for the starting values:
perl gl2genest.pl subset_filtered_variantsLycaeides.gl
2. I transferred the output files to the startingvals folder, and in that folder ran:
R CMD BATCH initq.R
This created an ldak$k.txt file for each k in {2..8}.
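I have not reproduced initq.R here. In the entropy workflow the starting q values typically come from clustering the genotype point estimates and writing one ldak$k.txt matrix of admixture proportions per k. The sketch below is a rough Python analogue under that assumption; the k-means approach, the 0.9/0.1 softening, and the file names are illustrative, not the actual contents of initq.R:

```python
import random

def init_q(genos, k, n_iter=50, seed=0):
    """Crude k-means on an individuals x loci matrix of genotype point
    estimates; returns starting admixture proportions (rows sum to 1).
    Illustrative only -- not the contents of initq.R."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(row) for row in rng.sample(genos, k)]
    labels = [0] * len(genos)
    for _ in range(n_iter):
        # assign each individual to its nearest cluster center
        labels = [min(range(k), key=lambda j: dist2(row, centers[j]))
                  for row in genos]
        # recompute each center as the mean of its members
        for j in range(k):
            members = [row for row, lab in zip(genos, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]

    # soften the hard assignments into starting proportions
    q = []
    for lab in labels:
        row = [0.1 / (k - 1)] * k
        row[lab] = 0.9
        q.append(row)
    return q

# Hypothetical usage: one starting-values file per k, as initq.R writes
# ldak2.txt ... ldak8.txt (input file name here is made up):
# genos = [[float(x) for x in line.split()] for line in open("pntest_genotypes.txt")]
# for k in range(2, 9):
#     with open(f"ldak{k}.txt", "w") as out:
#         for row in init_q(genos, k):
#             out.write(" ".join(f"{p:.4f}" for p in row) + "\n")
```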
Next, I ran entropy, using Perl's Parallel::ForkManager to run multiple chains in parallel.
To run:
sbatch runentropyGompKP.sh
This runs the Perl fork script forkRunEntropy.pl (usage: perl forkRunEntropy.pl 10, where the argument is the number of simultaneous child processes). The script contains the following code to run entropy with Parallel::ForkManager:
#!/usr/bin/perl
#
# creates a child process per entropy chain
# usage: perl forkRunEntropy.pl <number of forks>
use warnings;
use strict;
use Parallel::ForkManager;

my $max = shift(@ARGV); ## number of child processes to run at one time
my $pm = Parallel::ForkManager->new($max);

my $in = "/uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/subset_filtered_variantsLycaeides.gl";
my $qdir = "/uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/startingvals/subset/";

foreach my $ch (1..3){
    CHAINS:
    foreach my $k (2..5){
        $pm->start and next CHAINS; ## fork
        ## run one chain, then move its output to the mcmc folder
        my $cmd = "/uufs/chpc.utah.edu/common/home/u6000989/bin/entropy -i $in -l 8000 -b 5000 -t 3 -k $k -Q 0 -s 50 -q $qdir"."ldak$k.txt -o /scratch/general/lustre/entropy/ento_lyc_hybridsCh$ch"."K$k.hdf5 -w 0 -m 1\nmv /scratch/general/lustre/entropy/ento_lyc_hybridsCh$ch"."K$k.hdf5 /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_dubois/entropy/mcmc/";
        print "$cmd\n";
        system $cmd;
        $pm->finish; ## exit the child process
    }
}
$pm->wait_all_children;
Note that each chain runs 8,000 MCMC iterations with the first 5,000 dropped as burn-in and a thinning interval of 3 (the -l 8000 -b 5000 -t 3 flags above). I am running k = {2..5} with three chains each. The outputs (hdf5 files) will be saved in the mcmc folder.
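For bookkeeping, the posterior sample counts implied by the flags in the script above, assuming -l counts total steps including burn-in and that every t-th post-burn-in sample is kept (my reading of the flags, worth checking against the entropy documentation):

```python
# MCMC bookkeeping for the flags used in forkRunEntropy.pl
steps, burnin, thin = 8000, 5000, 3   # -l, -b, -t
chains = 3                            # chains per k

per_chain = (steps - burnin) // thin  # samples retained per chain
total = per_chain * chains            # pooled samples per k across chains
print(per_chain, total)               # 1000 3000
```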
RUNNING ENTROPY FOR COMMON VARIANTS K = 2,3
I am now running entropy on common variants (N=6652).
Folder: /uufs/chpc.utah.edu/common/home/u6007910/projects/lyc_dubois/entropy/common
The setup is the same as above, with results in the mcmc folder and the ldak files in the startingvals folder.