Post date: Oct 24, 2019 1:57:48 AM
The 40 bam files (one per individual, sorted_merge*bam) are in /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/.
I am using GATK (v3.5-0-g36282e4) for variant calling by first running the HaplotypeCaller and then combining and genotyping the g.vcf files.
0. I first had to create the dictionary file for the new reference/consensus:
java -jar /uufs/chpc.utah.edu/sys/installdir/picard/2.1.1/picard.jar CreateSequenceDictionary R=tcuri_refgenome.fasta O=tcuri_refgenome.fasta.dict
From /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi, I ran
sbatch RunGatk.sh
which runs GatkFork.pl
#!/usr/bin/perl
#
# make g.vcf files
#
use Parallel::ForkManager;
my $max = 30;
my $pm = Parallel::ForkManager->new($max);
my $idir = '/uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi';
my $odir = '/uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/variants_curi';
my $tdir = '/scratch/general/lustre/tcuriGatk';
my $genome = "/uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/refs/tcuri_refgenome.fasta";
FILES:
foreach $bam (@ARGV){
$pm->start and next FILES; ## fork
$out = $bam;
$out =~ s/bam/g.vcf/ or die "failed here: $out\n";
system "java -Xmx48g -jar ~/bin/GenomeAnalysisTK.jar -T HaplotypeCaller -R $genome -I $idir/$bam -o $tdir/$out -gt_mode DISCOVERY -hets 0.001 -mbq 20 -out_mode EMIT_VARIANTS_ONLY -ploidy 2 -stand_call_conf 50 -pcrModel AGGRESSIVE --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000\n";
system "cp $tdir/$out $odir/$out\n";
$pm->finish;
}
$pm->wait_all_children;
NOTE: I had to back up and redo the alignments because of * characters denoting deletions in the consensus sequence. I just dropped these variants and reran everything as done previously.