Post date: Nov 21, 2019 6:0:0 PM
We are running MuTect2 to detect somatic mutations with and without a panel of normals. This post described the run without the panel of normals. See this page for more on MuTect2.
All relevant files are in /uufs/chpc.utah.edu/common/home/u6000989/data/aspen/gbs_pando_plus/MuTect2/.
pando_bams.txt lists the 109 samples we confidently assigned to pando.
ref_normals.txt lists 6 trees we are using as the normal reference. This would ideally be the ancestor of pando, but we don't have that. Instead, the first 3 are pando trees with solid coverage and spread across the clone, and the last 3 are nearby non-pando trees. We can see if these give consistent results and can combine the sets in some way (details to be determined).
MuTect2 operates on one sample at a time. We thus have 109 * 6 = 654 jobs. They were run as follows:
sbatch RunMuTectGatk.sh
which runs MuTectGatkFork.pl
which looks like this:
#!/usr/bin/perl
#
# make vcf files for PoNs
#
use Parallel::ForkManager;
my $max = 30;
my $pm = Parallel::ForkManager->new($max);
my $idir = '/uufs/chpc.utah.edu/common/home/u6000989/data/aspen/gbs_pando_plus/Alignments_mem';
my $odir = '/uufs/chpc.utah.edu/common/home/u6000989/data/aspen/gbs_pando_plus/MuTect2';
my $tdir = '/scratch/general/lustre/pandoMuTect2';
my $genome = "/uufs/chpc.utah.edu/common/home/u6000989/data/aspen/genome/Potrs01-genome.fa";
my $refs = shift(@ARGV);
open(REF, $refs) or die "couldn't open $refs\n";
while(<REF>){
chomp;
push(@refs,$_);
}
my $in = shift(@ARGV);
open(IN, $in) or die "couldn't open $in\n";
while(<IN>){
foreach $ref (@refs){
$pm->start and next; ## fork
chomp;
$bam = $_;
$out = $bam;
$out =~ s/bam/vcf.gz/ or die "failed here: $out\n";
$out = "mu2_$out";
system "java -Xmx48g -jar ~/bin/GenomeAnalysisTK.jar -T MuTect2 -R $genome -I:tumor $idir/$bam -I:normal $idir/$ref -o $tdir/$out\n";
system "cp $tdir/$out $odir/$out\n";
$pm->finish;
}
}
$pm->wait_all_children;