Somatic Mutation Tools on CGC
Somatic Mutation Tools on Cancer Genomics Cloud (Seven Bridges Genomics):
To run individual mutation callers:
MuTect2:
Basic MuTect2 workflow, which runs GATK4 Mutect2 first, followed by FilterMutectCalls.
The parallelized MuTect2 workflow uses a BED file to split the job region by region into many threads across many instances.
An example of a parallelized MuTect2 job can be found here.
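The region-by-region split used by the parallelized workflows can be pictured with a short Python sketch. This only illustrates the idea and is not the code the CGC workflows run: the function names `parse_bed` and `split_bed` are hypothetical, and balancing chunks by total base pairs is an assumption about how the regions are divided.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Region = Tuple[str, int, int]


def parse_bed(text: str) -> List[Region]:
    """Read the first three columns of BED-formatted text."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def split_bed(regions: List[Region], n_chunks: int) -> List[List[Region]]:
    """Split regions into at most n_chunks lists of roughly equal total length.

    A region that straddles a chunk boundary is cut in two (assumed behavior).
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    total = sum(end - start for _, start, end in regions)
    target = -(-total // n_chunks)  # ceiling division: bases per chunk
    chunks: List[List[Region]] = [[]]
    room = target  # bases still available in the current chunk
    for chrom, start, end in regions:
        while start < end:
            if room == 0:
                chunks.append([])
                room = target
            take = min(room, end - start)
            chunks[-1].append((chrom, start, start + take))
            start += take
            room -= take
    return chunks


if __name__ == "__main__":
    bed = "chr1\t0\t100\nchr2\t0\t50\n"
    for i, chunk in enumerate(split_bed(parse_bed(bed), 3), start=1):
        print(i, chunk)
```

Each resulting chunk would then be written to its own BED file and handed to one caller job.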
SomaticSniper:
Basic SomaticSniper app. Because SomaticSniper does not support partial BAM input, it is not split by region; use this app directly if you only need a SomaticSniper run. It runs on a single thread and takes about 6.5 hours to finish a 50X WGS data set. An example is this.
VarDict:
Basic VarDict workflow, which consists of 3 successive apps: 1) VarDictJava, 2) VarDict's testsomatic.R app, and then 3) VarDict's var2vcf.pl app.
The parallelized VarDict workflow uses a BED file to split the job region by region into many threads across multiple instances. An example is this.
MuSE:
Basic MuSE workflow, which consists of MuSE's call and sump commands.
The parallelized MuSE workflow splits the job based on a specified number of regions. This is an example.
Scalpel:
Basic Scalpel app, which includes both the discovery and export commands.
The parallelized Scalpel workflow splits the job by splitting the BED file. This is a mini-BAM example. Scalpel is computationally extremely expensive, and it has not been run on a full data set yet.
Strelka:
This is the basic Strelka app. The Strelka workflow requires a BED file, which is then converted to a bed.gz file.
The parallelized Strelka workflow splits the job by splitting BED files. This is an example.
TNScope:
The revision used is rev25: https://cgc.sbgenomics.com/u/xiaowen/fda-seqc2-wg-1/apps/#xiaowen/fda-seqc2-wg-1/sentieon-tnscope-wf-rev21, which was copied from rev25 of the app here: https://cgc.sbgenomics.com/u/xiaowen/fda-seqc2-wg-1-staging/apps/#xiaowen/fda-seqc2-wg-1-staging/sentieon-tnscope-wf
SomaticSeq.Wrapper:
The basic SomaticSeq app, which takes as input the VCF files from each tool. Depending on which optional input files are provided, it runs in consensus mode, training mode, or prediction mode. This is an example of a single-threaded job in simple consensus mode.
Parallelized SomaticSeq workflow for consensus mode. Here is an example. Memory usage still needs to be optimized.
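How the optional inputs select a mode can be sketched in Python as below. This page does not say which optional file triggers which mode, so the mapping here is an assumption: a ground-truth VCF selects training, a trained classifier selects prediction, and neither selects consensus. The function name `select_mode` and its parameters are hypothetical and are not part of the SomaticSeq app.

```python
from typing import Optional


def select_mode(truth_vcf: Optional[str] = None,
                classifier: Optional[str] = None) -> str:
    """Return the mode implied by the optional inputs (assumed mapping)."""
    if truth_vcf and classifier:
        # The source does not say what happens here, so refuse to guess.
        raise ValueError("provide a ground truth or a classifier, not both")
    if truth_vcf:
        return "training"    # assumption: ground truth is used to build a classifier
    if classifier:
        return "prediction"  # assumption: an existing classifier scores the calls
    return "consensus"       # no optional inputs: simple consensus of the callers


if __name__ == "__main__":
    print(select_mode())
    print(select_mode(truth_vcf="truth.snv.vcf"))
    print(select_mode(classifier="snv.classifier.RData"))
```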
Workflows of multiple mutation callers:
Running MuTect2, SomaticSniper, VarDict, MuSE, and Strelka
SOAP-generated BAM files do not seem compatible with VarDict. So DO NOT use this workflow on SOAP-generated BAM files.
This is an example. It outputs VCF files for each tool. It does not automatically run SomaticSeq afterward.
Running MuTect2, SomaticSniper, MuSE, and Strelka (NO VarDict).
This can be used for SOAP-generated BAM files.
An example.
Complete SomaticSeq Workflow:
This is the parallelized SomaticSeq workflow (consensus mode) that includes MuTect2, SomaticSniper, VarDict, MuSE, and Strelka. It runs MuTect2, VarDict, MuSE, and Strelka in parallel by region while running SomaticSniper on a single thread. Once they have all finished, each region goes through SomaticSeq (SomaticSniper's output is split into the same regions). Once done, the per-region results are combined into Ensemble TSV files and Consensus VCF files.
It incorporates the region-wise SomaticSeq workflow, which requires a SomaticSniper output VCF file and so is not designed to be used standalone. The complete workflow itself includes all the workflows of the somatic mutation callers plus SomaticSeq.Wrapper.
This is an example run using mini-BAM files.
One issue: the way this workflow is constructed, it waits for SomaticSniper to finish on a single thread before everything else starts in parallel.
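The step in which SomaticSniper's single VCF is split into the same regions as the other callers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the function name `split_vcf_by_region` is hypothetical, regions are taken to be BED-style (0-based start, exclusive end), and each variant is assigned to the first region that contains its position.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Region = Tuple[str, int, int]


def split_vcf_by_region(vcf_text: str,
                        regions: List[Region]) -> Dict[Region, List[str]]:
    """Return, for each region, the VCF header plus the records inside it."""
    header: List[str] = []
    records: Dict[Region, List[str]] = {region: [] for region in regions}
    for line in vcf_text.splitlines():
        if line.startswith("#"):
            header.append(line)  # every per-region VCF keeps the full header
            continue
        if not line.strip():
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF POS is 1-based
        for region in regions:
            r_chrom, start, end = region
            # 1-based POS lies in the 0-based half-open interval [start, end)
            # exactly when start < POS <= end.
            if chrom == r_chrom and start < pos <= end:
                records[region].append(line)
                break
    return {region: header + lines for region, lines in records.items()}


if __name__ == "__main__":
    vcf = "\n".join([
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t50\t.\tA\tT\t.\t.\t.",
        "chr1\t51\t.\tG\tC\t.\t.\t.",
    ])
    for region, lines in split_vcf_by_region(
            vcf, [("chr1", 0, 50), ("chr1", 50, 100)]).items():
        print(region, len(lines) - 2, "record(s)")
```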