created by shlee
on 2017-12-08
The first thing to note is that the tool names differ. In GATK3 it's spelled MuTect2 with an uppercase T, whereas in GATK4 it's spelled Mutect2 with a lowercase t. Not only is the new tool name easier to type, it also helps us distinguish which version of the tool a document refers to.
Their respective workflow tools differ too. The table shows the tools for each workflow function in GATK3 versus GATK4.
GATK3 MuTect2 will remain in beta status. GATK4 Mutect2 is in beta status as of the official GATK4 release.
One major difference is GATK4 breaks off filtering into a separate tool, FilterMutectCalls. In GATK3, MuTect2 both calls and filters variants. In GATK4, Mutect2 is focused mostly on calling and does some minimal upfront filtering of obvious non-somatic sites. However, it leaves the majority of filtering to FilterMutectCalls. This separation makes it easier to test changes to filtering thresholds as the computationally expensive calling is decoupled from filtering.
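As a rough sketch of that two-step layout, the Python below only assembles the two command lines and prints them; it does not run GATK. The file and sample names are placeholders, and the arguments are assumed from GATK 4.0-era usage. Later releases changed some required arguments, so check the tool documentation for your version.

```python
import shlex
from typing import List, Optional


def mutect2_command(reference: str, tumor_bam: str, tumor_sample: str,
                    unfiltered_vcf: str,
                    germline_resource: Optional[str] = None) -> List[str]:
    """Build the (expensive) calling step. Arguments assumed from GATK 4.0-era usage."""
    cmd = ["gatk", "Mutect2",
           "-R", reference,
           "-I", tumor_bam,
           "-tumor", tumor_sample,
           "-O", unfiltered_vcf]
    if germline_resource:
        cmd += ["--germline-resource", germline_resource]
    return cmd


def filter_command(unfiltered_vcf: str, filtered_vcf: str) -> List[str]:
    """Build the (cheap) filtering step, which reads the calls made earlier."""
    return ["gatk", "FilterMutectCalls", "-V", unfiltered_vcf, "-O", filtered_vcf]


# Placeholder inputs, for illustration only.
call_cmd = mutect2_command("ref.fasta", "tumor.bam", "tumor_sample",
                           "unfiltered.vcf.gz", "af-only-gnomad.vcf.gz")
filter_cmd = filter_command("unfiltered.vcf.gz", "filtered.vcf.gz")

for cmd in (call_cmd, filter_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the filtering step only needs the unfiltered VCF, it can be rerun with different settings without repeating the calling step.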
Another major difference is in site versus allele filtering against the germline resource. GATK3 MuTect2 prefilters sites in the germline resource regardless of the allele in the tumor. GATK4 Mutect2 distinguishes alleles in the germline resource and only filters the site if the tumor allele matches. If the alleles are different, then the tool considers the allele a putative somatic mutation.
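The site-versus-allele distinction can be illustrated with a small Python sketch. This is not Mutect2 code: the `Variant` record and both functions are hypothetical, and the comparison assumes alleles are already normalized and reported on the reference strand, as in a VCF.

```python
from typing import NamedTuple, Sequence


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def site_level_filtered(tumor: Variant, germline: Sequence[Variant]) -> bool:
    """GATK3-style sketch: filter if the germline resource has any record at this site."""
    return any((g.chrom, g.pos) == (tumor.chrom, tumor.pos) for g in germline)


def allele_level_filtered(tumor: Variant, germline: Sequence[Variant]) -> bool:
    """GATK4-style sketch: filter only if the same alternate allele is in the germline resource."""
    return any(g == tumor for g in germline)


# Hypothetical germline resource with one record: C>G at chr1:1000.
germline_resource = [Variant("chr1", 1000, "C", "G")]
same_allele = Variant("chr1", 1000, "C", "G")   # tumor allele matches the resource
other_allele = Variant("chr1", 1000, "C", "T")  # same site, different allele

print(site_level_filtered(other_allele, germline_resource))
print(allele_level_filtered(other_allele, germline_resource))
```

In this sketch a match means the identical allele string at the same position; a complementary base is not treated as a match.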
Filtering of sites in the panel of normals (PoN) and the matched normal remains unchanged, except that GATK4 Mutect2 prefilters most of these sites, so their records are absent from the VCF.
With the 1000 Genomes Project now wrapped up, and with the availability of germline variant callsets from even larger cohorts, e.g. gnomAD, the germline component of human cancers is something that GATK4 Mutect2 can account for in a more sophisticated way. GATK4 Mutect2 factors germline population allele frequencies into its somatic probability calculations. For a given allele in the tumor, if it is present in the germline resource, its probability of being a somatic mutation is weighted inversely to the frequency with which the allele is observed in the population.
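To give a feel for inverse weighting by population frequency, here is a toy Python calculation. It is not Mutect2's actual model and ignores read evidence entirely. It assumes a hypothetical flat per-site somatic prior (`somatic_prior`) and uses the Hardy-Weinberg probability of carrying at least one copy of the allele as the competing germline explanation.

```python
def germline_carrier_probability(allele_frequency: float) -> float:
    """Probability that a diploid individual carries at least one copy of an
    allele with this population frequency, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    return 1.0 - (1.0 - allele_frequency) ** 2


def toy_somatic_weight(allele_frequency: float, somatic_prior: float = 1e-6) -> float:
    """Toy weight that an allele is somatic rather than germline.

    somatic_prior is a hypothetical value chosen for illustration only.
    """
    germline = germline_carrier_probability(allele_frequency)
    return somatic_prior / (somatic_prior + germline)


# The weight shrinks as the allele becomes more common in the population.
for af in (0.0, 1e-5, 1e-3, 0.1):
    print(f"AF={af:g}  toy somatic weight={toy_somatic_weight(af):.3g}")
```

An allele absent from the resource (frequency 0) keeps its full weight in this toy, while a common population allele is strongly down-weighted.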
Here are the differences between GATK3 MuTect2 and GATK4 Mutect2 as a list.
... (--genotype-pon-sites), which can be useful in comparing results to older MuTect versions.

What remains unchanged is that neither version calls potential loss of heterozygosity (LoH) events. To detect LoH, see the Somatic Copy Number Variant (CNV) workflow.
You can find tutorials that explore considerations for the GATK3 workflow or the GATK4 workflow on our forum.
... (--vcf). For example usage commands, see this thread. For prior versions that give results in MAF format, see the Broad CGA website. For workflows that use a composite of MuTect1 SNV calls and MuTect2 indel calls, see FireCloud Article#7512.

Updated on 2018-01-17
From picard_gatk_mj on 2018-11-24
I still cannot understand the plot describing the GATK4 Mutect2 differences for the PoN, normal, and germline resource at the site (C, C) to (-, G).
You said “only filters the site if the tumor allele matches”, but do G and C match? Does “matches” mean the same base or the complementary base? Thanks a lot.
From ying_sheng_1 on 2019-01-08
This is explained in the following page if you are still interested:
https://software.broadinstitute.org/gatk/documentation/article?id=11127
Under “A variant allele in the case sample is not called if the site is variant in controls.”