040. Mutect2 resources guide

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019.

created by shlee

on 2018-01-31

A new tutorial for somatic calling

We have a new tutorial, Tutorial#11136, that outlines how to call somatic short variants, i.e. SNVs and indels, with GATK4 Mutect2. The tutorial provides small example data to follow along with.

Mutect2-compatible germline resources

Full-length Mutect2-compatible human germline resources are available on our FTP server and at gs://gatk-best-practices/. The resources are simplified from the gnomAD resource and retain population allele frequencies. Mutect2 and GetPileupSummaries are the two tools in the workflow that each require a germline resource.
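As a rough illustration of where the germline resource enters each tool, the sketch below assembles the two command lines as Python argument lists without running anything. All file names are hypothetical placeholders, and the helper functions are illustrative rather than part of GATK. Which arguments are required (for example -tumor for Mutect2 or -L for GetPileupSummaries) differs between GATK4 releases, so check the tool documentation for your version. The two tools do not necessarily take the same resource file, which is why each helper accepts its own.

```python
import shlex
from typing import List, Optional


def mutect2_args(reference: str, tumor_bam: str, germline_resource: str,
                 output_vcf: str, tumor_sample: Optional[str] = None) -> List[str]:
    """Build (but do not run) a Mutect2 command that uses a germline resource."""
    args = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if tumor_sample is not None:
        # Some GATK4 releases require the tumor sample name; others infer it.
        args += ["-tumor", tumor_sample]
    args += ["--germline-resource", germline_resource, "-O", output_vcf]
    return args


def pileup_summaries_args(tumor_bam: str, germline_resource: str,
                          output_table: str,
                          intervals: Optional[str] = None) -> List[str]:
    """Build (but do not run) a GetPileupSummaries command."""
    args = ["gatk", "GetPileupSummaries", "-I", tumor_bam, "-V", germline_resource]
    if intervals is not None:
        # Some GATK4 releases require -L for this tool.
        args += ["-L", intervals]
    args += ["-O", output_table]
    return args


# Hypothetical paths, for illustration only.
calling_cmd = mutect2_args("ref.fasta", "tumor.bam",
                           "germline_resource.vcf.gz", "somatic.vcf.gz")
pileup_cmd = pileup_summaries_args("tumor.bam", "germline_resource.vcf.gz",
                                   "pileups.table")
print(shlex.join(calling_cmd))
print(shlex.join(pileup_cmd))
```

The argument lists can be passed to subprocess.run once the placeholder paths are replaced with real files.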

Working WDL scripts

If you want to run the Somatic Short Variant Discovery Best Practices workflow using WDL, be sure to check out the official Mutect2 WDL script in the gatk-workflows repository. @bshifaw and other engineers optimize the scripts in the repository to run efficiently in the cloud. Furthermore, the scripts come with example JSON-format input files filled out with publicly accessible cloud data.

For other Mutect2-related scripts, e.g. for generating a panel of normals, check out the gatk repository's scripts/mutect2_wdl directory. Our developers update these scripts on a continual basis.

For background information

If you are new to somatic calling, be sure to read Article#11127. It gives an overview of what traditional somatic calling entails. For one, somatic calling is NOT simply the difference between two callsets, because germline variant sites are excluded from consideration.

For those switching from GATK3 MuTect2, Blog#10911 will bring you up to speed on the differences.

An off-label tutorial for simple difference calling

If you are interested in simply calling differences between two samples, Blog#11315 outlines an off-label two-pass Mutect2 workflow. Off-label means the workflow is not part of the Best Practices and is therefore unsupported. However, given enough community interest, we may be convinced to flesh out the workflow further. Please do post to the forum to express interest.

Updated on 2018-02-02

From alanhoyle on 2019-07-25

Have you all considered distributing a version of the germline resources that is limited to the subset of hg38 contigs included in the GDC harmonization GRCh38 genome?

https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

Notably, the GDC reference does not include any of the *_alt contigs.

From Geraldine_VdAuwera on 2019-08-01

Hi Alan, we’ve gone down the road of distributing alternate versions of the same reference in the past and it caused enough pain and confusion downstream that there would have to be a really compelling reason to do so again.