created by bhanuGandham
on 2019-01-30
I’m delighted to introduce the first major version update to GATK4, version 4.1.0.0! This release includes several exciting new analysis pipelines and tons of improvements to existing tools, many of which are now officially out of beta (YAY!).
You can check out the full release notes on [GitHub](https://github.com/broadinstitute/gatk/releases/tag/4.1.0.0) to get a sense of the scale of this release, but fair warning, it’s a lot. In fact, we felt there was far too much in this release to even give a satisfying overview in a single blog post, so we decided to develop a series of nine blog posts that each cover one of the main functional areas of improvement. The table below lists the nine posts along with a short summary for each. Each blog post was written by the lead developer(s) on that project; it outlines the history of the challenge at hand, the approach that they developed to solve it, and future development prospects.
We plan to publish two posts per week starting tomorrow, so keep an eye out for them, [subscribe to forum notifications](https://software.broadinstitute.org/gatk/documentation/article?id=11026) or follow @gatk_dev on Twitter! We’ll add links to the table as the posts become available.
And now, without further ado, I present to you [GATK4.1!!!](https://github.com/broadinstitute/gatk/releases)
——
Two Sisters!
Mutect2 and HaplotypeCaller both aim to achieve sensitive SNP and indel discovery, though in very different contexts. Despite their different applications, they’re more closely related than first meets the eye. GATK 4.1 features several performance and accuracy improvements, spurred by Mutect2 development and simultaneously benefiting both tools. We’re also debuting a new beta version of GVCF mode for Mutect2, bringing the HaplotypeCaller’s reference confidence model to somatic analysis.
Be big, feel small!
The Broad generates 20 terabytes of data every day, so it is no surprise that we focus much of our efforts in the germline space on processing more data more efficiently. While efficiency improvements in GATK 4.1 satisfy users with the largest cohorts (think All of Us), rest assured we aren’t discounting smaller cohorts! See how GATK 4.1 facilitates generating larger, cheaper germline cohort callsets and improves accuracy and usability for single-sample clinical cases.
Expanding the use cases for a proven tool
Enhanced sensitivity and precision allow GATK4.1’s Mutect2 to encompass previously challenging domains, including mitochondria, cfDNA, and multiple tumor samples. We’ve improved performance and accuracy in single-sample calling, and have ambitious plans for more progress.
Overcoming barriers to understanding the mitochondrial genome
Calling SNPs and indels on the mitochondrial genome poses unique challenges, due to its circular shape and very high copy number. We now have a tested and validated “Best Practices” pipeline using Mutect2 to call short variants at arbitrary allele fractions in the mitochondrial genome.
Adapting a proven tool to liquid biopsy studies
Coming soon: a pipeline using Mutect2 for low-allele-fraction variant detection from duplex-sequenced liquid biopsies. Liquid biopsies present novel challenges, requiring high sensitivity at low allele fraction. With a few minor adjustments to the parameters passed to Mutect2 and the addition of a new filter, our pipeline achieves > 90% sensitivity at ~1% allele fraction with fewer than 1 FP/Mb on three separate panels with territory as large as 2 Mb.
Delivering results faster
We continue to improve our support for users who want to run on Apache Spark with GATK 4.1. This release includes major improvements to MarkDuplicatesSpark, in particular, as well as the full ReadsPipelineSpark, powered by a brand new Spark I/O library, Disq!
A production-ready tool to call copy-number variants
In the current stage of evolution, we can still see traits inherited from venerable ancestors in the ModelSegments and GermlineCNVCaller pipelines. However, the GATK 4.1 pipelines also feature new adaptations that dramatically improve performance and enable scalability from exomes to genomes. The GATK 4.1 release brings these pipelines out of beta – adding CNV calling officially to GATK’s growing set of capabilities.
Updated on 2019-03-29
From SkyWarrior on 2019-01-30
Awesome release roundup. However we are still waiting for the much desired changes to HaplotypeCaller (AKA missed calls due to -L parameter.)
From hugolam on 2019-01-30
Great and thanks. After the update to 4.1, I saw the following error with the `--resource` parameter in VariantRecalibrator:
A USER ERROR has occurred: Couldn’t read file file:///proj/hg19/omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf. Error was: It doesn’t exist.
The same command works in the previous version, 4.0.12.0. It seems like it’s now adding the current directory to the `--resource` parameter and making the whole thing a “file” object? Or has the API changed? Thanks!
From cnorman on 2019-02-04
@hugolam The command line syntax for “tagged” arguments such as `--resource` changed for 4.1. Instead of specifying the tags as part of the argument value, specify them as part of the argument name:
`--resource:known=false,training=true,truth=false,prior=12.0 /proj/hg19/omni.vcf`
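For anyone with many old command lines to update, here is a minimal sketch of that rewrite in Python. It assumes the 4.0-style value had the form `tags:path`, as the value in the error message above suggests. The helper name `to_gatk41_tagged_arg` is hypothetical and not part of GATK. It moves everything before the first colon onto the argument name and leaves the path as the value.

```python
from typing import List


def to_gatk41_tagged_arg(arg_name: str, old_value: str) -> List[str]:
    """Rewrite a 4.0-style 'tags:path' value as 4.1-style command-line tokens.

    Hypothetical helper, not part of GATK. Assumes the tags never contain a
    colon, so the first colon separates the tags from the file path.
    """
    tags, sep, path = old_value.partition(":")
    if not sep or not tags or not path:
        raise ValueError(f"expected 'tags:path', got {old_value!r}")
    # In 4.1 the tags are attached to the argument name, not the value.
    return [f"--{arg_name}:{tags}", path]


# Value taken from the error message reported above.
old = "omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf"
print(" ".join(to_gatk41_tagged_arg("resource", old)))
```

Splitting on the first colon only means a path that itself contains a colon (for example a `gs://` URL) stays intact.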
From yingchen69 on 2019-02-07
Hi, where is the doc for gatk4 mitochondria pipeline? The github page (https://github.com/gatk-workflows/gatk4-mitochondria-pipeline) is blank. Best, Ying
From leshwill on 2019-02-08
how do I make GenomicsDB workspaces by chromosome? Does the example -L 20 in the documentation mean chromosome 20? Thank you for your support.
From gauthier on 2019-02-21
@SkyWarrior David B. has a lead on the -L issue: https://github.com/broadinstitute/gatk/issues/3697 I think he has a prototype, but he’s still working through some additional Mutect2 false positives along with everything else on his plate. Hopefully there’ll be a PR in a few weeks.
From SkyWarrior on 2019-02-22
Thanks @gauthier. We know that you guys are super busy to make things even better. :)
From manolis on 2019-07-25
Hi, are you planning a new release? (v4.1.3.0)
If yes, when?
Thanks
From bhanuGandham on 2019-07-29
@manolis
GATK version 4.1.3.0 is coming out soon in the next week or two.
A production-ready tool to predict variant function
We created Funcotator to be a fast and accurate functional annotation tool. The latest release of GATK includes updates to Funcotator that make it even more robust and correct, as well as flexible and prod-ready. The addition of two sets of data sources to go with Funcotator (including Gencode, ClinVar, gnomAD, and more) enables it to be used out of the box to add annotations to either germline or somatic variants.
A production-ready suite of tools for single-sample variant filtration
We present the CNNVariant suite of tools, a complement to VQSR for single-sample variant filtration. This toolset includes a pre-trained model, ready to score variants, as well as the capability to train new models for new types of data. We gathered a massive amount of data together to train our model, and validated its performance against different biological samples, sequencing machines, and protocols.