054. Why is HaplotypeCaller slower in the most recent GATK4 beta versions?

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019.

created by Geraldine_VdAuwera

on 2017-12-06

Because it’s saving its strength for the 4.0 general release ;)

Many of the “early adopters” who have been testing GATK4 during its beta phase have pointed out that they saw significant speed improvements in early beta versions (yay!), but when they upgraded to more recent betas (starting with 4.beta.4), they observed a return to the slowness seen in GATK3 versions (boo!). This has understandably caused some concern among those who were drawn to the GATK4 beta version of HaplotypeCaller by its promised speed improvements, so basically everyone.

The good news is that this is only a temporary artifact of some of our development and evaluation constraints, which forced us to remove some key improvements while we refine them and evaluate whether the results are equivalent to those of the older version. We should be able to restore the HaplotypeCaller’s speed improvements in the very near future, in time for the GATK 4.0 release planned for January 9, 2018.

If you’re interested in understanding why we had to hobble the HaplotypeCaller in this way, please read on! Otherwise feel free to take our word for it.

——

There are two opposing forces in play when we migrate tools from the older GATK to the new 4.x framework. One is that we want to streamline the program’s operation to make it run faster and cheaper. The other is that we have been asked by our internal stakeholders to produce an exact “tie-out” for the germline variant discovery pipeline that we run in production at the Broad (i.e. for a subset of tools including HaplotypeCaller). This means that the HaplotypeCaller we release in GATK 4.0 needs to produce exactly the same output (modulo some margins) as the one from version 3.8, to minimize disruption when the pipelines are migrated. That’s a very high standard, and it’s the right thing to do both from an operations standpoint and from a software engineering standpoint.

However, these two directives came into conflict because we realized, somewhere in the early beta stages, that some of the optimizations that were introduced to make HaplotypeCaller faster also created output differences that were outside of the acceptable margins. We believe that those differences may actually be improvements on the “old” results, but for the sake of the tie-outs we had to take them out temporarily — hence the HaplotypeCaller went back to being slower than we’d like in the later beta releases.

We’re confident we have a solution that will allow us to put the efficiency optimizations back in as soon as the final tie-out test results have been approved, which appears to be imminent. So by the time GATK4 is released into general availability in January, the new HaplotypeCaller should have all its superpowers back.

Updated on 2017-12-06

From EADG on 2017-12-06

Hi @Geraldine_VdAuwera,

thx for the explanation. Can you make the margins public? It would be nice if I could say to the QM folks: “Look at this, GATK is doing internal validation, so we can trust it without many revalidation steps.”

Would save me a lot of time ;)

Greets,

EADG

From Geraldine_VdAuwera on 2017-12-07

Hah no kidding — I’ll ask if we can publish the validation criteria, sure. No guarantees though; not because we don’t want to share (we do) but it means a bit more work for someone over here, so I have to sweet-talk them into doing it for the beauty of science ;)

From matdmset on 2017-12-07

Hi @Geraldine_VdAuwera

when you say “We believe that those differences may actually be improvements on the “old” results, but for the sake of the tie-outs we had to take them out temporarily”, does this mean you believe the calls made by GATK4 were better/more accurate than the calls made by GATK3.8? If so, doesn’t taking them out mean a step backwards, just for the sake of an easy transition?

I’m certain there’s a whole reasoning behind it, but I’d like to make sure you guys don’t hold back on your awesome work because of bureaucracy. Because “go-fast” superpowers + “better-calls” superpowers = megasuperpowers

Cheers

From Geraldine_VdAuwera on 2017-12-07

Hah yes that was my initial reaction too :)

Re-reading my original piece I realize that I handwaved the acceptance testing process, which is a bit more complicated than I described. The first part, which is what we have been focused on, is where the stakeholders (including the operations team) tell the development team “convince us that this new version still works correctly before you propose any additional improvements”. That’s where the tie-out requirement comes in.

Whenever you’re changing two or more things at the same time, if you get a different result, you can’t tell with certainty what caused the difference, which is at the very least uncomfortable. And that’s what happened here.

When the engine team ported the HaplotypeCaller to the new GATK4 framework, they were really excited to get rid of some of the more convoluted code that was living in its basement. But when they found that the results on some calls were significantly different, they couldn’t say whether the differences were due to changes they had made in the code when they ripped out the old plumbing, or whether the GATK4 framework itself was responsible; as far as they knew, some subtle bug somewhere could be doing the wrong thing to the data. Just because many of the changes seemed to be going in the right direction didn’t mean it was necessarily safe to just roll with them. So they plugged the old code back in until they could show that the tool produced the same results in the new framework. Along the way they found a number of bugs that had crept in, which is pretty much inevitable with a codebase of this complexity. To me that really demonstrates the value of this very conservative approach: the software comes out of the process much better for it.

Now they’re working back toward the more streamlined, faster version; and as they put it back together again, if there are improvements (which I can’t commit to saying definitively until all is said and done), they’ll be able to explain exactly where those come from and justify why they should be retained. Then the engine team can hand the new version over to the Ops team and tell them exactly what to expect in their tests. If that clears, the pipelines can be upgraded with less back and forth than if they had just walked up with the original new version and said “we think this looks better”. So it’s not that we would reject improvements in the name of consistency; rather it’s that we need to start from a consistent baseline that we trust in order to evaluate proposed improvements.

From matdmset on 2017-12-07

Thanks for the reply! As I said, I was sure there was a whole reasoning behind it, but it’s good to know there’s a tremendous amount of QC being done :smiley: :+1:
