created by Geraldine_VdAuwera
on 2013-08-23
Use HaplotypeCaller!
The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, its ability to call indels is far superior, and it is now capable of calling non-diploid samples. It also comprises several unique functionalities such as the reference confidence model (which enables efficient and incremental variant discovery on ridiculously large cohorts) and special settings for RNAseq data.
As of GATK version 3.3, we recommend using HaplotypeCaller in all cases, with no exceptions.
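As a rough illustration of the reference confidence workflow mentioned above, the sketch below builds (but does not run) the two kinds of command line involved: one HaplotypeCaller run per sample in GVCF mode, then a joint GenotypeGVCFs run over the resulting files. It assumes GATK 3.x command-line syntax; the jar path, file names, and the Python function names are hypothetical placeholders, not part of GATK.

```python
# Sketch only: assembles GATK 3.x-style argument lists without executing them.
# Jar path, file names, and function names are illustrative assumptions.

def haplotypecaller_gvcf_cmd(reference, bam, gvcf, jar="GenomeAnalysisTK.jar"):
    """Per-sample HaplotypeCaller call that emits a reference-confidence GVCF."""
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "-o", gvcf,
    ]


def genotype_gvcfs_cmd(reference, gvcfs, out_vcf, jar="GenomeAnalysisTK.jar"):
    """Joint genotyping over any number of per-sample GVCFs."""
    cmd = ["java", "-jar", jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]  # one --variant argument per sample GVCF
    cmd += ["-o", out_vcf]
    return cmd


if __name__ == "__main__":
    samples = ["sampleA", "sampleB"]
    for s in samples:
        print(" ".join(haplotypecaller_gvcf_cmd("ref.fasta", s + ".bam", s + ".g.vcf")))
    print(" ".join(genotype_gvcfs_cmd("ref.fasta", [s + ".g.vcf" for s in samples], "cohort.vcf")))
```

Because each sample gets its own GVCF, adding samples to the cohort later only requires new per-sample runs plus a repeat of the joint genotyping step.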
Caveats for older versions
If you are limited to older versions for project continuity, you may opt to use UnifiedGenotyper in the following cases:
- If you are working with non-diploid organisms (UG can handle different levels of ploidy while older versions of HC cannot)
- If you are working with pooled samples (also due to the HC’s limitation regarding ploidy)
- If you want to analyze more than 100 samples at a time with versions 2.x (for performance reasons)
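The recommendation and caveats above can be summarized as a small decision helper. This is only a sketch of the rules as stated in this document; the function name, its parameters, and the choice to represent the GATK version as a `(major, minor)` tuple are assumptions made for illustration.

```python
# Sketch of the guidance in this document; not an official GATK utility.
# gatk_version is a (major, minor) tuple, e.g. (3, 5) for GATK 3.5.

def recommend_caller(gatk_version, ploidy=2, pooled=False, n_samples=1):
    """Return the caller suggested by the rules listed above."""
    # From 3.3 onward: HaplotypeCaller in all cases, with no exceptions.
    if gatk_version >= (3, 3):
        return "HaplotypeCaller"
    # Older HaplotypeCaller versions cannot handle non-diploid or pooled samples.
    if ploidy != 2 or pooled:
        return "UnifiedGenotyper"
    # Versions 2.x: more than 100 samples at a time is a performance concern.
    if gatk_version[0] == 2 and n_samples > 100:
        return "UnifiedGenotyper"
    return "HaplotypeCaller"


if __name__ == "__main__":
    print(recommend_caller((3, 5)))
    print(recommend_caller((2, 8), n_samples=150))
    print(recommend_caller((3, 1), ploidy=4))
```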
Updated on 2014-10-24
From Geraldine_VdAuwera on 2014-10-24
This document has been updated to reflect status as of GATK version 3.3. Older comments and questions have been moved to this archival thread: http://gatkforums.broadinstitute.org/discussion/4744/questions-about-using-ug-vs-hc-out-of-date
From yd44@duke.edu on 2016-02-01
Hi, since GATK is at version 3.5 now, which one would you suggest for this version? Thanks!
From Geraldine_VdAuwera on 2016-02-01
Definitely HaplotypeCaller; this will apply to all future versions unless otherwise stated.
From thedam on 2016-02-23
Hi,
If I use HaplotypeCaller, is it still necessary to run these steps: RealignerTargetCreator, IndelRealigner, BaseRecalibrator?
Somewhere I’ve read that HaplotypeCaller can now be applied right after MarkDuplicates. Is that true?
P.S. Which should be applied: MarkDuplicatesWithMateCigar or MarkDuplicates?
thx
From Sheila on 2016-02-25
@thedam
Hi,
I’m not sure where you read that you can skip all those steps! We still recommend those, as they are indeed important. Have a look at the [Best Practices](https://www.broadinstitute.org/gatk/guide/best-practices.php) for more information.
[This article](https://www.broadinstitute.org/gatk/guide/article?id=6747) should help with marking duplicates.
-Sheila
From thedam on 2016-02-27
Hi, thanks for the answer!
Well, I’ve read it here:
https://www.broadinstitute.org/gatk/events/slides/1506/GATKwr8-B-2-Indel_realignment.pdf
but maybe I didn’t understand it correctly. Slide 25:
“Is realignment still necessary with latest software? Latest tools being implemented for variant discovery (HaplotypeCaller, MuTect 2, Platypus) all include some sort of assembly step (for which upstream realignment is not really helpful).”
As I understand it, HaplotypeCaller does its own ‘realignment’ (by assembling the reads around the region), so IndelRealigner is not needed.
From Geraldine_VdAuwera on 2016-02-27
@thedam you’re right that indel realignment is not as important anymore, but we still recommend running it (and we do so in our production pipeline) because we think it may still improve results of BaseRecalibrator by removing mismatches that would otherwise constitute noise.
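For readers following along, here is a minimal sketch of the step order discussed in this exchange, written as GATK 3.x-style argument lists that are built but not executed. It assumes the input BAM has already been duplicate-marked; the jar path, file names, and function name are hypothetical, and PrintReads (not mentioned above) is included only because it is the GATK 3 step that writes the recalibrated BAM.

```python
# Sketch only: GATK 3.x-style pre-processing commands in the order discussed.
# All file names, the jar path, and the function name are illustrative assumptions.

def preprocessing_cmds(reference, dedup_bam, known_sites, prefix="sample",
                       jar="GenomeAnalysisTK.jar"):
    """Return the argument lists for indel realignment followed by BQSR."""
    gatk = ["java", "-jar", jar]
    targets = prefix + ".intervals"
    realigned = prefix + ".realigned.bam"
    table = prefix + ".recal.table"
    recal = prefix + ".recal.bam"
    return [
        # 1. Find intervals that may need local realignment around indels.
        gatk + ["-T", "RealignerTargetCreator", "-R", reference,
                "-I", dedup_bam, "-o", targets],
        # 2. Realign reads in those intervals.
        gatk + ["-T", "IndelRealigner", "-R", reference, "-I", dedup_bam,
                "-targetIntervals", targets, "-o", realigned],
        # 3. Model base quality errors on the realigned reads.
        gatk + ["-T", "BaseRecalibrator", "-R", reference, "-I", realigned,
                "-knownSites", known_sites, "-o", table],
        # 4. Apply the recalibration table to produce the analysis-ready BAM.
        gatk + ["-T", "PrintReads", "-R", reference, "-I", realigned,
                "-BQSR", table, "-o", recal],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_cmds("ref.fasta", "sample.dedup.bam", "dbsnp.vcf"):
        print(" ".join(cmd))
```

The recalibrated BAM from the last step is what would then be passed to HaplotypeCaller.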
From MehulS on 2018-12-17
I noticed that UnifiedGenotyper (among other callers, but not HaplotypeCaller) was run on the latest 1000genomes phase 3 data (after lifting over to GRCh38DH) to recall variants (https://f1000research.com/posters/7-1445). There were also some other papers that recommended using UnifiedGenotyper on low-coverage data. I have about 100-109 samples. Which tool is more appropriate?