created by GATK_Team
on 2017-12-28
This document describes the reference confidence model applied by HaplotypeCaller to generate a per-sample [GVCF](https://software.broadinstitute.org/gatk/documentation/article?id=11004), invoked by `-ERC GVCF` or `-ERC BP_RESOLUTION`.
As explained [here](https://software.broadinstitute.org/gatk/documentation/article?id=11068), HaplotypeCaller works by assembling the reads to create potential haplotypes, realigning the reads to their most likely haplotypes, and then projecting these reads back onto the reference sequence via their haplotypes to compute alignments of the reads to the reference. At that point, we can calculate the likelihoods of each possible genotype and emit variant calls.
What that article does not explain is how HaplotypeCaller additionally estimates the chance that some (unknown) non-reference allele is segregating at this position by examining the realigned reads that span the reference base. At this base we perform two calculations:
Based on this, we emit the genotype likelihoods (`PL`) and compute the `GQ` (from the `PL`s) for the less confident of these two models. We use a symbolic ALT allele, `<NON_REF>`, to hold the likelihood that the site is not homozygous reference, as well as allele-specific `AD` and `PL` field values.
We do this at all sites in the territory covered by the analysis, including homozygous-reference sites, both inside and outside the ActiveRegions determined by HaplotypeCaller.
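As a rough illustration of how a `GQ` can be derived from `PL`s and how the less confident of two models can be selected, here is a minimal Python sketch. It is not HaplotypeCaller's actual code. It assumes the common convention that `GQ` is the gap between the lowest and second-lowest `PL`, capped at 99. The function names and the `PL` values are hypothetical.

```python
# Minimal sketch, not GATK's implementation.
# Assumption: GQ = (second-lowest PL) - (lowest PL), capped at 99.

def gq_from_pls(pls, cap=99):
    """Return a genotype quality derived from a list of PL values."""
    if len(pls) < 2:
        raise ValueError("need at least two PL values")
    best, second = sorted(pls)[:2]
    return min(second - best, cap)


def least_confident(first_model_pls, second_model_pls):
    """Return (pls, gq) for whichever model yields the lower GQ."""
    first_gq = gq_from_pls(first_model_pls)
    second_gq = gq_from_pls(second_model_pls)
    if first_gq <= second_gq:
        return first_model_pls, first_gq
    return second_model_pls, second_gq


# Hypothetical PLs for genotypes (ref/ref, ref/<NON_REF>, <NON_REF>/<NON_REF>)
first_model_pls = [0, 45, 600]
second_model_pls = [0, 30, 450]

pls, gq = least_confident(first_model_pls, second_model_pls)
print(pls, gq)
```

With these made-up values the second model has the smaller gap between its two best genotypes, so its `PL`s and `GQ` would be the ones reported under this sketch.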
From jianxinwang on 2018-01-31
I’m getting the error “A USER ERROR has occurred: Invalid argument ‘GVCF’.” when running the following command:
$ gatk --java-options "-Xmx38000M" HaplotypeCaller -I my_input.bam -R Homo_sapiens_assembly38.fasta -L wgs_calling_regions_chr1.hg38.interval_list -O my_output.g.vcf -bamout -ERC GVCF --verbosity INFO --TMP_DIR /tmp
Can anybody help me with this? Thanks.
From SkyWarrior on 2018-02-01
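You seem to have forgotten to give a file name to the -bamout argument, so the next token, -ERC, is taken as its value and GVCF is left over as an invalid argument.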
From jianxinwang on 2018-02-01
Thanks, indeed that is the cause of the problem. My bad!