Best practices for evaluating single nucleotide variant calling methods for microbial genomics
SMaSH (2013) - a toolkit for benchmarking and comparing variant callers.
hap.py - a set of programs, built on htslib, to benchmark variant calls against gold-standard truth datasets.
RTG Tools contains utilities to easily manipulate and accurately compare multiple VCF files
GA4GH benchmarking tools (vcfeval, hap.py)
vcfcomparator (GIAB) can generate ROC curves and can restrict comparisons to variants inside regions defined in BED files.
GATK also has tools to combine and compare variant and genotype calls (McKenna et al., 2010; DePristo et al., 2011; Van der Auwera et al., 2012).
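The tools above share a common core. They match a call set against a truth set, optionally only inside BED-defined regions, and count true positives (TP), false positives (FP) and false negatives (FN), from which precision and recall follow. Sweeping a quality threshold over the calls gives ROC-style (TP, FP) points. The Python sketch below makes that logic concrete. It is a simplified illustration, not how any of the listed tools work. It matches variants naively on (chrom, pos, ref, alt), so it ignores genotypes and misses equivalent representations of the same variant (e.g. differently left-aligned indels), which haplotype-aware comparators such as vcfeval and hap.py are designed to handle. The names `Variant`, `restrict`, `compare` and `roc_points` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real library type)."""
    chrom: str
    pos: int          # 1-based position, as in the VCF POS column
    ref: str
    alt: str
    qual: float = 0.0

def in_regions(v: Variant, regions: Sequence[tuple]) -> bool:
    """BED intervals are 0-based and half-open, so a 1-based VCF position p
    lies in [start, end) exactly when start < p <= end."""
    return any(chrom == v.chrom and start < v.pos <= end
               for chrom, start, end in regions)

def restrict(variants, regions: Optional[Sequence[tuple]]):
    """Keep only variants inside the BED regions (all of them if regions is None)."""
    if regions is None:
        return list(variants)
    return [v for v in variants if in_regions(v, regions)]

def key(v: Variant) -> tuple:
    # Naive matching key: same site and same alleles (assumption; real tools
    # compare haplotypes and genotypes instead).
    return (v.chrom, v.pos, v.ref, v.alt)

def compare(truth, calls, regions=None, min_qual=float("-inf")) -> dict:
    """Count TP/FP/FN for calls with qual >= min_qual, inside optional regions."""
    truth = restrict(truth, regions)
    calls = restrict(calls, regions)
    truth_keys = {key(v) for v in truth}
    call_keys = {key(v) for v in calls if v.qual >= min_qual}
    tp = len(truth_keys & call_keys)
    fp = len(call_keys - truth_keys)
    fn = len(truth_keys - call_keys)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}

def roc_points(truth, calls, regions=None) -> list:
    """(threshold, TP, FP) at each distinct call quality, highest first."""
    truth = restrict(truth, regions)
    calls = restrict(calls, regions)
    points = []
    for t in sorted({v.qual for v in calls}, reverse=True):
        m = compare(truth, calls, min_qual=t)
        points.append((t, m["tp"], m["fp"]))
    return points

if __name__ == "__main__":
    truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
             Variant("chr1", 300, "G", "A")]
    calls = [Variant("chr1", 100, "A", "G", 50.0), Variant("chr1", 200, "C", "A", 30.0),
             Variant("chr1", 300, "G", "A", 10.0), Variant("chr1", 900, "T", "C", 40.0)]
    regions = [("chr1", 0, 500)]  # one BED interval covering chr1 positions 1-500
    print(compare(truth, calls, regions))
    print(roc_points(truth, calls, regions))
```

In this toy example the call at chr1:900 falls outside the BED interval and is excluded. The wrong allele called at chr1:200 counts as both a false positive and a false negative, which gives 2 TP, 1 FP and 1 FN. Restricting to confident regions in this way is what keeps unassessable sequence from inflating the false-positive count.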