orig pub
caller class
input type
somatic/denovo
from
validated vs.
cited
used by
compared by
algorithm
features
description
installation
study
source
notes
MuSE
2016
SNV
wes/wgs bam pair
somatic
MD Anderson Cancer Center
mutect, sniper, strelka
0
GDC, SomaticSeq
co-local realignment of paired normal/tumor reads, pre-filter, estimate allele equilibrium frequencies and evolutionary distance with F81 Markov substitution model, weigh frequencies against sample-specific error model, require higher stringency at dbSNP locations
should give competitive performance on impure samples
Markov Substitution model for Evolution (MuSE), which models the evolution of the reference allele to the allelic composition of the tumor and normal tissue at each genomic locus. We further adopt a sample-specific error model to identify cutoffs, reflecting the variation in tumor heterogeneity among samples.
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1029-6
http://bioinformatics.mdanderson.org/main/MuSE
sinvict
2016
SNV
2
free floating tumor dna. no confidence score assigned.
multigems
2016
SNV
freebayes, gatk, samtools, varscan
1
assumes diploid
SomaticSeq
2015
SNV/indel
wg bam pair
somatic
Roche, Bina
mutect/indellocator, varscan2, somaticsniper, jointsnvmix2, vardict
6
meta, decision tree consensus
meta caller, AI consensus, flexible framework? flexible machine learning alg?
Collects up to 72 features per mutation by SAMtools, HaplotypeCaller, and five orthogonal variant callers. The Adaptive Boosting model constructs a decision tree classifier that yields P for each variant.
http://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0758-2
http://bioinform.github.io/somaticseq/
discosnp
2015
SNV/indel
fastq
de novo
Genscale, France
niks, bubbleparse, cortex
20
reference-free de bruijn, k-mer
reference-free
ranks predictions, compute efficient
https://www.ncbi.nlm.nih.gov/pubmed/25404127
http://colibread.inria.fr/software/discosnp/
2kplus2
2015
SNV/SV
cortex
2
de bruijn reference free
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4341063/
https://github.com/danmaclean/2kplus2
ExScalibur
2015
SNV
1
plural, but not meta
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135800
multiSNV
2015
SNV
bams
somatic
Cambridge, Tavare
SomaticSniper, MuTect, UnifiedGenotyper and Platypus
7
probabilistic, multiple samples from same patient
might not work well with just 1 tumor sample
https://bitbucket.org/joseph07/multisnv/wiki/Home
rarevator
2015
SNV/indel
gatk ug input
somatic
mutect, varscan2
0
Fisher exact test on conserved loci from hg19
not very impressive validation; only mentions how many new variants were called by rarevator, not how many were missed
http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1481-9
https://sourceforge.net/projects/rarevator/
SNV-PPILP
2015
SNV
gatk ug vcf
3
perfect phylogeny/integer linear programming
https://www.cs.helsinki.fi/en/gsa/snv-ppilp/
Platypus
2014
SNV/SV?/indel
bams wgs or targeted
somatic
U Oxford
gatk ug/hc, samtools
147
bcbio, bioconda
Bro5
haplotype-based
no dependencies, fast
see somatypus;
http://www.well.ox.ac.uk/platypus-doc
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4753679/
https://github.com/andyrimmer/Platypus
baysic
2014
SNV
vcf pair
20
meta, unsupervised bayesian consensus
meta, unsupervised ranking
uses a Bayesian statistical method based on latent class analysis to combine variant sets produced by different bioinformatic packages (e.g., GATK, FreeBayes, Samtools) into a high-confidence set of genome variants.
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-104
http://genformatic.com/baysic/
HapMuC
2014
SNV/indel
pileups
somatic
VarScan 2, SomaticSniper, Strelka and MuTect
3
bayesian model on haplotype inference
bayes factor ranks predictions
https://www.ncbi.nlm.nih.gov/pubmed/25123903
https://github.com/usuyama/hapmuc
SNPest
2014
SNV/indel
pileups
U Copenhagen
GeMS, freebayes, GATK HC, samtools
2
reference-free probabilistic model, generative probabilistic graphical model
confidence score ranks predictions, does not model aneuploidy
http://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-7-698
https://github.com/slindgreen/SNPest
VariantMaster
2014
SNV/indel
bam,vcf,tped,tfam
denovo/somatic
14
reference-free probabilistic model, inference through inheritance
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3912425/
https://sourceforge.net/projects/variantmaster/
Mutect (1 & 2)
2013
SNV
wg or exome bams
population
Broad, Getz
somatic sniper, jointSNVmix, strelka
702
GDC, SomaticSeq, bcbio, rave
Den9, Wash7, Bcb8, Barc2, Van6, Gor4, Swi9
bayesian with variable allele fraction, filter variants appearing in normal pool unless they are known variants. No normal/tumor paired from same patient, no joint calling. No confidence score.
sensitive for low allelic frequency
Bayesian classifier designed to detect somatic mutations with very low allele-fractions, requiring only a few supporting reads, followed by a set of carefully tuned filters
https://www.ncbi.nlm.nih.gov/pubmed/23396013
https://github.com/broadinstitute/mutect
EBCall
2013
SNV/indel
exome
population
Vanderbilt, Zhao
varscan 2, somatic sniper
60
Den9, Van6
heuristic with beta-binomial error model from pooled normal bams, simply subtract germ line variants for somatic
doesn't output vcfs, sensitive for low allelic frequency
empirically estimating the distribution of sequencing errors by using a set of non-paired normal samples. Using this approach, we can directly evaluate the discrepancy between the observed allele frequencies and the expected scope of sequencing errors
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3627598/
https://github.com/friend1ws/EBCall
Shearwater
2013
SNV
targeted seq
population
U Cambridge/Wellcome Trust
caveman, mutect, deepsnv
11
bayesian beta-binomial
multiple samples
beta-binomial model for variant calling with multiple samples
http://bioinformatics.oxfordjournals.org/content/early/2014/01/31/bioinformatics.btt750.full
https://bioconductor.org/packages/release/bioc/html/deepSNV.html
Shimmer
2013
SNV
wg bams
somatic
NHGRI, Larsen
varscan 2, somatic sniper, deepsnv, jointSNVmix 2
18
Den9, Wash7
Fisher's exact test with multiple testing correction
Supplementary files not online? Validation compares to competitive callers, uses synthetic and real data.
employs a statistical model quite similar to that of Varscan 2, but in addition to this it performs a correction for multiple testing
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3673219/
https://github.com/nhansen/Shimmer
bubbleparse
2013
SNV/SV
cortex
cortex, samtools
13
de bruijn
reference-free
https://www.ncbi.nlm.nih.gov/pubmed/23536903/
Cake
2013
SNV
wgs bams
somatic
Wellcome Trust, Adams
bambino, caveman, mpileup, varscan 2
15
meta - merge, consensus, filter
ensemble framework
integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants, Bambino, CaVEMan, SAMtools mpileup, and VarScan 2 with extra filtering
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3740632/
http://cakesomatic.sourceforge.net/
Denovogear
2013
SNV/indel
wg bam trios
denovo
WashU St Louis, Conrad
gatk, polymutt, samtools
49
biocondor
bayesian beta-binomial
joint statistical analysis over multiple samples
model consists of individual genotype likelihoods, transmission probabilities, and priors on the probability of observing a polymorphism or a de novo mutation at any given site in the genome
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4003501/
https://github.com/denovogear/denovogear
qSNP
2013
SNV
wg or exome bams
somatic
U Queensland
GATK, strelka
15
heuristic; minimum of 3 reads, compare to in-house database of variants
fast, easy to run on a cluster
Classification into germline and somatic calls follows a number of simple rules that were designed to accommodate for the expected low mutant allele ratio in low purity tumors
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3826759/
https://sourceforge.net/p/adamajava/wiki/qSNP%201.0/
rvd
2013
SNV
bams
6
beta binomial
matlab
https://bmcresnotes.biomedcentral.com/articles/10.1186/1756-0500-6-206
http://dna-discovery.stanford.edu/software/rvd/
Seurat
2013
SNV/indel/LOH/SV
wg bams paired
somatic
Translational Genomics Research Institute
varscan 2, strelka, somatic sniper
19
Den9, Wash7
bayesian binomial model
Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations
based on a Bayesian algorithm and calculates the joint posterior probability that a variant exists in the tumor sample and not in the normal sample. The resulting VCF file contains both SNVs and indels.
validated against synthetic data
http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-302
https://sites.google.com/site/seuratsomatic/
SNPTools
2013
SNV
41
haplotype imputation, effective base depth, binomial mixture modeling
includes genotype likelihood estimation
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/
https://sourceforge.net/projects/snptools/
vcmm
2013
SNV/indel/SV
pileups
single
RIKEN Japan
gatk, samtools
22
multinomial bayesian from paper in notes & strand bias filter
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3703611/
http://www.mybiosoftware.com/vcmm-variant-caller-with-multinomial-probabilistic-model.html
vip
2013
SNV
pooled population
dna sudoku, overlap log
5
overlapping pools
A complete data analysis framework for overlapping pool designs, with novelties in all three major steps: variant pool and variant locus identification, variant allele frequency estimation and variant sample decoding. VIP is very flexible and can be combined with any pool design approaches and sequence mapping/alignment tools.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530907/
Virmid
2013
SNV/purity
exome bams paired
somatic
UCSD, Bafna
jointSNVmix 2, strelka, varscan 2
18
Den9
Estimate purity, bayesian inference with estimated joint genotype probability matrix as the prior distribution
estimates purity
estimate α, the level of impurity, i.e. the admixture of stromal cells in the cancer sample. A maximum likelihood estimation method is used. Next, the most probable genotype is estimated in the somatic variant caller step, using a Bayesian algorithm.
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-8-r90
https://sourceforge.net/p/virmid/wiki/Home/
VarScan2
2012
SNV/CNV/indel
mpileups exome
somatic
WashU St Louis, Wilson
somatic sniper
957
GDC, SomaticSeq, bcbio, rave, bioconda
Den9, Wash7, Bcb8, Aus4, Van6, Gor4, Swi9
fisher's exact test, CBS alg for cnv, filters snps by heuristic criteria
fast? according to multigems benchmark not that fast
heuristic pairwise comparisons of base calls and normalized sequence depths at each position. Variants are classified into germline, somatic, LOH and unknown
https://www.ncbi.nlm.nih.gov/pubmed/22300766
http://varscan.sourceforge.net/
JointSNVMix
2012
SNV
wg bams
somatic
U British Columbia, Vancouver
compared to identical but non-joint and joint with fisher's exact test
98
SomaticSeq, rave
Aus4, Van6, Swi9
bayesian joint genotype of the samples
ranks the mutations. true joint calling
probabilistic graphical model to analyse sequence data from tumour/normal pairs. allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.
https://www.ncbi.nlm.nih.gov/pubmed/22285562
https://code.google.com/archive/p/joint-snv-mix/
LoFreq
2012
SNV
pileups wg/exome
somatic
Genome Institute of Singapore
snver, breseq, samtools, some custom methods
108
somaticseq, biocondor
poisson binomial with bernoulli trials
uses Bonferroni correction, great for deep sequence <0.05 MAF
models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population
http://nar.oxfordjournals.org/content/40/22/11189.long
Strelka
2012
SNV/indel
wg bams
somatic
Illumina
varscan, samtools
239
somaticseq
Den9, Wash7, Barc2, Aus4, Van6, Gor4
bayesian joint probability of normal and somatic, indel realign
works in presence of impurities, joint calling
Bayesian approach wherein the tumor and normal allele frequencies are treated as continuous values. Search for candidate indels, realign, produce somatic variant probabilities. Strelka uses allele frequencies rather than diploid genotypes
must be manually installed to work with bcbio; bcbio is unlikely to support because of the Makefile build; could be difficult installation; could be worth the effort
http://bioinformatics.oxfordjournals.org/content/28/14/1811
https://sites.google.com/site/strelkasomaticvariantcaller/home/
Atlas2
2012
SNV/indel
exome
Baylor, Yu
gatk ug, dindel, samtools mpileup
135
logistic regression model includes reference/variant reads ratio for calling, and variety of features for filtering
part of Genboree. fast. may be a windows app. works for SOLiD, Illumina, and Roche 454
Est. error as 11bp window rolling average. Filter variants on uni-directional reads.
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-8
https://www.hgsc.bcm.edu/software/atlas-2
CoNAn-SNV
2012
SNV
4
CNV-informed SNV. binomial mixture model, one per copy
integrates information about copy number state of different genomic segments into the inference of single nucleotide variants. CoNAn-SNV requires as input a pileup file (either Maq or Samtools format) and model parameters, as well as a file demarcating segmentation boundaries of copy number amplifications
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0041551
http://compbio.bccrc.ca/software/conan-snv/
cortex
2012
SNV/SV
261
reference-free de Bruijn graph
reference-free
https://www.ncbi.nlm.nih.gov/pubmed/22231483
http://cortexassembler.sourceforge.net/index_cortex_var.html
DeepSNV
2012
SNV/indel
targeted sequencing
population
ETH Zurich
varscan 2, crisp, vipr
80
biocondor
Swi9
beta-binomial model, error model from population data
uses population data, fast due to C implementation
Model for error distribution is based on the observation that sequencing artifacts are recurrent on specific loci. In a large cohort this allows to define a background error distribution on each locus, above which true variants can be called.
http://www.nature.com/articles/ncomms1814
https://bioconductor.org/packages/release/bioc/html/deepSNV.html
GeMS
2012
SNV
pileups
single
UC Riverside
varscan2, snvmix2, freebayes, maq, samtools, gatk, atlas, soapsnp
22
bayesian multinomial, base- and alignment-quality priors, Dixon's Q-test
max of 2 alleles
statistical model accounts for enzymatic substitution sequencing errors, addresses the multiple testing problem
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3338331/
https://github.com/cui-lab/multigems
impute2
2012
SNV
655
biocondor
haplotype imputation
https://www.ncbi.nlm.nih.gov/pubmed/22820512
https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
Somatic_Sniper
2011
SNV
wg/we bams paired
somatic
Wash U St Louis, Ding
snvmix 2
197
GDC, SomaticSeq, rave, bioconda
Den9, Wash7, Aus4, Van6, Gor4, Swi9
basic joint probability bayesian genotyping
pretty popular
Like MuTect, based on a Bayesian posterior probability. Somatic Sniper reports a somatic score (SSC), a Phred-scaled probability between 0 and 255, that the tumor and normal genotypes are different
Requires an old version of samtools
https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btr665
https://github.com/genome/somatic-sniper
Bambino
2011
SNV/indel
pooled wg bams
somatic
NCI Laboratory of Population Genetics, Buetow
n/a
46
basic caller? some filters
not command line?
Bambino's variant detector and assembly viewer are capable of pooling and analyzing data from multiple BAM files simultaneously.
https://www.ncbi.nlm.nih.gov/pubmed/21278191
https://github.com/NCIP/cgr-bambino
freeBayes
2011
SNV/indel/MNPs
bam
Erik Garrison
n/a
315
bcbio, biocondor
Bcb8, Bro5
haplotype-aware bayesian inference with multiallelic loci and non-uniform copy number across the samples
classic
We generalize the Bayesian statistical method described by Marth et al. [1999] to allow multiallelic loci and non-uniform copy number across the samples under consideration.
https://arxiv.org/abs/1207.3907
https://github.com/ekg/freebayes
MutationSeq
2011
SNV
wg bams paired, rna(?)
somatic
U British Columbia, Vancouver, Shah
samtools, gatk ug
58
classic machine learning for somatic calling
Comparison of four classic machine learning algorithms toward SNV calling
http://bioinformatics.oxfordjournals.org/content/28/2/167.abstract?keytype=ref&ijkey=oj0Wpkhils4hmyC
http://compbio.bccrc.ca/software/mutationseq/
SNVer
2011
SNV
wg/we bams paired
somatic
New Jersey Institute of Technology
CRISP, samtools, gatk
125
model minor alleles from pooled cancer/normal samples, using binomial dist
fast, early paired model, reports p-val
statistical tool SNVer for calling SNPs in analysis of pooled or individual NGS data. Different from the previous models employed by CRISP, it analyzes common and rare variants in one integrated model, which considers and models all relevant factors including variant distribution and sequencing errors simultaneously.
https://www.ncbi.nlm.nih.gov/pubmed/21813454/
syzygy
2011
SNV
mpileups
population
broad
451
multinomial bayesian with filters
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378381/
http://software.broadinstitute.org/software/syzygy/home
VipR
2011
SNV/deletions
mpileups
population
Max Planck Institute
crisp, poisson, varscan
35
discriminate artefactual noise from low MAF alleles
pooled
vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution.
http://bioinformatics.oxfordjournals.org/content/27/13/i77.full
https://sourceforge.net/projects/htsvipr/files/vipR/
CRISP
2010
SNV/indel
128
fisher's exact test
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881398/
https://sites.google.com/site/vibansal/software/crisp
indeLocator
indels
used in Denmark9
CGA/Broad
?
somaticseq
n/a