pub
caller class
input type
somatic/denovo
from
validated vs.
cited
used by
compared by
algorithm
features
description
installation
study
source
SvABA
2017
sv
low memory, fast
http://biorxiv.org/content/early/2017/02/01/105080
valor
2017
sv
long range sequencing e.g. 10X Genomics linked-read sequencing, pooled clone sequencing
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3444-1
novobreak
2017
sv
assembly, md anderson
http://www.nature.com/nmeth/journal/v14/n1/full/nmeth.4084.html
paywall
VarDict
2016
SV/indel/SNV/LOH
targeted bam
somatic
Astra Zeneca
GATK UG/HC, Freebayes, varscan, pindel, scalpel, manta, lumpy
7
SomaticSeq, RAVE, bcbio
Bcb8
consensus on realigned soft-clipped reads used as search query
Efficient with ultra deep seq. calls complex variants (same read, multiple vars). filters PCR artifacts. Estimates SV allele frequency. Large insertions not yet called. inter-chromosomal fusion not yet called. can perform amplicon-aware calling but for single samples only.
Calls SNV, MNV, InDels, complex and structural variants, performs local realignments on the fly. Performance scales linearly to sequencing depth. Performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings. Is able to detect PCR artifacts. Detects differences in somatic and loss of heterozygosity variants between paired samples.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914105/
https://github.com/AstraZeneca-NGS/VarDict
gridss
2016
SV
poster
bcbio
de bruijn
http://smbe-2016.p.asnevents.com.au/days/2016-07-05/abstract/35482
https://github.com/PapenfussLab/gridss/
SV-Bay
2016
SV
paired-end, mate pair
somatic
Curie Institute
gasvpro, breakdancer, lumpy, delly
2
PEM & read coverage with Bayesian testing for adjacency
works with paired end or mate pair; analyzes tumor/normal pair concurrently
combine PEM signatures and DOC flanking each candidate rearrangement. takes into account GC-content and mappability. Bayesian framework based on both PEM and DOC information; infers 15 different types of structural variant from the detected novel genomic adjacencies
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896370/
https://github.com/InstitutCurie/SV-Bay
COSMOS
2016
SV
somatic
Advanced Industrial Science and Technology (AIST), Tokyo
Breakdancer, GasVPro, Delly, Lumpy, first using synthetic SVs introduced to hg19 with simulated impurity levels, then using hypermutable mouse ESC model with gamma irradiation
1
discordant pair reads, classify, DOC binomial
fast. mouse model for validation plus synthetic
compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner.
http://nar.oxfordjournals.org/content/44/8/e78
http://wall-lab.stanford.edu/projects/cosmos/
svstat
2016
sv
https://scfbm.biomedcentral.com/articles/10.1186/s13029-016-0051-0
svelter
2016
sv
complex rearrangements
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0993-1
skald
2016
sv
devro
2016
sv
population: paired end/depth of coverage
http://biorxiv.org/content/early/2016/12/15/094474
seq2c
2015
n/a
https://github.com/AstraZeneca-NGS/Seq2C
Manta
2015
SV/indel
wg bam trios
germ
Illumina
pindel, delly
1
bcbio, bioconda
breakend graph/assembly
Parallelized for clusters. handles degraded FFPE samples, uses pedigree-consistency and cosmic for validation. way faster than delly
provides scoring models for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs, with additional applications under development for RNA-Seq, de novo variants, and unmatched tumors. less than a tenth of the time that comparable methods require
https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btv710
https://github.com/Illumina/manta
MetaSV
2015
SV
Bina/Roche
pindel, BreakSeq2, LUMPY, BreakDancer, Delly, CNVNator, MindTheGap
15
bioconda, bcbio (http://blog.bina.com/read/metasv-integration-in-bcbio)
merge then assemble: pindel, BreakSeq2, LUMPY, BreakDancer, Delly, CNVNator, MindTheGap
consensus caller
merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes.
https://www.ncbi.nlm.nih.gov/pubmed/25861968
http://bioinform.github.io/metasv/
Wham
2015
SV
denovo
U of Utah
lumpy, delly, softsearch
3
bcbio, bioconda
bcb8
mate-pair & split read mapping, soft-clipping, alternative alignment, consensus sequence based evidence
requires bwa. does association testing
pinpoint SVs in pooled and genotypic data associated with phenotypic variation. uses split-read, mate-pair, and alternative alignments to find the other SV breakpoint. Positions in the pileup where three or more primary reads share the same breakpoint are interrogated as a putative SV. Use SA and XA cigar tags as alternative alignment locations, cluster those. SW align clipped consensus to alternative locations. intra-chromosomal require min 2 reads.
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004572
BreaKmer
2015
SV
targeted seq
somatic
Broad, MacConaill
crest, meerkat, breakdancer, pindel
32
soft clip, identify kmers in reads but not in reference
uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome
http://nar.oxfordjournals.org/content/43/3/e19
https://github.com/ccgd-profile/BreaKmer
indelMiner
2015
indel
wg bam pair
simple de novo, somatic
Penn State U
samtools, pindel, prism
8
split-read, paired-end, soft-clipped. align unmapped reads at both ends to look for indels
validation against synthetic variants introduced to chr22 and the na18507 data set. recommended to align with gatk indelRealigner
uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0483-6
https://github.com/aakrosh/indelMINER
BreakSeek
2015
SV
Chinese Academy of Sciences
pindel, lumpy, crest, soapindel, breakdancer, prism, delly
3
soft-clipping/breakread break points, paired-end span to validate indel, sophisticated probabilistic scoring model
describes parameters for competitors, works reasonably well for all size indels, estimates level of heterozygosity
unbiasedly and efficiently detect both homozygous and heterozygous INDELs, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4538813/
https://sourceforge.net/projects/breakseek/
RAPTR-SV
2015
SV
USDA
7
discordant read-pair, split read, soft-clip, filter
sensitivity for tandem duplications
combining their predictions to generate highly confident SV calls, which can be filtered at runtime for improved accuracy.
https://github.com/njdbickhart/RAPTR-SV
speedseq
2015
SV
WashU St Louis, Hall
20
meta. freebayes, lumpy, cnvnator, and custom caller svtyper.
assumes diploid. uses pedigree+ to call validation variants. discusses parameters. reports confidence score.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4589466/
https://github.com/hall-lab/speedseq
Lumpy
2014
SV
U Virginia
gasvpro, delly, pindel
104
bcbio, metasv, bioconda
bcb8, bcb3
merge read-pair, split read, read-depth in a breakpoint probability map. classify and cluster
sensitive to low MAF. not great for small dels
LUMPY integrates disparate signals by converting them to a common format in which the two predicted breakpoint intervals in the reference genome are represented as paired probability distributions.
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-6-r84
https://github.com/arq5x/lumpy-sv
Scalpel
2014
indel
exome-capture
denovo
Cold Spring Harbor, Simons Center for Quantitative Biology
gatk hc, SOAPindel
44
somaticseq, bcbio, bioconda
Bcb8
de bruijn graph traversal, local assembly with iterative k-mer k value reassessment to eliminate repeats.
slow, not for wgs. does indel normalization
localized micro-assembly of specific regions of interest with the goal of detecting mutations with high accuracy and increased power. It is based on the de Bruijn graph assembly paradigm and implements an on-the-fly repeat composition analysis coupled with a self-tuning k-mer strategy
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180789/
http://scalpel.sourceforge.net/
gindel
2014
indels > 50bp
U Connecticut
pindel, cleversv
4
SVM on 7 features: discordant pair, split-read, read depth, concordant encompassing pair, single-end-mapped pair, partially mapped reads, fully-mapped spanning reads
efficient
An approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next generation sequencing data. It performs well for insertion genotyping on both simulated and real data. GINDEL can not only call genotypes of insertions and deletions (both short and long) for high and low coverage population sequence data, but also is more accurate and efficient than other approaches.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0113324
https://sourceforge.net/projects/gindel/
Gustaf
2014
SV
Freie U, Berlin
delly, pindel
16
bioconda
local alignment between unmapped reads shows breakpoints
small validation set. claims best for small SVs 30-100bp. might work with FFPE
based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution.
http://bioinformatics.oxfordjournals.org/content/30/24/3484
http://www.seqan.de/apps/gustaf/
SMuFin
2014
SV/SNV
paired exome fastqs
somatic
Barcelona Supercomputing Center
mutect, breakdancer, pindel, delly, crest
19
directly compares reads "quaternary sequence tree"
exome only, too slow for WG. parallelized
directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution.
http://www.nature.com/nbt/journal/v32/n11/full/nbt.3027.html
Socrates
2014
SV
somatic
The Walter and Eliza Hall Institute of Medical Research
23
re-aligns and clusters soft-clipped reads
needs parameterization. fast. split-read only algorithms are better for short read FFPE data. "On real tumour data without additional information, we find it impractical to run at its most sensitive settings, but it is easily tuned."
uses split reads to find breakpoints. It is optimized to be fast and extremely sensitive.
http://bioinformatics.oxfordjournals.org/content/30/8/1064
https://github.com/PapenfussLab/socrates
Ulysses
2014
SV
mate-pair only
Lab of Computational and Quantitative Biology, Paris
5
assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise.
http://www.lcqb.upmc.fr/ulysses/#citeUlysses
ViVar
2014
SV
Ghent U, Belgium
10
needs a reference set for sequencing error model
facilitates the processing, analysis and visualization, of structural variation based on massive parallel sequencing data
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264741/
Bellerophon
2013
SV(translocs)
Case Western
GASV, Breakdancer, SVDetect, and CREST
9
discordant reads, soft-clipped
specific to interchromosomal translocations. really basic caller
uses discordant read pairs and "soft-clipped" reads to predict the location of the precise breakpoints. for each chimeric breakpoint, attempts to classify it as a participant in an unbalanced translocation, balanced translocation, or interchromosomal insertion.
https://www.ncbi.nlm.nih.gov/pubmed/23734783
http://cbc.case.edu/Bellerophon/
sv-m
2013
sv
https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-132
pesv-fisher
2013
sv
pem & doc
https://www.ncbi.nlm.nih.gov/pubmed/23704902/
iSVP
2013
SV
breakdancer, delly, pindel, haplotypecaller
16
limit each method's calls to their optimal size ranges
meta. deletions only
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029547/
Meerkat
2013
SV
somatic
Harvard, Park
n/a
110
discordant read pair clustering with refinement
may recognize a relatively narrow scope of unique variants from specific complex rearrangement events like dna repair pathways. identified hundreds of new complex variants in na12878 and na18507. probably a good addition to any meta caller.
considers local clusters of discordant read pairs to recognize specific complex events. uses split, clipped, and multiple-aligned reads
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3704973/
http://compbio.med.harvard.edu/Meerkat/
soapindel
2013
indels
BGI Shenzhen
dindel, pindel, gatk
57
identifies breakpoints from discordant reads, multi-path de bruijn graph assembly
similar sensitivity and specificity for small indels, higher sensitivity for large indels. might be slow. should call SNPs too. assigns confidence q-scores. weird validation looking at hg19 vs venter genome and chimpanzee vs hg19
assign all unmapped reads with a mapped partner to their expected genomic positions and then perform extensive de novo assembly on the regions with many unmapped reads to resolve homozygous, heterozygous, and complex indels by exhaustive traversal of the de Bruijn graph
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530679/
http://soap.genomics.org.cn/soapindel.html
SoftSearch
2013
SV
Mayo Clinic, Kocher
breakdancer, delly, crest, svseq
24
soft-clipping heuristic, number of soft-clipped reads per position
slow but high TP rate. works at low depth but needs parameters adjusted. Levenshtein Distance "confidence scores"
Assuming soft clipping delineates the exact breakpoint position and direction, DRPs overlapping such soft-clipped areas should already contain the information about the type and size of SV, obviating the need for secondary alignments.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0083356
http://bioinformaticstools.mayo.edu/research/softsearch/
TIGRA
2013
breakpoints
population locate common alleles. can be used as de novo?
WashU MD Anderson Cancer Center, Weinstock
spades, velvet, sga, phrap
37
iterative breakpoint collection, de bruijn graph, assembles the breakpoints
uses population data, low FDR
http://genome.cshlp.org/content/early/2013/12/04/gr.162883.113
http://bioinformatics.mdanderson.org/main/TIGRA
Delly
2012
SV/CNVs
EMBL
pindel, breakdancer, gasv, hydra
273
bcbio, metasv, bioconda
bcb8, bcb3
integrates short insert paired-ends and long-range mate-pairs to identify discordant pairs, then uses split-read alignments to identify breakpoints
high sensitivity and specificity, lower sensitivity to small deletions
integrates short insert paired-ends, long-range matepairs and split-read alignments to accurately delineate genomic rearrangements
http://bioinformatics.oxfordjournals.org/content/28/18/i333.abstract
https://github.com/dellytools/delly
cn.MOPS
2012
CNV
population
Johannes Kepler U, Austria
133
bioconda
bcb3
decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions
http://nar.oxfordjournals.org/content/40/9/e69
http://www.bioconductor.org/packages/release/bioc/html/cn.mops.html
cpgbattenberg
2012
608
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3428864/
BreakPointer
2012
SV
Max Planck Institute, Haas
pindel
14
single-end read breakpoint locator
By taking advantage of local non-uniform read distribution and misalignments created by SVs, Breakpointer scans the alignment of single-end reads to identify regions containing potential breakpoints.
http://bioinformatics.oxfordjournals.org/content/28/7/1024
https://github.com/ruping/Breakpointer
CLEVER
2012
SV
Life Sciences Group, Amsterdam
gasv, variationhunter, breakdancer, hydra
56
bioconda
clustering on concordant pairs
better at 20-100bp size range
enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions.
http://bioinformatics.oxfordjournals.org/content/28/22/2875.long
http://www.mybiosoftware.com/clever-2-0rc1-clique-enumerating-variant-finder.html
ForestSV
2012
SV
22
random forests
https://www.ncbi.nlm.nih.gov/pubmed/22751202
http://sebatlab.ucsd.edu/index.php/software-data
GasVPro
2012
SV
Brown U
hydra, breakdancer, CNVer
88
joint P on paired read and read depth
validates vs HuRef, NA18507, and NA12878, has ROC curves. models uncertainty in call/reference overlap for truth calls to be more precise
Combines two common signals of structural variation, read depth information and discordant paired-read mappings, into a single probabilistic model.
http://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-3-r22
https://code.google.com/archive/p/gasv/
Prism
2012
SV
U Toronto
pindel, svseq, splitread, breakdancer, CNVnator, crest
61
uses a split-alignment approach informed by the mapping of paired-end reads, hence enabling breakpoint identification of multiple SV types, including arbitrary-sized inversions, deletions and tandem duplications
http://bioinformatics.oxfordjournals.org/content/28/20/2576
http://compbio.cs.toronto.edu/prism/
SplitRead
2012
SV
exome
de novo
Howard Hughes Medical Institute
82
discordant pairs clustering, map with mrsFast and hamming distance, call anomalous mappings, split unmapped reads, search.
compares read depth between parents and child to identify de novo mutations
searches for clusters of mate pairs where one end maps to the reference genome but the other end does not because it traverses a breakpoint creating a mapping inconsistency with respect to the reference sequence
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269549/
http://splitread.sourceforge.net/
ClipCrop
2011
SV
U Tokyo
breakdancer, cnvnator, pindel
29
not for somatic. doesn't recognize useful mutation types according to socrates authors
A soft-clipped sequence is an unmatched fragment in a partially mapped read
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-S14-S7
https://github.com/shinout/clipcrop
CREST
2011
SV
somatic
St Jude, Zhang
260
single read soft clipping, classification
made for somatic comparisons, lower performance for small deletions
uses the soft-clipping reads to directly map the breakpoints of structural variations
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3527068/
http://www.stjuderesearch.org/site/lab/zhang
GenomeStrip 1 & 2
2011
SV/CNVs
bams
population
Broad, McCarroll
spanner, pindel, breakdancer, pemer, cnvnator
206
discordant read pair clustering, reassemble breakpoint-spanning reads by allele, read-depth for copy number estimate, align unmapped reads to breakpoint database
validated by 1kGP, "most sensitive and accurate". less sensitive to small SVs. 2.0 adds CNV detection
designed to find shared variation using data from multiple individuals. Genome STRiP looks both across and within a set of sequenced genomes to detect variation.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5094049/
http://software.broadinstitute.org/software/genomestrip/
inGap-SV
2011
SV
Chinese Academy of Sci, Zhao
Breakdancer, variationhunter, spanner, PEMer, cortex, pindel
57
paired-end mapping & depth of coverage
good validation against 12878, differentiates homo- and hetero-zygous variants
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3125812/
hydra
2010
SV
sanger split-read & illumina paired-end
U Va
none
195
split-read + paired end
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2860164/
https://code.google.com/archive/p/hydra-sv/
AGE
2010
SV
Yale, Gerstein
65
bioconda
read-depth
improved alignment algorithm, but does not call variants. narrow scope validation, proof of theory. has been modified for metasv inclusion
AGE for Alignment with Gap Excision, finds the optimal solution by simultaneously aligning the 5′ and 3′ ends of two given sequences and introducing a ‘large-gap jump’ between the local end alignments to maximize the total alignment score. We also describe extensions allowing the application of AGE to tandem duplications, inversions and complex events involving two large gaps.
http://bioinformatics.oxfordjournals.org/content/27/5/595.abstract
http://sv.gersteinlab.org/age/
slope
2010
SV
WashU St Louis, Pfeifer
pindel, breakdancer
41
basic simulated data
detect sequence breakpoints from only one side of a split read, and therefore does not rely on the insert size for detection.
http://www-genepi.med.utah.edu/suppl/SLOPE/slope_guide.txt
SVDetect
2010
SV
paired-end/mate pairs
Curie Institute
GasV
131
discordant read pairs, adds mate-pairs, sliding window for clustering
anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions–deletions, inversions, duplications and balanced or unbalanced interchromosomal translocations.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2905550/
http://svdetect.sourceforge.net/Site/Manual.html
SVMerge
2010
SV
Sanger Institute
80
meta - BDMax, Pindel, SECluster, RetroSeq, RDXplorer
http://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-12-r128
http://svmerge.sourceforge.net//
Pindel
2009
SV
EMBL, Ning
768
metasv, bioconda
split-read clustering, pattern growth algorithm to search local space for unmapped (split) read
slow, high FP rate
detect breakpoints of large deletions (1bp-10kbp) and medium sized insertions (1-20bp) from paired-end short reads
https://www.ncbi.nlm.nih.gov/pubmed/19561018
http://gmt.genome.wustl.edu/packages/pindel/index.html
BreakDancer
2009
SV
somatic
WashU St Louis, Mardis
MoDIL, VariationHunter
719
metasv
paired-end MAQ calls; classify, cluster, multi-nomial Poisson-based confidence score
confidence scores, use Q>80
predicts large and small 10-100bp indels, inversions and translocations
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3661775/
http://gmt.genome.wustl.edu/packages/breakdancer/
BreakSeq 1 & 2 (2015)
2009
SV
Yale, Gerstein
115
metasv
map reads to known breakpoints from a database
breakseq2 in notes
scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951730/
http://sv.gersteinlab.org/breakseq/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4451611/
PEMer
2009
SV
Yale, Gerstein
PEM
202
split-read clustering, merges clusters