Data sets for benchmarking variant callers
svclassify: builds a set of validation calls, using a pedigree to enforce Mendelian inheritance
gindel, speedseq, and manta all use some form of Mendelian enforcement to select validation variants
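To show what "Mendelian enforcement" means for a trio, here is a minimal Python sketch. It is not the method used by svclassify, gindel, speedseq, or manta. It assumes biallelic or multiallelic diploid genotypes already parsed from a VCF GT field into pairs of allele indices (0 = REF), ignores phasing, and keeps a variant only if the child carries an ALT allele and can have inherited one allele from each parent. The record layout and the function names `is_mendelian_consistent` and `select_validation_calls` are hypothetical.

```python
from typing import Dict, List, Tuple

# A diploid genotype as a pair of allele indices (0 = REF, 1 = first ALT, ...),
# e.g. a VCF GT of "0/1" becomes (0, 1). Phasing is ignored.
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def select_validation_calls(trio_calls: List[Dict]) -> List[str]:
    """Keep IDs of variants the child carries that also obey Mendelian inheritance.

    Each record is assumed (hypothetical layout) to look like
    {"id": str, "child": Genotype, "mother": Genotype, "father": Genotype}.
    """
    selected = []
    for call in trio_calls:
        child = call["child"]
        # Skip sites where the child is homozygous reference.
        carries_alt = any(allele > 0 for allele in child)
        if carries_alt and is_mendelian_consistent(child, call["mother"], call["father"]):
            selected.append(call["id"])
    return selected


calls = [
    {"id": "del_1", "child": (0, 1), "mother": (0, 0), "father": (1, 1)},  # consistent
    {"id": "del_2", "child": (1, 1), "mother": (0, 0), "father": (0, 1)},  # mother has no ALT
    {"id": "del_3", "child": (0, 0), "mother": (0, 1), "father": (0, 1)},  # child is hom-ref
]
print(select_validation_calls(calls))  # ['del_1']
```

Note that this filter discards true de novo variants, and a systematic false positive called identically in all three samples still passes, so Mendelian consistency is a necessary rather than sufficient condition for a reliable validation call.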
Tumor/Normal Pairs
Barcelona/ICGC real CLL/ML tumor/normal (gold set vcfs available, fastq mostly downloaded)
Craig/TGRI COLO829 tumor/normal cell line (all data waiting on agreement)
Horizon/Acrometrix/SeraCare engineered tumor/normal
WashU AML real tumor/normal (all data waiting on agreement)
DREAM synthetic tumor/normal (IS#1-4 fastq, 10xnormal/tumor fastq, IS#1-5 vcfs currently available)
NIST GiaB trios precisionFDA results (bams, consensus vcfs, fastq available)
Sanger Institute CEPH trio (bams, vcfs available)
Illumina Platinum Genomes (bams, fastqs, and vcfs available)
2017 pacbio/bionano alternate NA12878 assembly
2016 10XG/bionano/illumina hybrid NA12878 assembly
2015 Manta HCC1954 breast cancer cell line (original pub)
HX1 Chinese genome
2014 CHM1 haploid human genome
deCODE Icelandic
WashU AML (see Tumor/Normal Pairs section)
NZYGMN (HX1?)
svclassify
DREAM synthetic mutation calling challenges 3 & 4
2014 TCGA lung carcinoma
2012 ICGC pediatric brain tumor
2011 1kGP
2010 NA12878
2010 Kidd deletions
2010 Sanger CNV breakpoints
2009 LifeTech
2008 NA18507
2007 HuRef (Venter)
2004 DbGV