GDC parameters
MuTect2 - GATK nightly-2016-02-25-gf39d340
The MuTect2 pipeline employs a "Panel of Normals" to identify additional germline mutations. This panel is generated using genomes from TCGA blood normal samples from thousands of individuals that were curated and confidently assessed to be cancer-free.
java -jar GenomeAnalysisTK.jar \
    -T MuTect2 \
    -R <reference> \
    -L <region> \
    -I:tumor <tumor.bam> \
    -I:normal <normal.bam> \
    --normal_panel <pon.vcf> \
    --cosmic <cosmic.vcf> \
    --dbsnp <dbsnp.vcf> \
    --contamination_fraction_to_filter 0.02 \
    -o <mutect_variants.vcf> \
    --output_mode EMIT_VARIANTS_ONLY \
    --disable_auto_index_creation_and_locking_when_reading_rods
MuSE - MuSEv1.0rc_submission_c039ffa
MuSE call \
    -f <reference> \
    -r <region> \
    <tumor.bam> \
    <normal.bam> \
    -O <intermediate_muse_call.txt>
MuSE sump \
    -I <intermediate_muse_call.txt> \
    -E \
    -D <dbsnp_known_snp_sites.vcf> \
    -O <muse_variants.vcf>
SomaticSniper v1.0.5.0
bam-somaticsniper \
    -q 0 \
    -Q 15 \
    -s 0.01 \
    -T 0.85 \
    -N 2 \
    -r 0.001 \
    -n NORMAL \
    -t TUMOR \
    -F vcf \
    -f ref.fa \
    <tumor.bam> \
    <normal.bam> \
    <somaticsniper_variants.vcf>
VarScan2
Mpileup; Samtools 1.1
samtools mpileup \
    -f <reference> \
    -q 1 \
    -B \
    <normal.bam> \
    <tumor.bam> > <intermediate_mpileup.pileup>
VarScan Somatic; Varscan.v2.3.9
java -jar VarScan.jar somatic \
    <intermediate_mpileup.pileup> \
    <output_path> \
    --mpileup 1 \
    --min-coverage 8 \
    --min-coverage-normal 8 \
    --min-coverage-tumor 6 \
    --min-var-freq 0.10 \
    --min-freq-for-hom 0.75 \
    --normal-purity 1.0 \
    --tumor-purity 1.00 \
    --p-value 0.99 \
    --somatic-p-value 0.05 \
    --strand-filter 0 \
    --output-vcf
VarScan ProcessSomatic; Varscan.v2.3.9
java -jar VarScan.jar processSomatic \
    <intermediate_varscan_somatic.vcf> \
    --min-tumor-freq 0.10 \
    --max-normal-freq 0.05 \
    --p-value 0.07
We use SAMtools [30] and GATK HaplotypeCaller on the tumor and normal BAM files to obtain a number of independent sequencing features that have predictive values for their somatic mutation statuses, e.g., mapping quality, base call quality, strand bias, depth of coverage, tail distance bias, etc. Some caller features, e.g., somatic mutation scores based on its distinct statistics, are also included. For the DREAM Challenge and real data, we also consider whether the site is in dbSNP. Two of the most important features in the adaptively boosted classifiers include the root-mean-square mapping quality score and the number of read mismatches compared to the reference.
For the results described in this study, we have used P≥0.7 as the cut-off for our SomaticSeq results, i.e., a candidate site of P≥0.7 is considered a PASS call, whereas a candidate site of P<0.7 is considered LowQual.
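The cut-off rule above can be sketched as a small shell helper. This is only an illustration: `label_call` is a hypothetical name and not part of SomaticSeq, and it assumes each candidate site already has a classifier probability P between 0 and 1.

```shell
# Hypothetical helper (not part of SomaticSeq): label one candidate site
# from its classifier probability P, using the 0.7 cut-off described above.
label_call() {
    # $1 = probability P, $2 = optional cut-off (defaults to 0.7)
    awk -v p="$1" -v cutoff="${2:-0.7}" 'BEGIN {
        # "+ 0" forces a numeric comparison rather than a string comparison
        label = (p + 0 >= cutoff + 0) ? "PASS" : "LowQual"
        print label
    }'
}

label_call 0.92   # PASS
label_call 0.35   # LowQual
```

Passing a second argument changes the cut-off, which may be useful when trading sensitivity against precision on a different data set.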
Since eight of the top 18 features related directly to sequencing depth, it is important for the trained model to have a sequencing depth comparable to that of the target set. Thus, it would not be appropriate to use a model trained on 30× whole-genome sequencing to predict somatic mutations in a 500× targeted sequencing data set.
From http://bioinform.github.io/somaticseq/data.html
Where ‘here’ is a dead link and ‘this’ refers to the dead link here: https://drive.google.com/drive/folders/0B9pfRlnkG-Z7STNNczk4ak5xSmM
1) Train on Stage 2, test on straight Stage 3
Stage 2 tumor and normal data were mixed at a 70:30 ratio for training; the test data were Stage 3.
Results were averaged over ten rounds of twofold cross-validation; in each round the training set was a randomly chosen half of the entire data set.
2) Trained on Stage 2, testing on variants of Stage 3 data
3) In Silico Titration
4) SomaticSpike
5) COLO-829, CLL1 trained on Stage 3 data
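The repeated twofold cross-validation noted under experiment 1 can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' procedure: `split_twofold` and `mean` are hypothetical helpers, the input is assumed to be a text file with one labelled candidate site per line, and GNU coreutils `shuf` is assumed to be installed.

```shell
# Hypothetical sketch: split a file of labelled candidate sites (one per
# line) into two random halves. One half trains and the other tests, then
# the roles are swapped. Assumes GNU coreutils `shuf` is available.
split_twofold() {
    # $1 = input file, $2 = output prefix; writes "$2.a" and "$2.b"
    total=$(wc -l < "$1")
    half=$(( (total + 1) / 2 ))
    shuf "$1" > "$2.shuffled"
    head -n "$half" "$2.shuffled" > "$2.a"
    tail -n +"$(( half + 1 ))" "$2.shuffled" > "$2.b"
    rm -f "$2.shuffled"
}

# Average a column of numbers (one per line) read from standard input,
# e.g. the accuracy scores collected from the ten rounds.
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.4f\n", sum / n }'
}
```

Calling `split_twofold` ten times with a different prefix each time gives ten independent random splits; the per-round scores can then be piped into `mean`.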
MuTect) dbSNP v.138, COSMIC v.69, and a Panel of Normals based on Phase 1 of the 1kGP (1000 Genomes Project) were used as resource files for the real sequencing data. COSMIC was not supplied for the DREAM Challenge data, because the synthetic mutations were randomly chosen and not enriched in COSMIC sites. In our in silico titration and SomaticSpike experiments, none of these databases was used.
SomaticSniper) mapping quality cut-off 25, base quality cut-off 15, prior somatic mutation probability 10^-4
VarScan2) mapping quality cut-off 25, base quality cut-off of 20.
JointSNVMix2) convergence threshold of 0.01 in training, somatic probability ≥0.95
VarDict) relaxed the variant depth filter from 4 to 2 and the FET (Fisher's exact test) p-value cut-off from 0.05 to 0.15; allowed each call to fail up to two of the 20 VarDict filters.
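One way to tolerate a small number of failed filters is to count the entries in the VCF FILTER column. The sketch below is a guess at how such a rule could be applied after the fact; the source does not say how it was implemented. `keep_relaxed` is a hypothetical helper, and treating a FILTER value of `.` as zero failures is an assumption.

```shell
# Hypothetical post-filter (not VarDict's own code): keep VCF records whose
# FILTER column (column 7) lists at most `max_fail` failed filters.
# In the VCF format, FILTER is "PASS", "." or a semicolon-separated list
# of the filters a record failed.
keep_relaxed() {
    # $1 = VCF file, $2 = maximum number of failed filters (defaults to 2)
    awk -F '\t' -v max_fail="${2:-2}" '
        /^#/ { print; next }                      # pass header lines through
        {
            if ($7 == "PASS" || $7 == ".") failed = 0
            else failed = split($7, names, ";")   # count listed filter names
            if (failed <= max_fail + 0) print
        }
    ' "$1"
}
```

For example, `keep_relaxed calls.vcf 2 > relaxed.vcf` would keep records that passed outright or failed one or two filters (`calls.vcf` and `relaxed.vcf` are placeholder file names).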