Currently ~48 hours running "training" of JointSNVMix against IS3 (diploid)
Started downloading AML data from SRA yesterday
training somaticseq
emailed dbsnp
MB data from ICGC available with CLL?
COLO829 bams available in the new dataset? includes AML?
tictac repo http://128.231.12.18/deaconjs/tictac
Create GDC & SomaticSeq branches?
parallel somaticseq call on interval 102
. /etc/profile.d/modules.sh; module load python3/3.5.1; pip install --user regex numpy scipy pysam
module load bedtools python3/3.5.1; module load jdk/1.8.0_111; module load R/3.3.0;
/DCEG/Scimentis/tools2/somaticseq/SomaticSeq.Wrapper.sh \--mutect callers/mutect/is4/results/combined.mutect.is4.102.vcf \--varscan-snv callers/varscan2/is4/results/combined.varscan2.is4.102.vcf.snp.vcf \--varscan-indel callers/varscan2/is4/results/combined.varscan2.is4.102.vcf.indel.vcf \--sniper callers/sniper/is4/results/combined.sniper.is4.102.vcf \--muse callers/muse/is4/results/combined.muse.is4.102.vcf \--normal-bam synthetic.challenge.set4.normal.bam \--tumor-bam synthetic.challenge.set4.tumour.bam \--genome-reference human_g1k_v37_decoy.fasta \--cosmic reference/dbs/b37_cosmic_v54_120711.vcf \--dbsnp dbs/dbsnp_132_b37.leftAligned.vcf.gz \--inclusion-region reference/bedfiles/102.bed \--gatk /DCEG/Scimentis/tools2/gatk/GenomeAnalysisTK.jar \--output-dir callers/somaticseq/is4/results/102.vcf \--ada-r-script somaticseq/r_scripts/ada_model_predictor.R \--classifier-snv reference/somaticseq/training/is3/Ensemble.sSNV.tsv.ntChange.Classifier.RData \--pass-threshold 0.5 \--lowqual-threshold 0.1wg call
The sorted.chr files are sorted then filtered all reads except labels 1-22
. /etc/profile.d/modules.sh; module load python3/3.5.1; \pip install --user regex numpy scipy pysam; module load bedtools python3/3.5.1 jdk/1.8.0_111 R/3.3.0; \SomaticSeq.Wrapper.sh \--mutect callers/mutect/is4/final/merged.mutect.is4.sorted.chr.vcf.recode.vcf \--varscan-snv callers/varscan2/is4/final/merged.varscan2.is4.snp.sorted.chr.vcf.recode.vcf \--varscan-indel callers/varscan2/is4/final/merged.varscan2.is4.indel.sorted.chr.vcf.recode.vcf \--sniper callers/sniper/is4/final/merged.sniper.is4.sorted.chr.vcf.recode.vcf \--muse callers/muse/is4/final/merged.muse.is4.sorted.chr.vcf \--normal-bam synthetic.challenge.set4.normal.bam \--tumor-bam synthetic.challenge.set4.tumour.bam \--genome-reference human_g1k_v37_decoy.fasta \--cosmic b37_cosmic_v54_120711.vcf \--dbsnp dbsnp_132_b37.leftAligned.vcf.gz \--gatk GenomeAnalysisTK.jar \--output-dir wg.somaticseq.is4.ignore.vcf \--ada-r-script somaticseq/r_scripts/ada_model_predictor.R \--classifier-snv somaticseq/training/is3/Ensemble.sSNV.tsv.ntChange.Classifier.RData \--pass-threshold 0.5 \--lowqual-threshold 0.1 \--exclusion-region wg.somaticseq.nox.ignore.bed3 May 2017
25 April 2017
18 April 2017
13 April 2017
sh |bam vcf mpil merged |r qw |pipeline------------------------------------------------------------------666 |666 0 0 0 |0 0 |benchmarks/is1/intervals333 |0 0 333 0 |0 0 |benchmarks/is1/mpileups666 |666 0 0 0 |0 0 |benchmarks/is2/intervals333 |0 0 106 0 |106 0 |benchmarks/is2/mpileups666 |666 0 0 0 |0 0 |benchmarks/is4/intervals333 |0 0 333 0 |0 0 |benchmarks/is4/mpileups666 |666 0 0 0 |0 0 |benchmarks/is3/intervals333 |0 0 333 0 |0 0 |benchmarks/is3/mpileups333 |0 426 0 1 |0 0 |callers/muse/is1333 |0 290 0 0 |0 0 |callers/muse/is2333 |0 876 0 1 |0 0 |callers/muse/is3333 |0 864 0 1 |0 0 |callers/muse/is4333 |0 152 0 0 |40 0 |callers/sniper/is1333 |0 59 0 0 |59 0 |callers/sniper/is2333 |0 999 0 1 |0 0 |callers/sniper/is3333 |0 999 0 1 |0 0 |callers/sniper/is4333 |0 999 0 1 |0 0 |callers/mutect/is1333 |0 661 0 0 |5 0 |callers/mutect/is2333 |0 1332 0 1 |0 0 |callers/mutect/is3333 |0 1332 0 1 |0 0 |callers/mutect/is4333 |0 2664 0 1 |0 0 |callers/varscan2/is3333 |0 650 0 0 |0 0 |callers/varscan2/is430 March 2017
cmdstr += "$BINDIR/MuSEv1.0rc_submission_b391201 " cmdstr += "call " cmdstr += "-l %s/variant_calling_bed.%s.bed "%(bedsloc, region+1) cmdstr += "-O $OUTDIR/muse.is3.%s "%(region+1) cmdstr += "-f $REFDIR/human_g1k_v37_decoy.fasta " cmdstr += "$DATADIR/synthetic.challenge.set3.tumor.bam " cmdstr += "$DATADIR/synthetic.challenge.set3.normal.bam " cmdstr += "\n" outlines.append(cmdstr) cmdstr = "" cmdstr += "$BINDIR/MuSEv1.0rc_submission_b391201 " cmdstr += "sump " cmdstr += "-I $OUTDIR/muse.is3.%s.MuSE.txt "%(region+1) cmdstr += "-G " cmdstr += "-O $OUTDIR/muse.is3.%s.vcf "%(region+1) cmdstr += "-D $DBDIR/dbsnp_db/$DBSNP "tpcount, fpcount, subrecs, submasked, trurecs, trumasked:840 4 844 0 861 0sensitivity, specificity, balanced accuracy: 0.975,0.995,0.985tpcount, fpcount, subrecs, submasked, trurecs, trumasked:5615 198 5813 2869 6391 1512sensitivity, specificity, balanced accuracy: 0.878,0.965,0.922last meeting
cmdstr += "$JAVADIR/java -Xmx4g -jar $BINDIR/mutect-1.1.7.jar " cmdstr += "--analysis_type MuTect " cmdstr += "--reference_sequence $REFDIR/human_g1k_v37_decoy.fasta " cmdstr += "--cosmic $DBDIR/cosmic_db/b37_cosmic_v54_120711.vcf " cmdstr += "--dbsnp $DBDIR/dbsnp_db/dbsnp_132_b37.leftAligned.vcf.gz " cmdstr += "--input_file:normal $DATADIR/synthetic.challenge.set3.normal.bam " cmdstr += "--input_file:tumor $DATADIR/synthetic.challenge.set3.tumor.bam " cmdstr += "--out $OUTDIR/mutect.is3.%s.out "%(region+1) cmdstr += "--vcf $OUTDIR/mutect.is3.%s.vcf "%(region+1) cmdstr += "--enable_qscore_output " cmdstr += "--intervals %s/variant_calling_bed.%s.bed "%(bedsloc, region+1)tpcount, fpcount, subrecs, submasked, trurecs, trumasked:841 50 891 0 861 0sensitivity, specificity, balanced accuracy: 0.976,0.943,0.960tpcount, fpcount, subrecs, submasked, trurecs, trumasked:5952 2135 8087 924607 6391 1512sensitivity, specificity, balanced accuracy: 0.931,0.735,0.833 cmdstr += "$BINDIR/somatic-sniper-1.0.4/build/bin/bam-somaticsniper " cmdstr += "-f $REFDIR/human_g1k_v37_decoy.fasta " cmdstr += "$DATADIR/tumor.is3.%s.bam.gz "%(region+1) cmdstr += "$DATADIR/normal.is3.%s.bam.gz "%(region+1) cmdstr += "$OUTDIR/%s.is3.%s.vcf "%(caller, region+1) cmdstr += "-J " cmdstr += "-F vcf " # recommended by authors cmdstr += "-G -L " cmdstr += "-Q 40 -q 1 "tpcount, fpcount, subrecs, submasked, trurecs, trumasked:770 112 882 0 861 0sensitivity, specificity, balanced accuracy: 0.894,0.873,0.883tpcount, fpcount, subrecs, submasked, trurecs, trumasked:5379 4736 10115 2590 6391 1512sensitivity, specificity, balanced accuracy: 0.841,0.531,0.686java -jar $BINDIR/varscan/VarScan.v2.3.6.jar somatic $DATADIR/chr19.normal-tumor.mpileup $OUTDIR/varscan2.chr19.mvf02 --output-vcf --mpileup 1 --min-var-freq 0.02 --strand-filter 1--minimum-coverage 6tpcount, fpcount, subrecs, submasked, trurecs, trumasked:835 75329 76164 0 861 0sensitivity, specificity, balanced accuracy: 0.969,0.010,0.490not complete (pileups running, may have to mpileup regions)
cmd = "bedtools intersect -sorted -g %s -b %s -abam %s > %s\n"%(genomefile, bedname, bamstosplit[bamindex], interval_bamname) cmd = "samtools mpileup -f $REFDIR/human_g1k_v37_decoy.fasta " cmd += "-q 1 -A -B -Q 13 " cmd += "-l %s/variant_calling_bed.%s.bed "%(bedbase, x+1) cmd += "%s/synthetic.challenge.set3.normal.bam "%(bambase) cmd += "%s/synthetic.challenge.set3.tumor.bam "%(bambase) cmd += " > %s/mpileups/results/%s.is3.%s.mpileup "%(benchbase, bamindex, x+1)