Currently ~48 hours running "training" of JointSNVMix against IS3 (diploid)
Started downloading AML data from SRA yesterday
training somaticseq
emailed dbsnp
MB data from ICGC available with CLL?
COLO829 bams available in the new dataset? includes AML?
tictac repo http://128.231.12.18/deaconjs/tictac
Create GDC & SomaticSeq branches?
parallel somaticseq call on interval 102
. /etc/profile.d/modules.sh; module load python3/3.5.1; pip install --user regex numpy scipy pysam
module load bedtools python3/3.5.1; module load jdk/1.8.0_111; module load R/3.3.0;
/DCEG/Scimentis/tools2/somaticseq/SomaticSeq.Wrapper.sh \
--mutect callers/mutect/is4/results/combined.mutect.is4.102.vcf \
--varscan-snv callers/varscan2/is4/results/combined.varscan2.is4.102.vcf.snp.vcf \
--varscan-indel callers/varscan2/is4/results/combined.varscan2.is4.102.vcf.indel.vcf \
--sniper callers/sniper/is4/results/combined.sniper.is4.102.vcf \
--muse callers/muse/is4/results/combined.muse.is4.102.vcf \
--normal-bam synthetic.challenge.set4.normal.bam \
--tumor-bam synthetic.challenge.set4.tumour.bam \
--genome-reference human_g1k_v37_decoy.fasta \
--cosmic reference/dbs/b37_cosmic_v54_120711.vcf \
--dbsnp dbs/dbsnp_132_b37.leftAligned.vcf.gz \
--inclusion-region reference/bedfiles/102.bed \
--gatk /DCEG/Scimentis/tools2/gatk/GenomeAnalysisTK.jar \
--output-dir callers/somaticseq/is4/results/102.vcf \
--ada-r-script somaticseq/r_scripts/ada_model_predictor.R \
--classifier-snv reference/somaticseq/training/is3/Ensemble.sSNV.tsv.ntChange.Classifier.RData \
--pass-threshold 0.5 \
--lowqual-threshold 0.1
wg call
The sorted.chr files are sorted then filtered all reads except labels 1-22
. /etc/profile.d/modules.sh; module load python3/3.5.1; \
pip install --user regex numpy scipy pysam; module load bedtools python3/3.5.1 jdk/1.8.0_111 R/3.3.0; \
SomaticSeq.Wrapper.sh \
--mutect callers/mutect/is4/final/merged.mutect.is4.sorted.chr.vcf.recode.vcf \
--varscan-snv callers/varscan2/is4/final/merged.varscan2.is4.snp.sorted.chr.vcf.recode.vcf \
--varscan-indel callers/varscan2/is4/final/merged.varscan2.is4.indel.sorted.chr.vcf.recode.vcf \
--sniper callers/sniper/is4/final/merged.sniper.is4.sorted.chr.vcf.recode.vcf \
--muse callers/muse/is4/final/merged.muse.is4.sorted.chr.vcf \
--normal-bam synthetic.challenge.set4.normal.bam \
--tumor-bam synthetic.challenge.set4.tumour.bam \
--genome-reference human_g1k_v37_decoy.fasta \
--cosmic b37_cosmic_v54_120711.vcf \
--dbsnp dbsnp_132_b37.leftAligned.vcf.gz \
--gatk GenomeAnalysisTK.jar \
--output-dir wg.somaticseq.is4.ignore.vcf \
--ada-r-script somaticseq/r_scripts/ada_model_predictor.R \
--classifier-snv somaticseq/training/is3/Ensemble.sSNV.tsv.ntChange.Classifier.RData \
--pass-threshold 0.5 \
--lowqual-threshold 0.1 \
--exclusion-region wg.somaticseq.nox.ignore.bed
3 May 2017
25 April 2017
18 April 2017
13 April 2017
sh |bam vcf mpil merged |r qw |pipeline
------------------------------------------------------------------
666 |666 0 0 0 |0 0 |benchmarks/is1/intervals
333 |0 0 333 0 |0 0 |benchmarks/is1/mpileups
666 |666 0 0 0 |0 0 |benchmarks/is2/intervals
333 |0 0 106 0 |106 0 |benchmarks/is2/mpileups
666 |666 0 0 0 |0 0 |benchmarks/is4/intervals
333 |0 0 333 0 |0 0 |benchmarks/is4/mpileups
666 |666 0 0 0 |0 0 |benchmarks/is3/intervals
333 |0 0 333 0 |0 0 |benchmarks/is3/mpileups
333 |0 426 0 1 |0 0 |callers/muse/is1
333 |0 290 0 0 |0 0 |callers/muse/is2
333 |0 876 0 1 |0 0 |callers/muse/is3
333 |0 864 0 1 |0 0 |callers/muse/is4
333 |0 152 0 0 |40 0 |callers/sniper/is1
333 |0 59 0 0 |59 0 |callers/sniper/is2
333 |0 999 0 1 |0 0 |callers/sniper/is3
333 |0 999 0 1 |0 0 |callers/sniper/is4
333 |0 999 0 1 |0 0 |callers/mutect/is1
333 |0 661 0 0 |5 0 |callers/mutect/is2
333 |0 1332 0 1 |0 0 |callers/mutect/is3
333 |0 1332 0 1 |0 0 |callers/mutect/is4
333 |0 2664 0 1 |0 0 |callers/varscan2/is3
333 |0 650 0 0 |0 0 |callers/varscan2/is4
30 March 2017
cmdstr += "$BINDIR/MuSEv1.0rc_submission_b391201 "
cmdstr += "call "
cmdstr += "-l %s/variant_calling_bed.%s.bed "%(bedsloc, region+1)
cmdstr += "-O $OUTDIR/muse.is3.%s "%(region+1)
cmdstr += "-f $REFDIR/human_g1k_v37_decoy.fasta "
cmdstr += "$DATADIR/synthetic.challenge.set3.tumor.bam "
cmdstr += "$DATADIR/synthetic.challenge.set3.normal.bam "
cmdstr += "\n"
outlines.append(cmdstr)
cmdstr = ""
cmdstr += "$BINDIR/MuSEv1.0rc_submission_b391201 "
cmdstr += "sump "
cmdstr += "-I $OUTDIR/muse.is3.%s.MuSE.txt "%(region+1)
cmdstr += "-G "
cmdstr += "-O $OUTDIR/muse.is3.%s.vcf "%(region+1)
cmdstr += "-D $DBDIR/dbsnp_db/$DBSNP "
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
840 4 844 0 861 0
sensitivity, specificity, balanced accuracy: 0.975,0.995,0.985
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
5615 198 5813 2869 6391 1512
sensitivity, specificity, balanced accuracy: 0.878,0.965,0.922
last meeting
cmdstr += "$JAVADIR/java -Xmx4g -jar $BINDIR/mutect-1.1.7.jar "
cmdstr += "--analysis_type MuTect "
cmdstr += "--reference_sequence $REFDIR/human_g1k_v37_decoy.fasta "
cmdstr += "--cosmic $DBDIR/cosmic_db/b37_cosmic_v54_120711.vcf "
cmdstr += "--dbsnp $DBDIR/dbsnp_db/dbsnp_132_b37.leftAligned.vcf.gz "
cmdstr += "--input_file:normal $DATADIR/synthetic.challenge.set3.normal.bam "
cmdstr += "--input_file:tumor $DATADIR/synthetic.challenge.set3.tumor.bam "
cmdstr += "--out $OUTDIR/mutect.is3.%s.out "%(region+1)
cmdstr += "--vcf $OUTDIR/mutect.is3.%s.vcf "%(region+1)
cmdstr += "--enable_qscore_output "
cmdstr += "--intervals %s/variant_calling_bed.%s.bed "%(bedsloc, region+1)
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
841 50 891 0 861 0
sensitivity, specificity, balanced accuracy: 0.976,0.943,0.960
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
5952 2135 8087 924607 6391 1512
sensitivity, specificity, balanced accuracy: 0.931,0.735,0.833
cmdstr += "$BINDIR/somatic-sniper-1.0.4/build/bin/bam-somaticsniper "
cmdstr += "-f $REFDIR/human_g1k_v37_decoy.fasta "
cmdstr += "$DATADIR/tumor.is3.%s.bam.gz "%(region+1)
cmdstr += "$DATADIR/normal.is3.%s.bam.gz "%(region+1)
cmdstr += "$OUTDIR/%s.is3.%s.vcf "%(caller, region+1)
cmdstr += "-J "
cmdstr += "-F vcf "
# recommended by authors
cmdstr += "-G -L "
cmdstr += "-Q 40 -q 1 "
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
770 112 882 0 861 0
sensitivity, specificity, balanced accuracy: 0.894,0.873,0.883
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
5379 4736 10115 2590 6391 1512
sensitivity, specificity, balanced accuracy: 0.841,0.531,0.686
java -jar $BINDIR/varscan/VarScan.v2.3.6.jar
somatic
$DATADIR/chr19.normal-tumor.mpileup
$OUTDIR/varscan2.chr19.mvf02
--output-vcf
--mpileup 1
--min-var-freq 0.02
--strand-filter 1
--minimum-coverage 6
tpcount, fpcount, subrecs, submasked, trurecs, trumasked:
835 75329 76164 0 861 0
sensitivity, specificity, balanced accuracy: 0.969,0.010,0.490
not complete (pileups running, may have to mpileup regions)
cmd = "bedtools intersect -sorted -g %s -b %s -abam %s > %s\n"%(genomefile, bedname, bamstosplit[bamindex], interval_bamname)
cmd = "samtools mpileup -f $REFDIR/human_g1k_v37_decoy.fasta "
cmd += "-q 1 -A -B -Q 13 "
cmd += "-l %s/variant_calling_bed.%s.bed "%(bedbase, x+1)
cmd += "%s/synthetic.challenge.set3.normal.bam "%(bambase)
cmd += "%s/synthetic.challenge.set3.tumor.bam "%(bambase)
cmd += " > %s/mpileups/results/%s.is3.%s.mpileup "%(benchbase, bamindex, x+1)