Variants in the truth set are labeled PASS in the FILTER column.
Note that the PASS label is different from the AllPASS label. The AllPASS label denotes variants that SomaticSeq classified as PASS in every one of the 63 tumor-normal BAM file pairs. Variants inside the three arm loss regions may be labeled AllPASS but not PASS, and are therefore not part of the somatic mutation truth set.
All the somatic mutation call sets, SomaticSeq outputs, and SomaticSeq classifiers used to build the "truth set" are released on the NCBI ftp site.
This step was done locally at Roche Sequencing, but the results were uploaded to CGC.
Twelve sSNV SomaticSeq classifiers and twelve sINDEL SomaticSeq classifiers were created, one for every combination of sequencing center (IL/NS/FD/NV) and aligner (BWA/Bowtie2/NovoAlign).
For each classifier, the bamsurgeon workflow was used to spike ~100K synthetic SNVs and ~20K synthetic INDELs into replicate normal BAM files. As an example, the classifiers for the Fudan data aligned with BWA were created as follows:
Synthetic mutations were spiked into bwa.FD_N_2, and then a SomaticSeq tumor-normal analysis (six somatic mutation callers: MuTect2, SomaticSniper, VarDict, MuSE, Strelka, and TNscope) was run with bwa.FD_N_1 as the matched normal.
Synthetic mutations were spiked into bwa.FD_N_3, and then a SomaticSeq tumor-normal analysis was run with bwa.FD_N_2 as the matched normal.
The two analyses above were combined to train the SomaticSeq classifiers (i.e., by concatenating the Ensemble.sSNV.tsv files from the two analyses). The two classifiers trained on this data set (one for sSNV and one for sINDEL) were used to score/classify all three real BWA tumor-normal sample sets from Fudan.
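The concatenation step above can be sketched as follows. This is a minimal illustration, not the script that was actually used: it assumes each Ensemble TSV starts with one identical header line, keeps that header once, and refuses to mix tables whose headers differ. The function names and example paths are hypothetical.

```python
from typing import Iterable, List


def merge_tsv_tables(tables: Iterable[List[str]]) -> List[str]:
    """Concatenate TSV tables given as lists of lines, keeping only the first header.

    Every table must start with the same header line; a mismatch raises ValueError
    because rows from differently shaped tables should not be mixed for training.
    """
    merged: List[str] = []
    expected_header = None
    for number, lines in enumerate(tables, start=1):
        if not lines:
            continue  # skip empty tables
        header = lines[0].rstrip("\n")
        if expected_header is None:
            expected_header = header
            merged.append(header + "\n")
        elif header != expected_header:
            raise ValueError(f"table {number}: header differs from the first table")
        # Normalise line endings so rows from different files never run together.
        merged.extend(row.rstrip("\n") + "\n" for row in lines[1:])
    return merged


def concatenate_training_tsvs(tsv_paths: Iterable[str], out_path: str) -> None:
    """File-based wrapper: read each TSV, merge the tables, and write the result."""
    tables = []
    for path in tsv_paths:
        with open(path) as handle:
            tables.append(handle.readlines())
    with open(out_path, "w") as out:
        out.writelines(merge_tsv_tables(tables))
```

For example, `concatenate_training_tsvs(["FD_N_2_vs_FD_N_1/Ensemble.sSNV.tsv", "FD_N_3_vs_FD_N_2/Ensemble.sSNV.tsv"], "FD.bwa.twoSets.sSNV.tsv")`, where the directory and output names are placeholders for wherever the two analyses wrote their tables.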
All of these runs are documented in detail on the page BAM Simulation pipeline in SomaticSeq, which describes our commands and links to our archived run scripts.
Files related to this step are uploaded to the FDA_SEQC2_WG#1 projects on CGC with the tag "bamsurgeon".
Examples
bwa.tumorDesignate_FD_N_2.bam is the semi-synthetic tumor BAM file created by spiking synthetic mutations into WGS_FD_N_2.bwa.dedup.bam.
bwa.tumorDesignate_FD_N_2.synthetic_snvs.vcf contains the ground-truth sSNVs for the BAM file described above.
bwa.tumorDesignate_FD_N_2.synthetic_indels.leftAlign.vcf contains the ground-truth sINDELs for the same BAM file.
bwa.FD.twoSets.MSDUKT.sSNV.tsv.ntChange.Classifier.RData is the sSNV SomaticSeq classifier.
bwa.FD.twoSets.MDKT.sINDEL.tsv.ntChange.Classifier.RData is the sINDEL SomaticSeq classifier.
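The synthetic VCF files listed above serve as ground truth for the spiked-in mutations. As a rough illustration of how calls on a semi-synthetic BAM can be checked against them, here is a minimal exact-match comparison on (CHROM, POS, REF, ALT). It is a sketch, not SomaticSeq's own labeling code, and exact matching only works when both files represent INDELs the same way (e.g., both left-aligned, as the truth INDEL file name suggests).

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str, str]


def read_vcf_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, one key per ALT allele."""
    sites: Set[Site] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line, and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def compare_to_truth(call_lines: Iterable[str], truth_lines: Iterable[str]) -> Dict[str, int]:
    """Exact-match comparison of called sites against a ground-truth VCF."""
    calls = read_vcf_sites(call_lines)
    truth = read_vcf_sites(truth_lines)
    return {
        "true_positives": len(calls & truth),
        "false_positives": len(calls - truth),
        "false_negatives": len(truth - calls),
    }
```

Usage: open bwa.tumorDesignate_FD_N_2.synthetic_snvs.vcf and a call set (for instance a hypothetical `calls.vcf`) and pass both file handles to `compare_to_truth`.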
Example CGC workflow to run MuTect2, SomaticSniper, VarDict, MuSE, and Strelka on a tumor-normal pair:
Example CGC workflow to run TNscope 201711.02:
The somatic mutation workflows on CGC, and the locations of their output files, are described on this page: Somatic Mutation Results on Seven Bridges Genomics.
Each somatic mutation caller in the CGC workflows is described on this page: Somatic Mutation Tools on CGC.
This page records the "historical" progress made in testing each tool, app, and workflow on CGC: Building workflow on Seven Bridges Genomics.
This step was also done locally at Roche Sequencing. However, the results as well as working examples are on the CGC.
For each of the 54 data sets that came from a sequencing center with replicates (IL/NS/FD/NV, each aligned with BWA/Bowtie2/NovoAlign), the corresponding SomaticSeq classifier was used to score each variant.
An example on CGC: https://cgc.sbgenomics.com/u/xiaowen/fda-seqc2-wg-1/tasks/7137edf0-c49c-4339-b948-d5c93e9b7b33
For the 9 data sets that came from sequencing centers without replicates (LL/EA/NC), SomaticSeq fell back on majority-vote consensus to classify each variant.
An example on CGC: https://cgc.sbgenomics.com/u/xiaowen/fda-seqc2-wg-1/tasks/bcf15ef7-8214-4bf7-83c9-d41763b2c49a
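The bookkeeping behind "54 classifier-scored + 9 consensus = 63 data sets, using 12 classifiers per variant type" can be checked with a short script. The replicate counts are read off the -nprefix/-tprefix lists further down this page; the dictionary names and label strings are illustrative only.

```python
from itertools import product
from typing import Dict

ALIGNERS = ("bwa", "bowtie", "novo")
# Replicate counts per sequencing center, read off the -tprefix lists on this page.
REPLICATES = {"IL": 3, "NV": 3, "FD": 3, "NS": 9, "EA": 1, "NC": 1, "LL": 1}
# Centers with replicates, for which one classifier per aligner was trained.
CENTERS_WITH_CLASSIFIERS = {"IL", "NS", "FD", "NV"}


def classification_plan() -> Dict[str, str]:
    """Map each tumor data set label to the method used to classify its calls."""
    plan: Dict[str, str] = {}
    for center, n_reps in REPLICATES.items():
        for rep, aligner in product(range(1, n_reps + 1), ALIGNERS):
            dataset = f"{center}_T_{rep}.{aligner}"
            if center in CENTERS_WITH_CLASSIFIERS:
                plan[dataset] = f"classifier {center}.{aligner}"
            else:
                plan[dataset] = "majority-vote consensus"
    return plan


if __name__ == "__main__":
    plan = classification_plan()
    n_classified = sum(v.startswith("classifier") for v in plan.values())
    print(len(plan), n_classified, len(plan) - n_classified)  # 63 54 9
    print(len({v for v in plan.values() if v.startswith("classifier")}))  # 12
```

For example, `classification_plan()["FD_T_3.novo"]` is `"classifier FD.novo"`, while every LL, EA, and NC data set maps to majority-vote consensus.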
Documentation and files for archival purposes:
The commands/scripts to use SomaticSeq to combine the results of 6 callers for each of the 63 data sets: here.
The commands/scripts to invoke SomaticSeq classification as well as SEQC2 reformatting (described in step 4): here.
SomaticSeq's output VCF files are zipped and uploaded to CGC (63 SNV VCF files and 63 INDEL VCF files): SomaticSeq.v2.7.2_6Tools.zip
A script was created to move the information in the INFO field into the sample columns, so that it is preserved after the VCF files are combined with GATK3 CombineVariants. The script is in the seqc2 branch of SomaticSeq.
Command used for VCF files that were classified by a SomaticSeq classifier:
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/reformat_VCF2SEQC2.py \
-infile /mnt/$ABSOLUTE/PATH/WGS.bwa.dedup-FD_T_1_vs_FD_N_1-SomaticSeq.sSNV.vcf \
-outfile /mnt/$ABSOLUTE/PATH/reFormat.sSNV.predicted.by.bwa.IL1N_vs_IL2N.twoWay.vcf \
-callers MSDUKT \
-tumor FD_T_1.bwa \
-trained
Command used for VCF files that were classified by majority-vote consensus:
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/reformat_VCF2SEQC2.py \
-infile /mnt/$ABSOLUTE/PATH/WGS.bwa.dedup-LL_T_1_vs_LL_N_1-MSDUKT.Consensus.snv.vcf \
-outfile /mnt/$ABSOLUTE/PATH/reFormat.bwa.dedup-LL_T_1_vs_LL_N_1-MSDUKT.Consensus.snv.vcf \
-callers MSDUKT \
-tumor LL_T_1.bwa
-callers MSDUKT
In the original VCF file, the INFO field contains a string such as MSDUKT=1,0,1,1,0,1 (M=MuTect2, S=SomaticSniper, D=VarDict, U=MuSE, K=Strelka, T=TNscope) that records which callers called the variant a somatic mutation; in this example, MuTect2, VarDict, MuSE, and TNscope did. This field is moved to the sample column so that, after combining, the information is preserved for each data set. For INDELs, the string is MDKT, since SomaticSniper and MuSE do not call INDELs.
-tumor FD_T_1.bwa
Names the sample FD_T_1.bwa, so that sample names from the same sequencing replicate processed with different aligners do not clash when variants are combined.
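The effect of the -callers and -tumor options can be sketched on single VCF lines. This is a hypothetical illustration, not the logic of reformat_VCF2SEQC2.py: it assumes exactly one sample column, moves one INFO key into FORMAT and that sample, and renames the sample in the #CHROM header line. A complete reformatter would also add a matching ##FORMAT meta-information line, which is omitted here.

```python
def move_info_key_to_sample(line: str, key: str) -> str:
    """Move KEY=VALUE from INFO into FORMAT and the single sample column of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected exactly one sample column")
    kept, value = [], None
    for entry in fields[7].split(";"):
        name, _, val = entry.partition("=")
        if name == key:
            value = val
        else:
            kept.append(entry)
    if value is None:
        return "\t".join(fields)  # key absent: line unchanged
    fields[7] = ";".join(kept) if kept else "."  # VCF uses "." for an empty INFO
    fields[8] += ":" + key
    fields[9] += ":" + value
    return "\t".join(fields)


def rename_sample(header_line: str, new_name: str) -> str:
    """Rename the single sample column in the #CHROM header line, e.g. to FD_T_1.bwa."""
    fields = header_line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected exactly one sample column")
    fields[9] = new_name
    return "\t".join(fields)
```

Applied to a line whose INFO is `MSDUKT=1,0,1,1,0,1;DP=50` and whose FORMAT/sample are `GT` and `0/1`, the caller string ends up as `GT:MSDUKT` and `0/1:1,0,1,1,0,1`, so it survives CombineVariants alongside the per-data-set sample name.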
63 SNV VCF files and 63 INDEL VCF files were zipped and uploaded to CGC: SomaticSeq.Reformatted.v2.7.2.zip
The commands/scripts to invoke SomaticSeq classification (step 3) as well as SEQC2 reformatting: here.
SNV and INDEL files are combined separately, using the reformatted VCF files, which were created for this purpose.
Two commands were run, one for SNV and one for INDEL; the SNV command is:
java -jar GenomeAnalysisTK.jar -T CombineVariants -R GRCh38.d1.vd1.fa --setKey null --genotypemergeoption UNSORTED -V reFormat.WGS.bowtie.dedup-EA_T_1_vs_EA_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.bwa.dedup-EA_T_1_vs_EA_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.novo.dedup-EA_T_1_vs_EA_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V 
reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.WGS.bowtie.dedup-LL_T_1_vs_LL_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.bwa.dedup-LL_T_1_vs_LL_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.novo.dedup-LL_T_1_vs_LL_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.bowtie.dedup-NC_T_1_vs_NC_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.bwa.dedup-NC_T_1_vs_NC_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.WGS.novo.dedup-NC_T_1_vs_NC_N_1-Consensus.v2.7.2.sSNV.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -V reFormat.sSNV.predicted.by.v2.7.2.vcf -o sSNV.v2.7.2.combined.beta.4.vcf
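Typing 63 `-V` arguments by hand is error-prone. A small helper can assemble the same GATK3 CombineVariants invocation from a list of files, using only the options that appear in the command above. The glob pattern and output name in the usage part are placeholders, and the duplicate check is an added safeguard since each data set is expected to contribute its own file.

```python
import glob
import shlex
from typing import List, Sequence


def combine_variants_command(vcf_paths: Sequence[str], reference: str, output: str,
                             gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build the GATK3 CombineVariants argument list, one -V per input VCF."""
    if len(set(vcf_paths)) != len(vcf_paths):
        raise ValueError("the same VCF was listed more than once")
    cmd = ["java", "-jar", gatk_jar, "-T", "CombineVariants", "-R", reference,
           "--setKey", "null", "--genotypemergeoption", "UNSORTED"]
    for path in vcf_paths:
        cmd += ["-V", path]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file pattern; adjust it to the actual reformatted sSNV file names.
    inputs = sorted(glob.glob("reFormat.*sSNV*.vcf"))
    if inputs:
        print(shlex.join(combine_variants_command(inputs, "GRCh38.d1.vd1.fa", "sSNV.combined.vcf")))
```

The returned list can also be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting altogether.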
Files
PASS means a SomaticSeq score ≥ 0.7 (for the 54 of 63 data sets where SomaticSeq classifiers were built) or a consensus of ≥ 50% of callers (for the remaining 9 of 63 data sets).
The total number of variant positions was reduced by keeping only variant calls deemed PASS in at least one data set, e.g., discarding variants that made it this far only because one caller reported them as REJECT or germline in one sample. In other words, every variant call in the super set has at least some supporting evidence.
This step reduced the total numbers of somatic SNV and INDEL candidates from … and 869,777 down to 204,094 and 23,340, respectively.
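The PASS definition and the "PASS in at least one data set" filter described above translate directly into two small predicates. This is a sketch of the stated rules, not code from SomaticSeq; the function names are hypothetical, and the ≥ 50% test is written with integers to avoid floating-point division.

```python
from typing import Optional, Sequence


def is_pass(has_classifier: bool, score: Optional[float] = None,
            n_callers_voting: int = 0, n_callers_total: int = 6) -> bool:
    """PASS rule for one call in one data set."""
    if has_classifier:
        if score is None:
            raise ValueError("a SomaticSeq score is required for classifier-scored data sets")
        return score >= 0.7
    # Consensus of at least 50% of callers: voting / total >= 1/2.
    return 2 * n_callers_voting >= n_callers_total


def keep_in_super_set(pass_per_data_set: Sequence[bool]) -> bool:
    """A variant position is kept if it is PASS in at least one data set."""
    return any(pass_per_data_set)
```

For instance, a consensus-only SNV called by 3 of the 6 callers is PASS, and a variant that is PASS in just one of the 63 data sets is still kept in the super set.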
Initial tiers are assigned as described here.
Command:
# SNV
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/highConfidenceBuilder.py \
-infile /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.combined.draft.beta.3.vcf \
-outfile /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.vcf \
-ncallers 3 -all
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.vcf
# INDEL
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/highConfidenceBuilder.py \
-infile /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.combined.draft.beta.3.vcf \
-outfile /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.vcf \
-ncallers 2 -all
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.vcf
-ncallers argument
There are 6 somatic SNV callers, so 3 or more (i.e., at least half) constitutes "majority-vote consensus."
There are 4 somatic INDEL callers, so 2 or more constitutes "majority-vote consensus."
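To make the threshold concrete, here is a sketch that decodes an entry such as MSDUKT=1,0,1,1,0,1 (letter meanings as given earlier on this page) and tests it against an -ncallers value. It illustrates the counting rule only; it is not the implementation inside highConfidenceBuilder.py.

```python
from typing import List

CALLERS = {"M": "MuTect2", "S": "SomaticSniper", "D": "VarDict",
           "U": "MuSE", "K": "Strelka", "T": "TNscope"}


def callers_supporting(info_entry: str) -> List[str]:
    """Decode an entry such as 'MSDUKT=1,0,1,1,0,1' into the callers that called the variant."""
    letters, _, values = info_entry.partition("=")
    flags = [int(v) for v in values.split(",")]
    if len(flags) != len(letters):
        raise ValueError(f"{info_entry!r}: one flag per caller letter expected")
    return [CALLERS[letter] for letter, flag in zip(letters, flags) if flag == 1]


def meets_ncallers(info_entry: str, ncallers: int) -> bool:
    """True when at least `ncallers` callers called the variant."""
    return len(callers_supporting(info_entry)) >= ncallers
```

With `-ncallers 3` for SNVs, `MSDUKT=1,0,1,1,0,1` qualifies (four callers); with `-ncallers 2` for INDELs, `MDKT=1,0,0,0` does not.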
Files
# SNV
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/SSeq_vcf2tsv_multiPairBam.py \
-myvcf /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.vcf.gz \
-nprefix IL_N_1.bwa IL_N_1.bowtie IL_N_1.novo IL_N_2.bwa IL_N_2.bowtie IL_N_2.novo IL_N_3.bwa IL_N_3.bowtie IL_N_3.novo NV_N_1.bwa NV_N_1.bowtie NV_N_1.novo NV_N_2.bwa NV_N_2.bowtie NV_N_2.novo NV_N_3.bwa NV_N_3.bowtie NV_N_3.novo FD_N_1.bwa FD_N_1.bowtie FD_N_1.novo FD_N_2.bwa FD_N_2.bowtie FD_N_2.novo FD_N_3.bwa FD_N_3.bowtie FD_N_3.novo NS_N_1.bwa NS_N_1.bowtie NS_N_1.novo NS_N_2.bwa NS_N_2.bowtie NS_N_2.novo NS_N_3.bwa NS_N_3.bowtie NS_N_3.novo NS_N_4.bwa NS_N_4.bowtie NS_N_4.novo NS_N_5.bwa NS_N_5.bowtie NS_N_5.novo NS_N_6.bwa NS_N_6.bowtie NS_N_6.novo NS_N_7.bwa NS_N_7.bowtie NS_N_7.novo NS_N_8.bwa NS_N_8.bowtie NS_N_8.novo NS_N_9.bwa NS_N_9.bowtie NS_N_9.novo EA_N_1.bwa EA_N_1.bowtie EA_N_1.novo NC_N_1.bwa NC_N_1.bowtie NC_N_1.novo LL_N_1.bwa LL_N_1.bowtie LL_N_1.novo \
-tprefix IL_T_1.bwa IL_T_1.bowtie IL_T_1.novo IL_T_2.bwa IL_T_2.bowtie IL_T_2.novo IL_T_3.bwa IL_T_3.bowtie IL_T_3.novo NV_T_1.bwa NV_T_1.bowtie NV_T_1.novo NV_T_2.bwa NV_T_2.bowtie NV_T_2.novo NV_T_3.bwa NV_T_3.bowtie NV_T_3.novo FD_T_1.bwa FD_T_1.bowtie FD_T_1.novo FD_T_2.bwa FD_T_2.bowtie FD_T_2.novo FD_T_3.bwa FD_T_3.bowtie FD_T_3.novo NS_T_1.bwa NS_T_1.bowtie NS_T_1.novo NS_T_2.bwa NS_T_2.bowtie NS_T_2.novo NS_T_3.bwa NS_T_3.bowtie NS_T_3.novo NS_T_4.bwa NS_T_4.bowtie NS_T_4.novo NS_T_5.bwa NS_T_5.bowtie NS_T_5.novo NS_T_6.bwa NS_T_6.bowtie NS_T_6.novo NS_T_7.bwa NS_T_7.bowtie NS_T_7.novo NS_T_8.bwa NS_T_8.bowtie NS_T_8.novo NS_T_9.bwa NS_T_9.bowtie NS_T_9.novo EA_T_1.bwa EA_T_1.bowtie EA_T_1.novo NC_T_1.bwa NC_T_1.bowtie NC_T_1.novo LL_T_1.bwa LL_T_1.bowtie LL_T_1.novo \
-nbam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_6.bwa.dedup.bam 
/mnt/$ABSOLUTE/PATH/WGS_NS_N_6.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_6.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.novo.dedup.bam \
-tbam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_6.bwa.dedup.bam 
/mnt/$ABSOLUTE/PATH/WGS_NS_T_6.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_6.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.novo.dedup.bam \
-ref /mnt/$ABSOLUTE/PATH/GRCh38.d1.vd1.fa \
-dbsnp /mnt/$ABSOLUTE/PATH/dbsnp_146.hg38.vcf.gz \
-cosmic /mnt/$ABSOLUTE/PATH/COSMICv83.All.noSNP.vcf \
-inclusion _T_ \
-callers MSDUKT \
-dedup \
-outfile /mnt/$ABSOLUTE/PATH/63BAMs.sSNV.MSDUKT.dedup_all_draft.beta.3.tsv
# INDEL
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/SSeq_vcf2tsv_multiPairBam.py \
-myvcf /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.vcf.gz \
-nprefix IL_N_1.bwa IL_N_1.bowtie IL_N_1.novo IL_N_2.bwa IL_N_2.bowtie IL_N_2.novo IL_N_3.bwa IL_N_3.bowtie IL_N_3.novo NV_N_1.bwa NV_N_1.bowtie NV_N_1.novo NV_N_2.bwa NV_N_2.bowtie NV_N_2.novo NV_N_3.bwa NV_N_3.bowtie NV_N_3.novo FD_N_1.bwa FD_N_1.bowtie FD_N_1.novo FD_N_2.bwa FD_N_2.bowtie FD_N_2.novo FD_N_3.bwa FD_N_3.bowtie FD_N_3.novo NS_N_1.bwa NS_N_1.bowtie NS_N_1.novo NS_N_2.bwa NS_N_2.bowtie NS_N_2.novo NS_N_3.bwa NS_N_3.bowtie NS_N_3.novo NS_N_4.bwa NS_N_4.bowtie NS_N_4.novo NS_N_5.bwa NS_N_5.bowtie NS_N_5.novo NS_N_6.bwa NS_N_6.bowtie NS_N_6.novo NS_N_7.bwa NS_N_7.bowtie NS_N_7.novo NS_N_8.bwa NS_N_8.bowtie NS_N_8.novo NS_N_9.bwa NS_N_9.bowtie NS_N_9.novo EA_N_1.bwa EA_N_1.bowtie EA_N_1.novo NC_N_1.bwa NC_N_1.bowtie NC_N_1.novo LL_N_1.bwa LL_N_1.bowtie LL_N_1.novo \
-tprefix IL_T_1.bwa IL_T_1.bowtie IL_T_1.novo IL_T_2.bwa IL_T_2.bowtie IL_T_2.novo IL_T_3.bwa IL_T_3.bowtie IL_T_3.novo NV_T_1.bwa NV_T_1.bowtie NV_T_1.novo NV_T_2.bwa NV_T_2.bowtie NV_T_2.novo NV_T_3.bwa NV_T_3.bowtie NV_T_3.novo FD_T_1.bwa FD_T_1.bowtie FD_T_1.novo FD_T_2.bwa FD_T_2.bowtie FD_T_2.novo FD_T_3.bwa FD_T_3.bowtie FD_T_3.novo NS_T_1.bwa NS_T_1.bowtie NS_T_1.novo NS_T_2.bwa NS_T_2.bowtie NS_T_2.novo NS_T_3.bwa NS_T_3.bowtie NS_T_3.novo NS_T_4.bwa NS_T_4.bowtie NS_T_4.novo NS_T_5.bwa NS_T_5.bowtie NS_T_5.novo NS_T_6.bwa NS_T_6.bowtie NS_T_6.novo NS_T_7.bwa NS_T_7.bowtie NS_T_7.novo NS_T_8.bwa NS_T_8.bowtie NS_T_8.novo NS_T_9.bwa NS_T_9.bowtie NS_T_9.novo EA_T_1.bwa EA_T_1.bowtie EA_T_1.novo NC_T_1.bwa NC_T_1.bowtie NC_T_1.novo LL_T_1.bwa LL_T_1.bowtie LL_T_1.novo \
-nbam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_4.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_5.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_6.bwa.dedup.bam 
/mnt/$ABSOLUTE/PATH/WGS_NS_N_6.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_6.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_7.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_8.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_N_9.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_N_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_N_1.novo.dedup.bam \
-tbam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_IL_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NV_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_FD_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_2.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_3.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_4.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_5.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_6.bwa.dedup.bam 
/mnt/$ABSOLUTE/PATH/WGS_NS_T_6.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_6.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_7.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_8.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NS_T_9.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_EA_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_NC_T_1.novo.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.bwa.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.bowtie.dedup.bam /mnt/$ABSOLUTE/PATH/WGS_LL_T_1.novo.dedup.bam \
-ref /mnt/$ABSOLUTE/PATH/GRCh38.d1.vd1.fa \
-dbsnp /mnt/$ABSOLUTE/PATH/dbsnp_146.hg38.vcf.gz \
-cosmic /mnt/$ABSOLUTE/PATH/COSMICv83.All.noSNP.vcf \
-inclusion _T_ \
-callers MDKT \
-dedup \
-outfile /mnt/$ABSOLUTE/PATH/63BAMs.MDKT.dedup_all_draft.beta.3.tsv
# Compress them
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 bgzip /mnt/$ABSOLUTE/PATH/63BAMs.sSNV.MSDUKT.dedup_all_draft.beta.3.tsv
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 bgzip /mnt/$ABSOLUTE/PATH/63BAMs.MDKT.dedup_all_draft.beta.3.tsv
Files uploaded to CGC
1) Create VAF files for each tumor-purity data set from the 300X SPP data sets
SNV task on CGC:
INDEL task on CGC:
2) Make annotations/flags based on the VAF files created above, using the following command:
# SNV
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/titrationConsistencyTest.py \
-infile /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.vcf.gz \
-vafs /mnt/$ABSOLUTE/PATH/VAF.sSNV.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-0_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sSNV.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_3-1_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sSNV.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-1_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sSNV.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-4_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sSNV.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_0-1_300X.bwa.dedup.txt \
-outfile /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.SPP.vcf
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.SPP.vcf
# INDEL
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/titrationConsistencyTest.py \
-infile /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.vcf.gz \
-vafs /mnt/$ABSOLUTE/PATH/VAF.sINDEL.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-0_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sINDEL.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_3-1_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sINDEL.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-1_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sINDEL.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_1-4_300X.bwa.dedup.txt \
/mnt/$ABSOLUTE/PATH/VAF.sINDEL.wTNscope.201711.02.highConf.draft.beta.3.1_SPP_GT_0-1_300X.bwa.dedup.txt \
-outfile /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.SPP.vcf
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.SPP.vcf
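The idea behind a titration consistency check can be sketched as follows, under a loudly stated assumption: that the GT_1-0, GT_3-1, GT_1-1, GT_1-4, and GT_0-1 labels in the VAF file names denote tumor:normal mixing ratios (tumor fractions 1, 0.75, 0.5, 0.2, and 0). A true somatic variant's VAF should then fall as the tumor fraction falls. This is a hypothetical illustration, not the logic of titrationConsistencyTest.py, and the linear VAF expectation ignores copy-number differences between tumor and normal.

```python
from typing import Dict, Iterable

# Assumption: "GT_3-1" etc. in the VAF file names denote tumor:normal mixing ratios.
SPP_MIXTURES = ("1-0", "3-1", "1-1", "1-4", "0-1")


def purity(label: str) -> float:
    """Tumor fraction of a 'tumor-normal' ratio label, e.g. '3-1' -> 0.75."""
    tumor, normal = (int(x) for x in label.split("-"))
    return tumor / (tumor + normal)


def expected_vafs(pure_tumor_vaf: float, labels: Iterable[str] = SPP_MIXTURES) -> Dict[str, float]:
    """Naive expectation: VAF scales linearly with tumor fraction (copy number ignored)."""
    return {label: pure_tumor_vaf * purity(label) for label in labels}


def titration_consistent(observed: Dict[str, float], tolerance: float = 0.05) -> bool:
    """True if the VAF never rises (beyond `tolerance`) as the tumor fraction drops."""
    ordered = sorted(observed, key=purity, reverse=True)
    vafs = [observed[label] for label in ordered]
    return all(lower <= higher + tolerance for higher, lower in zip(vafs, vafs[1:]))
```

A variant with VAFs 0.40, 0.31, 0.20, 0.09, 0.0 across the five mixtures passes; one whose VAF jumps from 0.10 in the pure tumor to 0.30 in the 3:1 mixture does not.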
Files
This is the "first" attempt to annotate four confidence levels (StrongEvidence, WeakEvidence, NeutralEvidence, and LikelyFalsePositive), as described here.
We will refine this procedure after getting orthogonal validation data.
Commands
# Merge the regions in MajorityAlignersCallable.bed into fewer regions to speed up the next two scripts:
mergeBed -i /$ABSOLUTE/PATH/MajorityAlignersCallable.bed > /$ABSOLUTE/PATH/MajorityAlignersCallableMerged.bed
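What mergeBed does here can be illustrated with a few lines of Python: by default, bedtools merge joins intervals that overlap or are book-ended (one ends exactly where the next starts, in 0-based half-open BED coordinates). This sketch only illustrates the concept; note that bedtools merge expects coordinate-sorted input, so if MajorityAlignersCallable.bed is not already sorted it should be sorted first (e.g., with sortBed), whereas the sketch sorts internally.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals on the same chromosome."""
    merged: List[List] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current region
        else:
            merged.append([chrom, start, end])  # start a new region
    return [(c, s, e) for c, s, e in merged]
```

For example, chr1:0-10 and chr1:10-20 are book-ended and become chr1:0-20, while chr1:25-30 stays separate from them.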
# SNV
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/highConfidenceBuilder_2ndPass.py \
-vcfin /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.dedup_all_draft.beta.3.3.SPP.vcf.gz \
-tsvin /mnt/$ABSOLUTE/PATH/63BAMs.sSNV.wTNscope.201711.02.dedup_all_draft.beta.3.tsv.gz \
-callable /mnt/$ABSOLUTE/PATH/MajorityAlignersCallableMerged.bed \
-exclude /mnt/$ABSOLUTE/PATH/3ArmsLosses.bed \
-outfile /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.highConf.draft.beta.3.3.vcf \
-type snv
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sSNV.MSDUKT.highConf.draft.beta.3.3.vcf
# INDEL
docker run --rm -u $UID -v /:/mnt lethalfang/somaticseq:seqc2 \
/opt/somaticseq/utilities/highConfidenceBuilder_2ndPass.py \
-vcfin /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.dedup_all_draft.beta.3.3.SPP.vcf.gz \
-tsvin /mnt/$ABSOLUTE/PATH/63BAMs.sINDEL.wTNscope.201711.02.dedup_all_draft.beta.3.tsv.gz \
-callable /mnt/$ABSOLUTE/PATH/MajorityAlignersCallableMerged.bed \
-exclude /mnt/$ABSOLUTE/PATH/3ArmLosses.bed \
-outfile /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.highConf.draft.beta.3.3.vcf \
-type indel
docker run --rm -u $UID -v /:/mnt lethalfang/tabix:1.2.1 \
bgzip /mnt/$ABSOLUTE/PATH/sINDEL.MDKT.highConf.draft.beta.3.3.vcf
Pre-final somatic mutation super set files are uploaded to CGC
These files have not been filtered for the 3 arm losses; instead, variants inside those regions are flagged with "ArmLossInNormal". Variants outside MajorityAlignersCallable.bed are flagged with NonCallable in FLAGS.
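The two flags described above amount to region-membership tests. Here is a minimal sketch, assuming merged (non-overlapping) BED intervals and 1-based VCF positions; the flag names come from this page, but the lookup code is illustrative and not taken from highConfidenceBuilder_2ndPass.py. A 1-based position p lies in a BED interval [start, end) when start ≤ p − 1 < end.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Index = Dict[str, List[Tuple[int, int]]]


def build_index(intervals: Iterable[Tuple[str, int, int]]) -> Index:
    """Group merged BED intervals (0-based, half-open) by chromosome, sorted by start."""
    index: Index = {}
    for chrom, start, end in sorted(intervals):
        index.setdefault(chrom, []).append((start, end))
    return index


def overlaps(index: Index, chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position lies inside any interval on that chromosome."""
    ivs = index.get(chrom, [])
    zero_based = pos - 1
    # Last interval whose start is <= zero_based; with merged intervals it is the only candidate.
    i = bisect_right(ivs, (zero_based, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= zero_based < ivs[i][1]


def region_flags(chrom: str, pos: int, callable_index: Index, arm_loss_index: Index) -> List[str]:
    flags = []
    if not overlaps(callable_index, chrom, pos):
        flags.append("NonCallable")
    if overlaps(arm_loss_index, chrom, pos):
        flags.append("ArmLossInNormal")
    return flags
```

With a callable region chr1:0-100 (BED), position 100 (1-based) is callable and position 101 is flagged NonCallable.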
When we have orthogonal validation, we will refine our annotations of StrongEvidence, WeakEvidence, NeutralEvidence, and LikelyFalsePositive.
Because this cell line was established from a female breast cancer, there should be no somatic mutations on chrY. Nevertheless, some chrY variants are called. I did not manually annotate them as false positives, even though they presumably are.
Combine the nine NovaSeq replicates (380X)
Uploaded to CGC, named singleSM.WGS_NS_(N|T)_combine9.(bwa|bowtie|novo).dedup.bam
Somatic Mutation Calling results: CombineNovaSeq.zip
SomaticSeq classifiers used: combinedNovaSeqClassifiers.zip
Take advantage of the Genentech SPP data sets. Convert BWA BAM files to fastq, and then align them with Bowtie2 and NovoAlign (300X)
Convert BAM files to fastq files based on read groups, and then align using bwa, bowtie2, and novoalign
Somatic Mutation Calling results: SPP.300X.zip
SomaticSeq classifiers used: SPP.300X.Classifiers.zip
Recalibrate confidence levels for low-VAF calls
somaticseq/utilities/recalibrate_baseon_deepSeq.py \
-ref GRCh38/GRCh38.d1.vd1.fa \
-infile sSNV.MSDUKT.superSet.v1.0_rc2.vcf.gz \
-outfile sSNV.MSDUKT.superSet.v1.0.recal.vcf \
--bignova-bwa BigNova.snv.bwa.vcf.gz \
--bignova-bowtie BigNova.snv.bowtie.vcf.gz \
--bignova-novo BigNova.snv.novo.vcf.gz \
--spp-bwa SPP300X.snv.bwa.vcf.gz \
--spp-bowtie SPP300X.snv.bowtie.vcf.gz \
--spp-novo SPP300X.snv.novo.vcf.gz
somaticseq/utilities/recalibrate_baseon_deepSeq.py \
-ref GRCh38/GRCh38.d1.vd1.fa \
-infile sINDEL.MDKT.superSet.v1.0_rc2.vcf.gz \
-outfile sINDEL.MDKT.superSet.v1.0.recal.vcf \
--bignova-bwa BigNova.indel.bwa.vcf.gz \
--bignova-bowtie BigNova.indel.bowtie.vcf.gz \
--bignova-novo BigNova.indel.novo.vcf.gz \
--spp-bwa SPP300X.indel.bwa.vcf.gz \
--spp-bowtie SPP300X.indel.bowtie.vcf.gz \
--spp-novo SPP300X.indel.novo.vcf.gz
Super Set release v1.0
Variants in the truth set are labeled "PASS"
Variants in the truth set only, annotated with SnpEff and SnpSift (dbSNP v146 and COSMIC v85)