Tumor-Normal Titration
The CGC project hosting tumor-normal titration data is located here.
Reheader BAM files with multiple SM tags
MuTect2 identifies the tumor and normal samples by the SM tags in the BAM headers, not by the BAM files themselves, so BAM files containing more than one SM value had to be reheadered to carry a single SM. These are the CGC apps that do so:
ReheaderBamSingleSM 0-1_200X, ReheaderBamSingleSM 1-0_200X, ReheaderBamSingleSM 3-1_200X, ReheaderBamSingleSM 1-1_200X, ReheaderBamSingleSM 1-4_200X, ReheaderBamSingleSM 1-9_200X, ReheaderBamSingleSM 1-19_200X
ReheaderBamSingleSM 0-1_300X, ReheaderBamSingleSM 1-0_300X, and the remaining BAM files were reheadered locally and uploaded to the CGC project. They can be found under Files with the SingleSM prefix and/or the SingleSM and SPP tags.
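How the CGC apps or the local runs rewrote the headers is not shown on this page. As a minimal sketch of one common approach, the helper below rewrites the SM tag on every @RG line of a SAM header, and the result can then be passed to samtools reheader. The function name single_sm_header, the sample name TUMOR, and the file names in the usage comment are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch only: read a SAM header on stdin and set the SM tag of every @RG line
# to one sample name. All other header lines pass through unchanged.
# "single_sm_header" is a hypothetical helper, not part of any CGC app.
single_sm_header() {
    local sample="$1"
    awk -v sm="$sample" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) $i = "SM:" sm }
        { print }'
}

# Hypothetical usage (requires samtools; file names are placeholders):
#   samtools view -H input.bam | single_sm_header TUMOR > header.sam
#   samtools reheader header.sam input.bam > SingleSM.input.bam
```

Only the header is rewritten, so the alignment records themselves are left untouched.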
SomaticSeq 5-tool Settings (3-tool for INDEL)
SomaticSeq output is used as input for NeuSomatic's ensemble mode.
Two BWA-based classifiers were uploaded. They were trained on all of the synthetic data used to build the original gold set plus the combined NovaSeq data (the same approach as for the NeuSomatic training data), with "if_TNscope" excluded from the classifier-building procedure:
10X
30X
50X
80X
3-1_vs_1-19, 1-4_vs_1-19, and 1-9_vs_1-19 were run locally, e.g.,
100X
SomaticSeq-MSDUK-Consensus - SPP.100X_1.1-1_vs_0-1
MuTect2 did not complete
SomaticSeq-MSDUK-Consensus - SPP.100X_1.1-4_vs_0-1
SomaticSeq-MSDUK-Consensus - SPP.100X_1.1-19_vs_0-1
200X
300X
Run on a local cluster because funds had temporarily run out
Output files follow the same naming convention as for the previous coverages, carry the "SomaticSeq" and "MSDUK" tags, and were uploaded all at once.
Examples:
SomaticSeq prediction
Then, use the SomaticSeq classifiers specified above to predict mutation status, i.e., score the *.Ensemble.sSNV.tsv and *.Ensemble.sINDEL.tsv files
#!/bin/bash
# SGE job directives: log locations, shell, and memory request
#$ -o /PATH/Tumor-Normal-Purity/logs
#$ -e /PATH/Tumor-Normal-Purity/logs
#$ -S /bin/bash
#$ -l h_vmem=128G
set -e
# SNVs: score each Ensemble TSV with the classifier, then convert the scored TSV to VCF
for file in /PATH/Tumor-Normal-Purity/*.Ensemble.sSNV.tsv
do
    scored="${file/Ensemble/SomaticSeq}"
    docker run --rm -v /PATH:/PATH -u "$UID" lethalfang/somaticseq:2.8.1 /opt/somaticseq/r_scripts/ada_model_predictor.R /PATH/Classifiers/GoldSetData.bwa.sSNV.tsv.ntChange.Classifier.RData "$file" "$scored"
    # The scored TSV (not the unscored Ensemble TSV) is the input here, so that -pass and -low apply to classifier scores
    docker run --rm -v /PATH:/PATH -u "$UID" lethalfang/somaticseq:2.8.1 /opt/somaticseq/SSeq_tsv2vcf.py -tsv "$scored" -vcf "${file%.Ensemble.sSNV.tsv}.SomaticSeq.sSNV.vcf" -pass 0.5 -low 0.1 -all -phred -paired -tools MuTect2 SomaticSniper VarDict MuSE Strelka
done
# INDELs: same two steps, with the INDEL classifier and the three INDEL callers
for file in /PATH/Tumor-Normal-Purity/*.Ensemble.sINDEL.tsv
do
    scored="${file/Ensemble/SomaticSeq}"
    docker run --rm -v /PATH:/PATH -u "$UID" lethalfang/somaticseq:2.8.1 /opt/somaticseq/r_scripts/ada_model_predictor.R /PATH/Classifiers/GoldSetData.bwa.sINDEL.tsv.ntChange.Classifier.RData "$file" "$scored"
    docker run --rm -v /PATH:/PATH -u "$UID" lethalfang/somaticseq:2.8.1 /opt/somaticseq/SSeq_tsv2vcf.py -tsv "$scored" -vcf "${file%.Ensemble.sINDEL.tsv}.SomaticSeq.sINDEL.vcf" -pass 0.5 -low 0.1 -all -phred -paired -tools MuTect2 VarDict Strelka
done
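After a batch like the one above finishes, it can be useful to confirm that every Ensemble TSV produced both a scored TSV and a VCF. The following is a minimal sketch of such a check, assuming the outputs sit next to their inputs and follow the *.SomaticSeq.sSNV.* and *.SomaticSeq.sINDEL.* naming used in the loops. The function names scored_tsv_for and check_outputs are hypothetical and are not part of SomaticSeq.

```shell
#!/bin/bash
# Sketch only: derive the scored-TSV path for one *.Ensemble.sSNV.tsv or
# *.Ensemble.sINDEL.tsv input by swapping the last ".Ensemble." for ".SomaticSeq.".
scored_tsv_for() {
    printf '%s.SomaticSeq.%s\n' "${1%.Ensemble.*}" "${1##*.Ensemble.}"
}

# Print every expected output (scored TSV and VCF) that is missing or empty
# in the given directory. Returns non-zero if at least one is absent.
check_outputs() {
    local dir="$1" status=0 tsv scored out
    for tsv in "$dir"/*.Ensemble.sSNV.tsv "$dir"/*.Ensemble.sINDEL.tsv; do
        [ -e "$tsv" ] || continue          # skip glob patterns that matched nothing
        scored=$(scored_tsv_for "$tsv")
        for out in "$scored" "${scored%.tsv}.vcf"; do
            if [ ! -s "$out" ]; then
                echo "missing: $out"
                status=1
            fi
        done
    done
    return "$status"
}

# Hypothetical usage:
#   check_outputs /PATH/Tumor-Normal-Purity
```

Because the script uses set -e, a failed docker run stops the job at the first error, so a check like this mainly helps identify which inputs were never reached.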
NeuSomatic Pre-processing
10X
30X
50X
80X
200X
300X