Description and examples of the steps in the ACNV case workflow

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019. For latest documentation and forum click here

created by LeeTL1220

on 2016-04-06

Once you have run GATK CNV, you can run ACNV for revised segments based on both the target-coverage profile and the ref/alt counts at heterozygous SNPs. ACNV will report estimates for the posterior probabilities for copy ratio and minor-allele fraction in each segment.

The ACNV case workflow (description and examples)

Requirements

    1. Java 1.8
    2. A functioning GATK4-protected jar (hellbender-protected.jar or gatk-protected.jar)
    3. Reference genome (fasta files) with fai and dict files. This can be downloaded as part of the GATK resource bundle: http://www.broadinstitute.org/gatk/guide/article?id=1213
    4. Samples must be paired. You will need both a case sample (typically, a tumor) and a control sample (typically, a blood normal). We are working on alleviating this requirement.
    5. A list of common heterozygous SNP sites. Currently, this needs to be in the Picard interval-list format. See http://gatkforums.broadinstitute.org/gatk/discussion/7812/creating-a-list-of-common-snps-for-use-with-getbayesianhetcoverage
    6. A completed run of GATK CNV for the case sample.

Overview of steps

    1. Identify heterozygous SNPs in the normal and aggregate read counts at these sites in the tumor.
    2. Segment the case sample (based on both the read counts from step 1 and input from GATK CNV) and estimate copy ratio and minor-allele fraction in each segment.
    3. Call copy-neutral loss-of-heterozygosity and balanced segments. This step will also create files that can be used as input for ABSOLUTE (Broad-internal versions only) and TITAN.

Step 1. Het Pulldown

* These instructions describe one method for Het Pulldown for matched samples. For more options, including tumor-only, please see: http://gatkforums.broadinstitute.org/gatk/discussion/7719/overview-of-getbayesianhetcoverage-for-heterozygous-snp-calling *

Inputs

    • control_bam -- BAM file for control sample (normal).
    • case_bam -- BAM file for case sample (tumor).
    • reference_sequence -- FASTA file for b37 reference.
    • snp_file -- Picard interval list of common SNP sites at which to test for heterozygosity in the control sample .

Outputs

    • normalhetpulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc., where M is the number of hets called in the control sample.
    • tumorhetpulldown -- TSV file with M entries containing ref/alt counts, ref/alt bases, etc. for sites in the case sample that were called as het in the control sample, where M is the number of hets called in the control sample.

Format for both output files:

CONTIG POSITION REF_COUNT ALT_COUNT REF_NUCLEOTIDE ALT_NUCLEOTIDE READ_DEPTH 1 809876 5 16 A G 21 1 881627 23 12 G A 35 1 882033 9 10 G A 19 1 900505 26 24 G C 50 ....snip....

Invocation

java -jar <path_to_gatk_protected_jar> GetBayesianHetCoverage --reference <reference_sequence> --snpIntervals <snp_file> --tumor <case_bam> --tumorHets <tumor_het_pulldown> --normal <control_bam> --normalHets <normal_het_pulldown> --hetCallingStringency 30

Step 2. Allelic CNV

Inputs

    • tumorhetpulldown -- Generated in step 1.
    • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow.
    • called_segments -- Called-segments TSV file obtained in the GATK CNV case workflow.
    • outputprefix -- Path and file prefix for creating the output files. For example, /home/lichtens/myacnv_output/sample1

Outputs

    • acnvsegments -- TSV file with name ending with -sim-final.seg containing posterior summary statistics for log2 copy ratio and minor-allele fraction in each segment. Using the above outputprefix, /home/lichtens/myacnv_output/sample1-sim-final.seg
    • acnvcrparameters -- TSV file with name ending with -sim-final.cr.param containing posterior summary statistics for global parameters of the copy-ratio model. Using the above outputprefix, /home/lichtens/myacnv_output/sample1-sim-final.cr.param
    • acnvafparameters -- TSV file with name ending with -sim-final.af.param containing posterior summary statistics for global parameters of the allele-fraction model. Using the above outputprefix, /home/lichtens/myacnv_output/sample1-sim-final.af.param

Other files containing intermediate results of the calculation are also generated.

Invocation

java -Xmx8g -jar <path_to_gatk_protected_jar> AllelicCNV --tumorHets <tumor_het_pulldown> --tangentNormalized <coverage_profile> --segments <called_segments> --outputPrefix <output_prefix>

Step 3. Call CNLoH and Balanced Segments

* WARNING: This tool is experimental and exists primarily for internal Broad use. *

Inputs

    • tumorhetpulldown -- Generated in step 1.
    • acnv_segments -- Generated in step 2 (*-sim-final.seg).
    • coverage_profile -- Tangent-normalized coverage TSV file obtained in the GATK CNV case workflow
    • outputdir -- Directory for creating the output files. For example, /home/lichtens/myacnvcnlohcallsoutput/

Outputs

    • GATK-CNV-formatted seg file -- TSV file ending with -sim-final.cnv.seg. This file is formatted identically as the output of GATK CNV. Note that this implies that the allelic fraction values are not captured in this file.
    • AllelicCapSeg-formatted seg file -- TSV file ending with -sim-final.acs.seg. This file is formatted identically as the output of Broad CGA AllelicCapSeg. Note that this file can be used as input to Broad-internal versions of ABSOLUTE.
    • TITAN-compatible het file --TSV file ending with -sim-final.titan.het.tsv. This file can be used as the input to TITAN for the het read counts.
    • TITAN-compatible copy-ratio file -- TSV file ending with -sim-final.titan.tn.tsv. This file can be used as the input to TITAN for the per-target copy-ratio estimates.

Invocation

java -Xmx8g -jar <path_to_gatk_protected_jar> CallCNLoHAndSplits --tumorHets <tumor_het_pulldown> --segments <acnv_segments> --tangentNormalized <coverage_profile> --outputDir <output_dir> --rhoThreshold 0.2 --numIterations 10 --sparkMaster local[*]

Updated on 2016-09-07

From Lizz on 2016-08-30

Hi, if i have the paired (tumor/normal) sample, how to use the normalHets? In the step2, could I use the normalHets file for the arguments —allelicPanelOfNormals,-aPON:File (Input file for allelic-bias panel of normals. Default value: null.)?

From slee on 2016-08-30

Hi @Lizz,

If you use GetHetCoverage or GetBayesianHetCoverage to generate het pulldowns for a large number of normals, you can use these as input to the tool CreateAllelicPanelOfNormals to generate an allelic PoN to use with ACNV. However, the allelic PoN is still under development and the code for this tool was only recently merged into master; thus, if you are using one of the tagged releases, it will not be available.

For the time being, I’d recommend running without the allelic PoN. In this case, the normal het pulldown is not typically used after step 1. However, if you are interested in treating the normal as a tumor case sample, you could pass it as the argument of tumorHet in step 2.

From Lizz on 2016-09-05

thank u!@slee

I have already accomplished the ACNV pipline, and I get the allelicCNV-sim-final.cnv.seg in the step3, not the step2.

and there are still some other diffrences from this description: 1. In step 2, the outputs is : *-MAF.seg, *-no-small.seg, *-sim-final.seg, *-sim-initial.seg, but not the -sim-final.cnv.seg file. 2. In step 3, the outputs is : *-sim-final.acs.seg, *-sim-final.cnb_called.seg, *-sim-final.cnv.seg, *-sim-final.titan.het.tsv, *-sim-final.titan.tn.tsv, not including the file -sim-final.seg. 3. in step 2 and step3, there are have the same file, just for example: /home/lichtens/my_acnv_output/sample1-sim-final.seg , is it resonable?

what’s more, I still have 2 quenstions about the file *-sim-final.acs.seg as input to ABSOLUTE: 1. when I run ABSOLTE the cmdline as follows:

From Lizz on 2016-09-05

sorry about the horrible network!

what’s more, I still have 2 quenstions about the file *-sim-final.acs.seg as input to ABSOLUTE:

1. when I run ABSOLTE the cmdline as follows:(R-3.1.3 >>library(numDeriv) >>library(ABSOLUTE) >>RunAbsolute(“FQ1.allelicCNV-sim-final.acs.seg”, 0, 0.02, 0.95, 10, “PRIMARY_DISEASE”, “Illumina_WES”, “FQ1”, “./”, 1500, 0, 0, “allelic”, maf.fn=NULL, min.mut.af=NULL, output.fn.base=“ofb”, verbose=TRUE)

and the error as : Error: bad restore file magic number (file may be corrupted) — no data loaded In addition: Warning message: file ‘FQ1.allelicCNV-sim-final.acs.seg’ has magic number ‘Chrom‘ Use of save versions prior to 2 is deprecated

the file format is not right? do you have the example file (*-sim-final.acs.seg), and how do you run the ABSOLUTE command? and the format of my -sim-final.acs.seg file is : Chromosome Start.bp End.bp n_probes length n_hets f tau sigma.tau mu.minor sigma.minor mu.major sigma.major SegLabelCNLOH chr1 924630 1298560 11 373930 44 0.5 3.895360886524925 0.017691433016881752 0.961756993262878 0 0.961756993262878 0 2 chr1 1298561 1645376 13 346815 28 0.5 3.8816952962393625 0.030571261954410678 0.956686874399459 0 0.956686874399459 0 2 chr1 1645377 2355508 9 710131 17 0.5 3.9980144929179966 0.022250610170735652 0.9992837019032069 0 0.9992837019032069 0 2

From slee on 2016-09-05

Hi @lizz,

Glad to hear you were able to run ACNV, and thanks for pointing out the differences in this description (which has fallen a little out of date, as we are still actively developing CNV and ACNV—-I will be sure to update it soon, though!) It sounds like the run is completing correctly.

I am not quite sure why the input to ABSOLUTE is not working correctly. Which version of ABSOLUTE are you running? I will double check with some members in Broad CGA (who have been successfully using ACNV as input to ABSOLUTE) to see how they are running things and will get back to you!

From Lizz on 2016-09-06

Hi, @slee ,thank you!

I forgot an important thing that the version of reference genome is hg38, not hg19. And I couldn’t plot the pictures(.png), but the INFO is SUCCESS, why?

the environment that I run ABSOLUTE: 1. ABSOLUTE-1.0.6 (http://www.broadinstitute.org/cancer/cga/absolute_download) 2. R-3.1.3 with the numDeriv / ggplot2 / Rcpp / foreach / DNAcopy / gslibs package , not recommended version of R-2.12.0 3. the input file is descrip above

From slee on 2016-09-06

Hi @lizz,

Thanks for the info—-I’ll look into things. It’s possible that the issue is due to the versions of R you are using.

Also, it may be the case that using hg38 is causing the error in plotting, as plotting only supports hg19 at the moment. That being said, we will try to change the code so that an exception is thrown in this case, instead of a SUCCESS message being returned!

From slee on 2016-09-07

Hi @lizz,

Actually, I’m not sure if the publicly available version of ABSOLUTE-1.0.6 is able to take in either AllelicCapSeg or ACNV output—-it looks like it may only take in HAPSEG output in allelic mode.

The conversion of ACNV output to an ABSOLUTE-compatible AllelicCapSeg file was enabled for Broad CGA users, who are using a version of ABSOLUTE that is (unfortunately) not available to the public at the moment. We will update the post to reflect this—-apologies for the confusion!

From amjad on 2017-01-02

how can I disable spark in AllelicCNV?

From shlee on 2017-01-03

@Lizz,

GenePattern’s [documentation of ABSOLUTE v1.0.6](http://software.broadinstitute.org/cancer/software/genepattern/modules/docs/ABSOLUTE/2) recommends R2.15 with the following packages:

- numDeriv_2012.9-1

- getopt_1.17

- optparse_0.9.5

Please be exact in the version of R and packages you use.

From amjad on 2017-01-09

> @amjad said:

> how can I disable spark in AllelicCNV?

The problem was that I was running too many samples in parallel and spark apparently ran out of ports. It works now after reducing the number of running jobs.

From amjad on 2017-01-10

I am getting an error in AllelicCNV only in few samples:

java.lang.IllegalArgumentException: Initial point in slice sampler is not within specified range. at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:609) at org.broadinstitute.hellbender.utils.mcmc.SliceSampler.sample(SliceSampler.java:71) at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioSamplers$VarianceSampler.sample(CopyRatioSamplers.java:54) at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioSamplers$VarianceSampler.sample(CopyRatioSamplers.java:29) at org.broadinstitute.hellbender.utils.mcmc.ParameterizedModel.doGibbsUpdate(ParameterizedModel.java:134) at org.broadinstitute.hellbender.utils.mcmc.ParameterizedModel.update(ParameterizedModel.java:124) at org.broadinstitute.hellbender.utils.mcmc.GibbsSampler.runMCMC(GibbsSampler.java:76) at org.broadinstitute.hellbender.tools.exome.copyratio.CopyRatioModeller.fitMCMC(CopyRatioModeller.java:96) at org.broadinstitute.hellbender.tools.exome.ACNVModeller.fitModel(ACNVModeller.java:145) at org.broadinstitute.hellbender.tools.exome.ACNVModeller.(ACNVModeller.java:106) at org.broadinstitute.hellbender.tools.exome.AllelicCNV.runPipeline(AllelicCNV.java:280) at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:108) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:166) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185) at org.broadinstitute.hellbender.Main.instanceMain(Main.java:76) at org.broadinstitute.hellbender.Main.main(Main.java:92)

From slee on 2017-01-13

Hi @amjad,

That appears to be the result of an edge case in the slice sampling—-should be a quick fix. Can I ask which version of gatk-protected (i.e., commit hash or release number) you are using?

From FPBarthel on 2018-10-17

Hi @LeeTL1220 I successfully used your recent GATK4 [workflow](https://github.com/broadinstitute/gatk/pull/5252 “workflow”) (part of 4.0.10.1 release) to generate `acs.seg` file for some cancer genome data. However using ABSOLUTE 1.0.6, I am unable to import these files using `RunAbsolute()`, which requires and R object as input. Am currently trying to figure out how to convert the `acs.seg` file into a siutable R object format. Do you have any pointers on how to do so?

From LeeTL1220 on 2018-10-18

@FPBarthel Unfortunately, ABSOLUTE is maintained by a different group. I have always had them set it up for me to run and I don’t have access to the source code/configuration. I currently run it through FireCloud (when I run it).

From FPBarthel on 2018-10-18

Thanks @LeeTL1220! After a bit more digging on Google, Github and message boards I came across a somewhat more recent version of ABSOLUTE [here](http://bcb.dfci.harvard.edu/~scarter/clonalevolutionsuite/ “here”) which does seem to support ACS input. It’s been giving a lot of errors but I’ve been able to address some of these with small code fixes and that seems to be fruitful. Is there anyone I can reach out to from the ABSOLUTE group you mention who may be able to give some additional insights?

From petr101 on 2019-01-23

Hi,

in what version of GATK I can find CallCNLoHAndSplits, please? I run successfully a whole workflow using GATK 4.beta.6 and at the last step I found out that CallCNLoHAndSplits is not there. Or is there any alternative to generate a ‘sim-final.acs.seg’ file using this version of GATK?

From jonathanYu on 2019-08-14

Hi,

I need the *-sim-final.acs.seg file as input of deTiN, so that I have to run step 3. However, I came across a problem on the installation of gatk-protected-1.0.0.0-alpha1.2.3. When running ‘./gradlew shadowJar’ , I got an error as below

Downloading https://services.gradle.org/distributions/gradle-2.13-all.zip

Unzipping /home/user/.gradle/wrapper/dists/gradle-2.13-all/1b9wlm73elu4cqnbc0rk7r7qn/gradle-2.13-all.zip to /home/user/.gradle/wrapper/dists/gradle-2.13-all/1b9wlm73elu4cqnbc0rk7r7qn

Set executable permissions for: /home/user/.gradle/wrapper/dists/gradle-2.13-all/1b9wlm73elu4cqnbc0rk7r7qn/gradle-2.13/bin/gradle

…..

Download https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.pom

FAILURE: Build failed with an exception.

    • Where:
    • Build file ‘/public/tools/gatk-protected-1.0.0.0-alpha1.2.3/build.gradle’ line: 112
    • What went wrong:
    • A problem occurred evaluating root project ‘gatk-protected’.
    • > Could not resolve all dependencies for configuration ‘:compile’. > Could not find org.broadinstitute:gatk:4.alpha.1-199-g8377acb-20160713.142718-1. Searched in the following locations: https://repo1.maven.org/maven2/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.pom https://repo1.maven.org/maven2/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.jar https://oss.sonatype.org/content/groups/public/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.pom https://oss.sonatype.org/content/groups/public/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.jar https://repository.cloudera.com/artifactory/cloudera-repos/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.pom https://repository.cloudera.com/artifactory/cloudera-repos/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.jar https://artifactory.broadinstitute.org/artifactory/libs-snapshot/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.pom https://artifactory.broadinstitute.org/artifactory/libs-snapshot/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.jar file:/home/qichao/.m2/repository/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.pom file:/home/qichao/.m2/repository/org/broadinstitute/gatk/4.alpha.1-199-g8377acb-SNAPSHOT/gatk-4.alpha.1-199-g8377acb-20160713.142718-1.jar Required by: :gatk-protected:unspecified
    • Try:
    • Run with —stacktrace option to get the stack trace. Run with —info or —debug option to get more log output.

BUILD FAILED

Can anyone help me to solve this problem?

From slee on 2019-08-16

@jonathanYu I would not recommend using gatk-protected or ACNV; these have been superseded by GATK4 and ModelSegments, which introduce many improvements.

The GATK4-equivalent of the internal scripts mentioned above can be found at https://github.com/broadinstitute/gatk/tree/master/scripts/unsupported/combine_tracks_postprocessing_cnv Note that these scripts are unsupported and use a number of questionable heuristics. However, you may find studying them useful in developing your own scripts.