created by GATK_Team
on 2017-12-29
In general you should use FireCloud, which has all the major GATK workflows preloaded, is more scalable and makes it easier to share any work you do with external collaborators, since the portal is publicly accessible and you can grant anyone access to workspaces securely and conveniently.
However, there are a few Broad-internal resources that you can use if FireCloud is not yet a suitable option for you.
The following dotkits should load all the necessary dependencies:
use .hdfview-2.9
use Java-1.8
use .r-3.1.3-gatk-only
If these don't work, move to a VM where the dotkits are not broken. If that still doesn't work, go to FireCloud.
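Before deciding that the dotkits are broken, it may help to check whether the dependencies ended up on your PATH. The following is only a sketch: the executable names `java` and `Rscript` are assumptions about what the Java and R dotkits provide, and `missing_tools` is a hypothetical helper, not part of GATK or Firehose.

```python
import shutil

# Assumed executable names; adjust to whatever the dotkits actually provide.
ASSUMED_TOOLS = ["java", "Rscript"]


def missing_tools(tools=ASSUMED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed dependencies were found on PATH.")
```

Run it in the same shell session in which you loaded the dotkits, since the `use` commands only affect that session's environment.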
We make this available as a courtesy, but we will not be able to provide support for any Firehose-specific aspects. Note that Firehose will be phased out at some point in 2018, and you will need to move your work to FireCloud by then. Rest assured we will provide support for the migration (phase-out calendar TBD).
We have put the GATK4 Somatic CNV Toolchain into Firehose. Please copy the workflows below from Algorithm_Commons:
GATK_Somatic_CNV_Toolchain_Capture
GATK_Somatic_CNV_Toolchain_WGS
Who do I contact with an issue?
First, make sure that your question is not already answered here or in another forum post. If it is a Firehose issue or you are not sure, email pipeline-help@broadinstitute.org. If you are sure that it is an issue with GATK CNV, ACNV, or GetBayesianHetPulldown, post to the forum.
What is GATK CNV vs. ACNV and which are run in the workflows above?
Are the results (e.g. sensitivity and precision) better than ReCapSeg in the GATK CNV toolchain?
If you mean running without the allelic integration, then the results are equivalent. If you want more details, ask in the forum or invite us to talk to you -- we have a presentation or two about this topic.
Do I run these workflows on Pair Sets or Individual Sets?
Individual Sets
What entity types do the tasks run on?
Samples and Pairs. I realize that the above question says to run the workflow on Individual Sets. This is to work around a Firehose issue.
What are the caveats around WGS?
What is the future of ReCapSeg?
We are phasing out ReCapSeg, for many reasons, everywhere -- not just Firehose. If you would like more details, post to the forum and we'll respond.
What is the future of Allelic CapSeg?
We have never supported (and never will support) Allelic CapSeg and cannot answer that question. We have some results comparing Allelic CapSeg and GATK ACNV. We can show you if you are interested (internal to Broad only).
Why are there fewer plots than in ReCapSeg?
We did not include plots that we did not believe were being used. If you would like to include additional plots, please post to the forum.
How is the GATK 4 CNV toolchain workflow better than the ReCapSeg workflow?
Are there new PoNs for these workflows?
Yes, but the PoN locations are already populated if you run the workflows properly. You should not need to do any setup yourself.
Is the correct PoN automatically selected for ICE vs. Agilent samples?
Yes, if you run the workflow as provided.
Is there a PoN creation workflow in Firehose?
No. Never going to happen. Don't ask. See the forum for instructions to create PoNs.
Can I run ABSOLUTE from the output of GATK ACNV?
Yes. The annotations are gatk4cnv_acnv_acs_seg_file_capture (capture) and gatk4cnv_acnv_acs_seg_file_wgs (WGS).
Can I run TITAN from the output of GATK ACNV?
Yes, though there has been little testing done on this. The annotations are gatk4cnv_acnv_acs_seg_file_capture and gatk4cnv_acnv_acs_seg_file_wgs.
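If you script the hand-off to ABSOLUTE or TITAN, a small lookup can keep the two annotation names straight. This is a minimal sketch: the annotation names are the ones given in the answers above, while the function name `acnv_seg_annotation` and the `"capture"`/`"wgs"` keys are hypothetical conventions chosen for illustration.

```python
# Annotation names as given in the answers above, keyed by data type.
ACNV_SEG_ANNOTATIONS = {
    "capture": "gatk4cnv_acnv_acs_seg_file_capture",
    "wgs": "gatk4cnv_acnv_acs_seg_file_wgs",
}


def acnv_seg_annotation(data_type):
    """Return the ACNV segment-file annotation name for 'capture' or 'wgs'."""
    key = data_type.strip().lower()
    if key not in ACNV_SEG_ANNOTATIONS:
        raise ValueError(
            f"Unknown data type {data_type!r}; expected 'capture' or 'wgs'"
        )
    return ACNV_SEG_ANNOTATIONS[key]
```

Raising on an unknown data type, rather than falling back to a default, avoids silently pointing a downstream task at the wrong segment file.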
Do the workflows above include Oncotator gene lists?
Yes.
These workflows include Picard Target Mapper. Isn't that going to cause me to have to rerun all of my jobs (e.g. MuTect)?
The workflows above will rerun Picard Target Mapper, but only new annotations are added. All previous output annotations of Picard Target Mapper should be populated with the same values. Mutation calling (MuTect) and other tasks will appear to be outdated, but when they are rerun, the jobs will be job-avoided.
Can I do the tumor-only GATK ACNV workflow?
For exome, the tumor-only workflow is working well, but it is not available in Firehose. If you would like to see evaluation data for tumor-only on exome, we can show you (internal to Broad only).
What are all of the annotations produced?
Where applicable, each annotation in the lists below also has a *_wgs counterpart...
Sample annotations:
Pair annotations:
Do the workflows also run on the normals?
GATK CNV, yes.
GATK ACNV, no. There is a het pulldown generated for the normal, as a side effect, when doing the het pulldown for the tumor.
What about array data?
The GATK4 CNV tools do not run on array data. Sequencing data only.
Do we still need separate PoNs if we want to run on X and Y?
Yes.
Can I run both the ReCapSeg workflow and the GATK CNV toolchain workflow?
Yes. All results are written to separate annotations.
Are the new workflows part of my PrAn?
No, not yet. You will need to copy (and run) these manually from Algorithm_Commons before you begin analysis. As a reminder, copy into your analysis workspace.
Does GATK CNV require matched (tumor-normal) samples?
No.
Does GATK ACNV require matched (tumor-normal) samples?
In Firehose, yes. Out of Firehose, no.
How do I modify the ABSOLUTE tasks in FH to accept the new GATK ACNV annotations?
There are two changes you need to make to the ABSOLUTEv1.5WES configuration to make it accept the new outputs.
From eminikel on 2018-11-07
If I am working on premises on the Broad cluster, where can I access GenomeAnalysisTK.jar and not need to have my own installation?