created by Geraldine_VdAuwera
on 2018-01-07
Identify somatic copy number variants (CNVs) in a case sample. Requires an appropriate Panel of Normals (PON).
| Pipeline | Summary | Notes | Github | Terra |
|:-------------|:---------------|:---------|:-----------|:--------------|
| Somatic CNV case sample | Case BAM to CNV | universal | yes | b37 |
| Somatic CNV PON creation | Normal BAMs to PON | universal | yes | b37 |
Documentation for these workflows is in development.
Updated on 2019-07-14
From dayzcool on 2018-02-06
I would like to try GATK’s Somatic CNV on exome and whole genome samples.
I noticed that there are workflows in the '[placeholder](https://github.com/gatk-workflows/gatk4-somatic-cnvs "placeholder")' repository, and that the gatk source repo also includes CNV workflows that were updated at a later date.
Would you advise if those are ready to be used and which one is a better choice? Thank you!
From shlee on 2018-02-06
Hi @dayzcool,
It’s my understanding the gatk-workflows repository (the linked URL for ‘placeholder’) is meant to illustrate the use of fixed versions of the gatk source repo scripts with filled-out example inputs JSON files. The gatk-workflows/gatk4-somatic-cnvs repo is being worked on as we speak—as you can see it is missing example inputs JSON files.
Differences between these repo scripts are meant to be minimal. The differences you see now are because this particular script (not to mention the tools) is fairly new and still undergoing further tweaks based on recent tests.
Currently, for an advanced user such as yourself, perhaps the best source of information on the Somatic CNV workflow is the WDL scripts in the GitHub repository at https://github.com/broadinstitute/gatk/tree/master/scripts. These are updated by the developers concurrently with tool version updates. The repository also offers additional supporting WDL scripts, as well as unsupported WDL scripts that may be of interest and that are not in a gatk-workflows repo, e.g. a script for Mutect2 panel of normals creation.
The same somatic CNV WDL script applies to either exome or whole genome data. You need only tweak the `--bin-length` value to be appropriate for the type of data, e.g. default 1000 for genomes or 0 for exomes. I have just started to prepare to write the Somatic CNV workflow tutorial, so this and other supporting documents should become available on the forum in about a month or two.
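To illustrate the `--bin-length` choice described above, here is a minimal Python sketch that only assembles a `PreprocessIntervals` command line and does not run it. The file names are hypothetical placeholders, and the set of other arguments (`-R`, `-L`, `-O`) is an assumption about a typical invocation; check the tool documentation for your GATK version before using it.

```python
import shlex

# Bin lengths quoted in the post above: 1000 for genomes, 0 for exomes.
BIN_LENGTH = {"genome": 1000, "exome": 0}


def preprocess_intervals_command(data_type, reference, intervals, output):
    """Build (but do not run) a PreprocessIntervals command as a list of arguments."""
    if data_type not in BIN_LENGTH:
        raise ValueError("data_type must be 'genome' or 'exome'")
    return [
        "gatk", "PreprocessIntervals",
        "-R", reference,                             # reference FASTA (hypothetical path)
        "-L", intervals,                             # intervals to preprocess (hypothetical path)
        "--bin-length", str(BIN_LENGTH[data_type]),  # the only value that changes by data type
        "-O", output,                                # output interval list (hypothetical path)
    ]


if __name__ == "__main__":
    cmd = preprocess_intervals_command(
        "exome", "ref.fasta", "targets.interval_list", "preprocessed.interval_list"
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the data type as the only switch mirrors the point made in the post: the workflow is the same, and only the bin length differs.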
From dayzcool on 2018-02-06
@shlee, I really appreciate your kind explanation and your letting me know about the `--bin-length` argument.
Too bad I can't wait for the documentation, but I am looking forward to reading it in a couple of months!
From eric_wu on 2018-02-07
@shlee, I tried GATK CNV in beta mode before GATK4 launched.
Now I want to redo my analysis. Should I still follow the instructions from these links?
1. Your post, (How to) Call somatic copy number variants using GATK4 CNV:
https://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv
2. PDF file (Call somatic copy number variants using GATK CNV):
http://genomicinfo.broadinstitute.org/acton/attachment/13431/f-012e/1/-/-/-/-/Somatic_CNV_handon_worksheet.pdf?sid=TV2:QKwcswckd
Is there any new command or step I should add if I use GATK4?
My workflow:
1. Pad the target list
2. Prepare proportional coverage
3. Prepare the PON
4. Normalize with the PON
5. Perform segmentation
6. Plot segments
7. Call segments
By the way, in the final tumor.called file from the call segments step, does the Segment_Mean column contain the raw coverage ratio (tumor coverage / normal coverage) or the log2-transformed ratio, as VarScan2 CopyNumber reports? I am working on an evaluation of several tools, so I need to figure this out. Many thanks for providing such powerful tools for CNV calling!
From shlee on 2018-02-07
Hi @eric_wu,
The tools have changed in major ways since GATK4.beta.6, starting with tool names in GATK4.0.0.0. These changes were merged into master for GATK4.0.0.0, released in January. If you are using workflows for any beta release, the tutorials you list apply. If you are using GATK4.0.0.0, then the tutorials are currently in the works. The previous tutorials do still apply conceptually and in the major data transformation steps. It’s just that details (tool features, tool names, parameter names, underlying algorithms) have changed, e.g. incorporating matched normal information. You can refer to the WDL scripts and tool documents for now for the GATK4.0.0.0 workflow.
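As a rough orientation for the renaming mentioned above, the sketch below pairs the seven steps listed in the question with tool names from the beta workflow and from later GATK4 releases. The pairing is an editorial assumption, not an official mapping: the steps do not correspond one to one, some names changed again between GATK4 versions, and the WDL scripts and tool documents remain the authoritative source.

```python
# (step from the question, beta-era tool, later GATK4 tool).
# Assumed correspondence for orientation only; verify against the tool docs.
STEP_TO_TOOLS = [
    ("pad target list", "PadTargets", "PreprocessIntervals"),
    ("prepare proportional coverage", "CalculateTargetCoverage", "CollectReadCounts"),
    ("prepare PON", "CreatePanelOfNormals", "CreateReadCountPanelOfNormals"),
    ("normalize with PON", "NormalizeSomaticReadCounts", "DenoiseReadCounts"),
    ("perform segmentation", "PerformSegmentation", "ModelSegments"),
    ("plot segments", "PlotSegmentedCopyRatio", "PlotModeledSegments"),
    ("call segments", "CallSegments", "CallCopyRatioSegments"),
]


def later_name(beta_tool):
    """Return the assumed later GATK4 counterpart of a beta tool name, or None."""
    for _step, beta, later in STEP_TO_TOOLS:
        if beta == beta_tool:
            return later
    return None


if __name__ == "__main__":
    for step, beta, later in STEP_TO_TOOLS:
        print(f"{step:32s} {beta:28s} -> {later}")
```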
From eric_wu on 2018-02-08
Hi @shlee,
Thanks for the reply! I will use the beta5 CNV workflow first, then.
I have another question:
In the final tumor.called file from the call segments step, does the Segment_Mean column contain the raw coverage ratio (tumor coverage / normal coverage) or the log2-transformed ratio, as VarScan2 CopyNumber reports? I checked the values, and they look more like raw relative ratios than log2-transformed values because none of them is negative. Do you interpret them the same way? Thank you!
From shlee on 2018-02-08
Hi @eric_wu,
Let me have @LeeTL1220 jump in here.
From LeeTL1220 on 2018-02-08
@eric_wu I believe it is not the log2 transform.
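The sign check used in the question can be sketched as follows. A raw coverage ratio is never negative, whereas a log2 ratio is negative for any loss (ratio below 1), so a negative value rules out the raw scale. The absence of negative values is only suggestive, since a sample with no losses would also have no negative log2 values. The function names and example numbers are illustrative and not taken from any GATK output.

```python
import math


def ratio_to_log2(ratio):
    """Convert a raw copy ratio (tumor coverage / normal coverage) to log2 scale."""
    return math.log2(ratio)


def log2_to_ratio(value):
    """Convert a log2 copy ratio back to a raw ratio."""
    return 2.0 ** value


def cannot_be_raw_ratio(segment_means):
    """True if any value is negative, which is impossible for a raw coverage ratio."""
    return any(value < 0 for value in segment_means)


if __name__ == "__main__":
    raw = [0.5, 1.0, 2.0]  # illustrative values: loss, neutral, gain
    as_log2 = [ratio_to_log2(r) for r in raw]
    print(as_log2)
    print(cannot_be_raw_ratio(raw), cannot_be_raw_ratio(as_log2))
```

When comparing tools, converting every output to the same scale with helpers like these avoids mixing raw and log2 values in one evaluation.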
From eric_wu on 2018-02-09
Hi shlee & LeeTL1220,
Thanks for the reply! Another issue came up when I checked the final called file and the .ptn and .tn files.
I found that the column that stores the coverage, usually the last column, is named after the normalization sample rather than the tumor sample in the final tumor.called file.
E.g. my PON is named foo, and my five tumor samples are named fooA, fooB, fooC, fooD and fooE.
It turns out that in each tumor.called file, the tumor sample is named foo instead of fooA through fooE.
I am not sure whether it is OK to report this bug here, or whether I should report it in the GATK4 git repository.
Many Thanks!
From Sheila on 2018-02-14
@eric_wu
Hi,
Sorry for the delay. I asked someone from the team to get back to you soon.
-Sheila
From LeeTL1220 on 2018-02-14
@eric_wu If you can, please report in the GATK4 repo. Thanks.
From bulitsky on 2018-03-29
Hi @shlee,
Can I run GATK4 CNV on a somatic sample WITHOUT a matched normal sample? (I have a PoN created from some normal samples.)
Thank you!
From shlee on 2018-03-30
Absolutely @bulitsky, you can run a tumor sample through somatic CNV without a matched normal.
From afzm on 2018-12-03
Is it possible to run those .wdl scripts without Docker? I cannot use Docker on my server, so I am asking whether there is a workaround. Thank you very much.
From shlee on 2018-12-03
Hi @afzm,
Yes, you can run WDL scripts without Docker. Just remove the runtime section. See the latest [WDL & Cromwell basics hands-on worksheet](https://drive.google.com/open?id=14t9DQXuzhuSWwO7_AZSHkr28mviCaMjz) for example WDLs that do not call on a Docker and also for a brief explanation of the runtime section (section 7).
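The "remove the runtime section" step can be done by hand, but for scripts with many tasks a small helper may be convenient. The following is a minimal sketch, assuming each runtime section starts on its own line as `runtime {` and has balanced braces; it is not a WDL parser, and the function name and example task are hypothetical. Review the result before running it with Cromwell.

```python
import re

# Matches the start of a runtime section at the beginning of a line.
RUNTIME_START = re.compile(r"^[ \t]*runtime\s*\{", re.MULTILINE)


def strip_runtime_sections(wdl_text):
    """Return wdl_text with every `runtime { ... }` block removed."""
    kept = []
    pos = 0
    while True:
        match = RUNTIME_START.search(wdl_text, pos)
        if match is None:
            kept.append(wdl_text[pos:])
            break
        kept.append(wdl_text[pos:match.start()])
        # Walk forward to the brace that closes the runtime block,
        # counting nested braces such as ${...} placeholders.
        depth = 1
        i = match.end()
        while i < len(wdl_text) and depth > 0:
            if wdl_text[i] == "{":
                depth += 1
            elif wdl_text[i] == "}":
                depth -= 1
            i += 1
        # Drop the newline that followed the closing brace, if any.
        if i < len(wdl_text) and wdl_text[i] == "\n":
            i += 1
        pos = i
    return "".join(kept)


EXAMPLE_WDL = """task Hello {
  command {
    echo "hi"
  }
  runtime {
    docker: "ubuntu:16.04"
    memory: "${mem} GB"
  }
  output {
    String out = read_string(stdout())
  }
}
"""

if __name__ == "__main__":
    print(strip_runtime_sections(EXAMPLE_WDL))
```

With the runtime sections gone, the tools called in each command block must be installed and on the path of the machine that runs the workflow.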