created by esalinas
on 2017-06-21
I ran the GATK4 AllelicCNV program and got an error message saying "Segment Mean must be finite" (see below).
I looked at the input files I gave the program, and no values were listed as "Inf", "Infinity", "-Inf", or anything like that.
Sorting the column numerically (a la `sort -g`) showed values ranging from about -8 to about 8, with most values near 0. Because everything I saw was within +/- 8, I'm confused about the source of the error. I'm not sure if this is a bug or a known issue?
-eddie
```
14:59:47.184 INFO AllelicCNV - Shutting down engine
[June 20, 2017 2:59:47 PM UTC] org.broadinstitute.hellbender.tools.exome.AllelicCNV done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=910688256
java.lang.IllegalArgumentException: Segment Mean must be finite.
	at org.broadinstitute.hellbender.utils.param.ParamUtils.isFinite(ParamUtils.java:180)
	at org.broadinstitute.hellbender.tools.exome.ModeledSegment.<init>(ModeledSegment.java:22)
	at org.broadinstitute.hellbender.tools.exome.SegmentUtils.toModeledSegment(SegmentUtils.java:548)
	at org.broadinstitute.hellbender.tools.exome.SegmentUtils.lambda$readModeledSegmentsFromSegmentFile$721(SegmentUtils.java:185)
	at org.broadinstitute.hellbender.utils.tsv.TableUtils$1.createRecord(TableUtils.java:112)
	at org.broadinstitute.hellbender.utils.tsv.TableReader.fetchNextRecord(TableReader.java:355)
	at org.broadinstitute.hellbender.utils.tsv.TableReader.access$200(TableReader.java:94)
	at org.broadinstitute.hellbender.utils.tsv.TableReader$1.hasNext(TableReader.java:458)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
	at org.broadinstitute.hellbender.tools.exome.SegmentUtils.readSegmentFile(SegmentUtils.java:421)
	at org.broadinstitute.hellbender.tools.exome.SegmentUtils.readModeledSegmentsFromSegmentFile(SegmentUtils.java:184)
	at org.broadinstitute.hellbender.tools.exome.AllelicCNV.runPipeline(AllelicCNV.java:290)
	at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
	at org.broadinstitute.hellbender.Main.main(Main.java:221)
esalinas@eddie-instance-1:~/fc-9c84e685-79f8-4d84-9e52-640943257a9b/ef545f92-5053-4087-ad54-09fa884d0494/wgs_cnv_work/f2df0580-2445-4ea3-af41-d5a8a3107f5b$
```
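In case it helps reproduce the check, here is a minimal sketch of the kind of sanity check I ran, scanning one numeric column of a tab-separated segment file for NaN or infinite values. The column name `Segment_Mean` is an assumption; match it to the actual header of the file.

```python
import math

def nonfinite_rows(path, column="Segment_Mean"):
    """Return (line_number, raw_value) pairs where the column is NaN or +/-Inf.

    Assumes a tab-separated file whose first line is a header row; the
    column name is hypothetical and should match the file's real header.
    """
    bad = []
    with open(path) as fh:
        header = fh.readline().rstrip("\n").split("\t")
        idx = header.index(column)
        # Data starts on line 2, after the header.
        for lineno, line in enumerate(fh, start=2):
            raw = line.rstrip("\n").split("\t")[idx]
            if not math.isfinite(float(raw)):
                bad.append((lineno, raw))
    return bad
```

For a file like mine, where all values sit roughly between -8 and 8, this returns an empty list.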
From Geraldine_VdAuwera on 2017-06-21
It might be a bug — what version are you using?
From esalinas on 2017-06-21
@Geraldine_VdAuwera I believe this is the JAR gatk-package-4.alpha.2-1136-gc18e780-SNAPSHOT-local.jar, so the commit looks to be "gc18e780".
From slee on 2017-06-22
@esalinas The segment means in the .seg file should be non-log2 (this is the output of the GATK CNV tool, as indicated in the javadoc), but the tangent-normalized coverages in the tn.tsv should be log2.
@shlee Perhaps it's worth making this more explicit in the javadoc.
Future versions of the CNV pipeline will only deal with raw (integer) read-count coverage and will only output non-log2 copy-ratio estimates, which will hopefully prevent this kind of confusion.
From esalinas on 2017-06-23
@slee Thanks for this info about the expected log2-space status. It would seem that any data file with a "Mean" or "Average" column containing fractional values, particularly negative ones, is likely in log2 space, whereas a file whose values are non-negative integers (or have only zeroes after the decimal point) is likely not. I can write a script that transforms a file by taking log2 or computing 2^x.
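A sketch of the 2^x direction of that transform, under the assumption that the file is tab-separated with a header row and that the column is named `Segment_Mean` (both names are hypothetical; adjust to the actual file):

```python
def delog2_column(in_path, out_path, column="Segment_Mean"):
    """Rewrite a tab-separated file, replacing log2 values in one column
    with their linear-space equivalents (2**x). Column name is assumed."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        header = fin.readline().rstrip("\n").split("\t")
        idx = header.index(column)
        fout.write("\t".join(header) + "\n")
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            # 2**x undoes a log2 transform; other columns pass through.
            fields[idx] = str(2.0 ** float(fields[idx]))
            fout.write("\t".join(fields) + "\n")
```

The inverse direction (taking log2 of a linear column) is the same loop with `math.log2(float(fields[idx]))` in place of the power.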
From shlee on 2017-06-23
@slee, I’ve updated the docs.