Errors about contigs in BAM or VCF files not being properly ordered or sorted

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019.

created by Geraldine_VdAuwera

on 2012-08-11

This is not as common as the "wrong reference build" problem, but it still pops up every now and then: a collaborator gives you a BAM or VCF file that's derived from the correct reference, but for whatever reason the contigs are not sorted in the same order. The GATK can be particular about the ordering of contigs in BAM and VCF files, so it will fail with an error in this case.

So what do you do?

For BAM files

You run Picard's ReorderSam tool on your BAM file, using the reference genome dictionary as a template, like this:

java -jar picard.jar ReorderSam \
    I=original.bam \
    O=reordered.bam \
    R=reference.fasta \
    CREATE_INDEX=TRUE

Where reference.fasta is your genome reference, which must be accompanied by a valid *.dict dictionary file. The CREATE_INDEX argument is optional but useful if you plan to use the resulting file directly with GATK (otherwise you'll need to run another tool to create an index).
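To see which contig order a given *.dict file actually declares, it can help to read it directly. The following is a minimal sketch, assuming the dictionary is a plain-text SAM header whose `@SQ` lines carry `SN` (name) and `LN` (length) tags; the function name `parse_sequence_dictionary` is made up for this illustration and is not part of Picard or GATK.

```python
def parse_sequence_dictionary(text):
    """Return [(contig_name, length), ...] in file order from SAM-header text."""
    contigs = []
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header records
        # Every tab-separated field after the record type looks like TAG:value.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


# Tiny illustrative dictionary (two contigs only).
example = (
    "@HD\tVN:1.5\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chrM\tLN:16571\n"
)
print(parse_sequence_dictionary(example))
```

The same function can be pointed at the text header of a BAM file, so the two contig lists can be compared side by side.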

Be aware that this tool will drop reads that don't have equivalent contigs in the new reference (potentially bad or not, depending on what you want). If contigs have the same name in the BAM and the new reference, this tool assumes that the alignment of the read in the new BAM is the same. This is not a liftover tool!
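Before reordering, it may be worth previewing which contigs have no equivalent in the new reference. The sketch below is only an illustration of the comparison described above, not ReorderSam's own logic; `preview_reorder` is a hypothetical helper, and it assumes you already have both contig lists as (name, length) pairs.

```python
def preview_reorder(bam_contigs, ref_contigs):
    """Compare two contig lists given as [(name, length), ...] in header order."""
    ref_lengths = dict(ref_contigs)
    # Contigs present in the BAM but absent from the reference:
    # reads aligned to these would have nowhere to go.
    dropped = [name for name, _ in bam_contigs if name not in ref_lengths]
    # Same name but a different length suggests a different reference build,
    # which reordering cannot fix (this is not a liftover).
    length_mismatch = [
        name for name, length in bam_contigs
        if name in ref_lengths and ref_lengths[name] != length
    ]
    return {
        "dropped": dropped,
        "length_mismatch": length_mismatch,
        "new_order": [name for name, _ in ref_contigs],
    }


# Illustrative values; chrUn_example is a made-up contig name.
bam = [("chrM", 16571), ("chr1", 249250621), ("chrUn_example", 100)]
ref = [("chr1", 249250621), ("chrM", 16569)]
report = preview_reorder(bam, ref)
print(report)
```

An empty `dropped` list and an empty `length_mismatch` list are a reasonable sign that only the order differs.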

For VCF files

You run Picard's SortVcf tool on your VCF file, using the reference genome dictionary as a template, like this:

java -jar picard.jar SortVcf \
    I=original.vcf \
    O=sorted.vcf \
    SEQUENCE_DICTIONARY=reference.dict

Where reference.dict is the sequence dictionary of your genome reference.
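Conceptually, sorting against a sequence dictionary means ordering records by the contig's position in the dictionary first and by coordinate second. The snippet below is a simplified sketch of that idea on (contig, position) tuples; it is not how SortVcf is implemented, and `sort_records` is a name invented for the example.

```python
def sort_records(records, dict_order):
    """Sort (contig, position) tuples by dictionary rank, then by position."""
    rank = {name: index for index, name in enumerate(dict_order)}
    # A contig missing from dict_order raises KeyError, which is a useful
    # signal that the records and the dictionary do not belong together.
    return sorted(records, key=lambda rec: (rank[rec[0]], rec[1]))


records = [("chr10", 500), ("chr2", 900), ("chr1", 300), ("chr2", 100)]
order = ["chr1", "chr2", "chr10"]
sorted_records = sort_records(records, order)
print(sorted_records)
```

Note that a plain string sort would place chr10 before chr2, which is exactly the ordering the dictionary-based key avoids.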

Note that you may need to delete the index file that gets created automatically for your new VCF by the Picard tool. GATK will automatically regenerate an index file for your VCF.

Version-specific alert for GATK 3.5

In version 3.5, we added some beefed-up VCF sequence dictionary validation. Unfortunately, as a side effect of the additional checks, some users have experienced an error that starts with "ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant." that is due to unintentional activation of a check that is not necessary. This will be fixed in the next release; in the meantime -U ALLOW_SEQ_DICT_INCOMPATIBILITY can be used (with caution) to override the check.
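To tell in advance whether a contig list would be seen as lexicographically sorted, a rough heuristic is to look at where contig 10 sits relative to contigs 1 and 2. This is a sketch under that assumption only; it is not GATK's actual check, and `looks_lexicographic` is a hypothetical helper.

```python
def strip_chr(name):
    """Drop a leading 'chr' so that b37-style and hg19-style names compare alike."""
    return name[3:] if name.startswith("chr") else name


def looks_lexicographic(contigs):
    """True when contig 10 appears between contig 1 and contig 2."""
    position = {strip_chr(name): index for index, name in enumerate(contigs)}
    if not all(key in position for key in ("1", "2", "10")):
        return False  # not enough information to decide
    return position["1"] < position["10"] < position["2"]


autosomes = ["chr%d" % i for i in range(1, 23)]
lexicographic = sorted(autosomes) + ["chrM", "chrX", "chrY"]  # chr1, chr10, ..., chr2, ...
karyotypic = autosomes + ["chrX", "chrY", "chrM"]             # chr1, chr2, ..., chr22, ...

print(looks_lexicographic(lexicographic), looks_lexicographic(karyotypic))
```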

Updated on 2016-02-17

From rdali094 on 2016-01-10

Hello,

Is there any way to adjust/sort the genome FASTA file instead of the BAM file?

I have hundreds of Bams taking up TBs of storage. To reorder them all and save double copies (since some information is lost in the process) is a huge endeavour. It is much more feasible for me to tweak the reference instead. Is that possible?

Thanks!

Rola

From Geraldine_VdAuwera on 2016-01-10

Yes, you could reorder the contigs within the reference file — but if you use the same reference that was used to generate all these bams you shouldn’t need to. What exactly is the error you’re getting?
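Reordering contigs within a reference amounts to rewriting the FASTA records in a chosen order. Here is a minimal in-memory sketch, assuming a small FASTA held as a string; `reorder_fasta` is an illustrative name, and a real genome would need a streaming or indexed approach instead. The reordered file would also need a fresh index and sequence dictionary.

```python
def reorder_fasta(fasta_text, contig_order):
    """Return FASTA text with records emitted in contig_order.

    Records not listed in contig_order are left out of the result.
    """
    records = {}
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The contig name is the first word after '>'.
            current = line[1:].split()[0]
            records[current] = [line]
        elif current is not None:
            records[current].append(line)
    missing = [name for name in contig_order if name not in records]
    if missing:
        raise KeyError("contigs not found in FASTA: %s" % ", ".join(missing))
    return "\n".join("\n".join(records[name]) for name in contig_order) + "\n"


fasta = ">chr1\nACGT\n>chr2\nGG\nTT\n>chr10\nCC\n"
reordered = reorder_fasta(fasta, ["chr1", "chr10", "chr2"])
print(reordered)
```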

From rdali094 on 2016-01-10

It might not be the exact reference genome file. The data is publicly available and it is aligned to hg19, but I don't have the exact genome file. What should I do? Do I look at the order of contigs in the BAM file and order the genome file the exact same way?

This is the error log:

INFO 23:29:03,978 GenomeAnalysisEngine - Strictness is SILENT
INFO 23:29:04,344 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 23:29:04,365 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 23:29:04,748 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.38
INFO 23:29:04,972 IntervalUtils - Processing 159138663 bp from intervals
INFO 23:29:08,711 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------

ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):

ERROR

ERROR This means that one or more arguments or inputs in your command are incorrect.

ERROR The error message below tells you what is the problem.

ERROR

ERROR If the problem is an invalid argument, please check the online documentation guide

ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.

ERROR

ERROR Visit our website and forum for extensive documentation and answers to

ERROR commonly asked questions http://www.broadinstitute.org/gatk

ERROR

ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.

ERROR

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads. Please see http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersamfor more information. Error details: reads contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]

From rdali094 on 2016-01-11

I tried to order the contigs in the genome fasta file the same way the Bams are sorted:

pulled out contig order:

samtools view $BAM | cut -f3 | uniq

chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrM chrX chrY *

recreate genome file with same contigs in the same order:

cat chr1.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr2.fa chr20.fa chr21.fa chr22.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chrM.fa chrX.fa chrY.fa > hg19.gatk.fa

index genome and create dictionary:

samtools faidx hg19.gatk.fa

java -jar picard.jar CreateSequenceDictionary R= hg19.gatk.fa O= hg19.gatk.dict

run muTect:

java -jar GenomeAnalysisTK.jar -T MuTect2 -R hg19.gatk.fa -I:tumor $tumor -I:normal $normal --intervals "chr7" -o NormalTumorPair_chr7.muTect.vcf

######### Same error:

INFO 20:20:08,335 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:20:08,339 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 20:20:08,339 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 20:20:08,339 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 20:20:08,344 HelpFormatter - Program Args: -T MuTect2 -R hg19.gatk.fa -I:tumor tumor.bam -I:normal normal.bam --intervals chr7 -o NormalTumorPair_chr7.muTect.vcf
INFO 20:20:08,361 HelpFormatter - Executing as x on Linux 2.6.32-504.30.3.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.085-mockbuild201507151257-b00.
INFO 20:20:08,361 HelpFormatter - Date/Time: 2016/01/10 20:20:08
INFO 20:20:08,361 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:20:08,361 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:20:08,595 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:20:08,980 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 20:20:08,996 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 20:20:09,142 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.14
INFO 20:20:09,232 IntervalUtils - Processing 159138663 bp from intervals
INFO 20:20:10,879 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------

ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):

ERROR

ERROR This means that one or more arguments or inputs in your command are incorrect.

ERROR The error message below tells you what is the problem.

ERROR

ERROR If the problem is an invalid argument, please check the online documentation guide

ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.

ERROR

ERROR Visit our website and forum for extensive documentation and answers to

ERROR commonly asked questions http://www.broadinstitute.org/gatk

ERROR

ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.

ERROR

ERROR ------------------------------------------------------------------------------------------

From Geraldine_VdAuwera on 2016-01-11

Ah, what you’re encountering is not the “contigs mismatch” error, it’s a different error that complains specifically about the lexicographical order. We’ll add some text to disambiguate that in the doc.

This is something of a legacy requirement: GATK expects human contigs in karyotypic order and rejects pure alphanumeric (lexicographic) order. Once you’ve made sure the dictionaries match between your bams and reference, and any VCF resources you plan to use, you can override this requirement by setting `-U ALLOW_SEQ_DICT_INCOMPATIBILITY`, which will also allow lexicographical order.

Note that `-U` stands for `UNSAFE` and that skipping dictionary validation is not recommended. However in your case I think it may be the only way to work without reordering all your bams.

Be sure to test this on a full run of your pipeline on a subset of data to make sure it won’t cause any fatal problems downstream.
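One way to make sure the dictionaries match before skipping validation is to compare the contig lists directly and see whether they differ in content or only in order. The following is a small sketch, assuming each input's contigs are already available as (name, length) pairs; `compare_dictionaries` is a hypothetical helper and the labels it returns are arbitrary.

```python
def compare_dictionaries(a, b):
    """Classify how two contig lists [(name, length), ...] relate to each other."""
    if a == b:
        return "identical"
    if sorted(a) == sorted(b):
        # Same names and lengths, so only the ordering differs.
        return "same contigs, different order"
    return "different contigs"


reference = [("chr1", 249250621), ("chrM", 16571)]
reads = [("chrM", 16571), ("chr1", 249250621)]
other_build = [("chr1", 249250621), ("chrM", 16569)]

print(compare_dictionaries(reference, reads))
print(compare_dictionaries(reference, other_build))
```

Only the "same contigs, different order" case is the situation this article is about; "different contigs" points to a different reference.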

From breardon on 2016-01-22

I am currently trying to re-sort contigs from a VCF relative to a reference with the Picard tool SortVcf; however, I am running into the error quoted below. This may be because of an incompatible dictionary, but the error was repeated even after creating a new dict file by following the documentation here: http://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference.

Do you have any suggestions? I am using the reference genome as specified at the top of the vcf.

Thank you!

—

[Fri Jan 22 16:50:24 EST 2016] picard.vcf.SortVcf INPUT=[ALL.autosomes.phase3_shapeit2_mvncall_integrated_v4.20130502.sites.vcf] OUTPUT=sorted.vcf SEQUENCE_DICTIONARY=/xchip/cga_home/breardon/misc_projects/_travis_vcf/reference_hs37d5/hs37d5.dict VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=true CREATE_MD5_FILE=false

[Fri Jan 22 16:50:24 EST 2016] Executing as breardon@cga02 on Linux 2.6.32-573.7.1.el6.x86_64 amd64; Java HotSpot™ 64-Bit Server VM 1.7.0_71-b14; Picard version: 1.834(b2a94f76e786204c2ea48814b4a4a1018ff9b338_1422570082) JdkDeflater

[Fri Jan 22 16:50:25 EST 2016] picard.vcf.SortVcf done. Elapsed time: 0.03 minutes.

Runtime.totalMemory()=2058354688

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

Exception in thread “main” java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=GL000191.1,length=106433,dict_index=22,assembly=b37) was found when SAMSequenceRecord(name=X,length=155270560,dict_index=22,assembly=null) was expected.

at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:112)

at picard.vcf.SortVcf.doWork(SortVcf.java:81)

at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)

at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)

at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=GL000191.1,length=106433,dict_index=22,assembly=b37) was found when SAMSequenceRecord(name=X,length=155270560,dict_index=22,assembly=null) was expected.

at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:142)

at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:110)

… 4 more

From Sheila on 2016-01-26

@breardon

Hi,

I am going to have our Picard expert look at this. He will get back to you.

Sheila

From dekling on 2016-01-26

@breardon, did you run the UpdateVcfSequenceDictionary tool prior to running the SortVcf tool?
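The "SAM dictionaries are not the same" assertion reports the first position at which the VCF header's contig lines and the supplied dictionary disagree. A rough way to reproduce that comparison by hand is sketched below; it assumes simple `##contig=<ID=...,length=...>` header lines without quoted commas, and both function names are made up for the example.

```python
import re

CONTIG_LINE = re.compile(r"^##contig=<(.*)>$")


def vcf_header_contigs(header_text):
    """Return [(ID, length or None), ...] from ##contig lines, in header order."""
    contigs = []
    for line in header_text.splitlines():
        match = CONTIG_LINE.match(line.strip())
        if not match:
            continue
        fields = dict(kv.split("=", 1) for kv in match.group(1).split(",") if "=" in kv)
        length = int(fields["length"]) if "length" in fields else None
        contigs.append((fields["ID"], length))
    return contigs


def first_mismatch(vcf_contigs, dict_contigs):
    """Return (index, found, expected) for the first difference, or None."""
    for index in range(max(len(vcf_contigs), len(dict_contigs))):
        found = vcf_contigs[index] if index < len(vcf_contigs) else None
        expected = dict_contigs[index] if index < len(dict_contigs) else None
        if found != expected:
            return index, found, expected
    return None


header = (
    "##fileformat=VCFv4.1\n"
    "##contig=<ID=chrM,length=16571,assembly=hg19>\n"
    "##contig=<ID=chr1,length=249250621,assembly=hg19>\n"
)
dictionary = [("chr1", 249250621), ("chrM", 16571)]
print(first_mismatch(vcf_header_contigs(header), dictionary))
```

If a mismatch is reported, the header itself disagrees with the dictionary, which is the situation UpdateVcfSequenceDictionary is meant to address before sorting.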

From Magda on 2016-02-02

Hi, I'm starting to go crazy.

I ran HaplotypeCaller without a dbsnp.vcf file and it worked well:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I GATKBaseRecalibrator/Ind1Recal.bam -o GATKHaploCaller/Ind1rawg.vcf -L nexterarapidcaptureexometargetedregionsv1.2.bed -ERC GVCF

I wanted to include a dbsnp.vcf file and the problems started:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa --dbsnp dbsnp138.hg19.vcf -I GATKBaseRecalibrator/Ind1Recal.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregionsv1.2.bed -ERC GVCF

ERROR MESSAGE: Input files dbsnp and reference have incompatible contigs: The contig order in dbsnp and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

OK, so I sorted it:

java -jar picard.jar SortVcf I=.dbsnp138.hg19.vcf O=dbsnp138.hg19sorted.vcf SEQUENCE_DICTIONARY=.ucsc.hg19.dict.gz

Then I tried again with the sorted VCF file:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa --dbsnp dbsnp138.hg19sorted.vcf -I GATKBaseRecalibrator/Ind1Recal.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregions_v1.2.bed -ERC GVCF

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in dbsnp.

ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.

ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.

ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam

Since it proposes to reorder the BAM file, I did that:

java -Xmx6g -jar picard.jar ReorderSam I= GATKBaseRecalibrator/Ind1Recal.bam O= GATKBaseRecalibrator/Ind1Recalreordered.bam R= ucsc.hg19.fasta CREATE_INDEX=TRUE

So now, using this reordered BAM file, I tried again:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp138.hg19sorted.vcf -I GATKBaseRecalibrator/Ind1Recalreordered.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregionsv1.2.bed -ERC GVCF

It doesn't work. I'll try to use the original dbsnp.vcf:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp138.hg19.vcf -I GATKBaseRecalibrator/Ind1Recalreordered.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregions_v1.2.bed -ERC GVCF

YES!!!!!!! YES!!!!!!! YES!!!!!!! YES!!!!!!! YES!!!!!!! it seems to work.....

But after 2h just before finishing it appears this error:

ERROR MESSAGE: File /Users/...GATKBaseRecalibrator/Ind1Recal_reordered.bai is malformed: Premature end-of-file while reading BAM index file

GATKBaseRecalibrator/Ind1Recal_reordered.bai. It's likely that this file is truncated or corrupt -- Please try re-indexing the corresponding BAM file.

..... Well, I reindexed, re-ran, tried this and that, and have spent the last 2 days trying all combinations of VCF, reference, and BAM files, original and reordered...

I do not know what I can do. Magda

From Sheila on 2016-02-02

@Magda

Hi Magda,

Where did you get the reference and dbsnp file you are using? We have a [bundle](https://www.broadinstitute.org/gatk/guide/article?id=1213) with files you should be able to use with no problem. You can also try adding `-U ALLOW_SEQ_DICT_INCOMPATIBILITY` as Geraldine suggested above.

-Sheila

From Geraldine_VdAuwera on 2016-02-02

Premature end of file means the file is truncated and will not be usable. Whatever process was used to generate it must be repeated.

From Geraldine_VdAuwera on 2016-02-02

Just realized my comment wasn’t helpful — but what happened after you re-indexed the file? You don’t say if there is still an error or what it is.
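When a BAM index is suspected of being stale or damaged, a couple of cheap checks can be run before re-indexing. This is a sketch only: it assumes a standard .bai file (which begins with the magic bytes `BAI\1`) and it cannot detect every kind of truncation; `index_problems` is a hypothetical helper, not a Picard or GATK function.

```python
import os

BAI_MAGIC = b"BAI\x01"  # first four bytes of a BAM index file


def index_problems(bam_path, bai_path):
    """Return a list of human-readable problems found with a BAM index."""
    if not os.path.exists(bai_path):
        return ["index file is missing"]
    problems = []
    with open(bai_path, "rb") as handle:
        if handle.read(4) != BAI_MAGIC:
            problems.append("index does not start with the BAI magic bytes")
    # An index written before the BAM was last modified is likely stale.
    if os.path.getmtime(bai_path) < os.path.getmtime(bam_path):
        problems.append("index is older than the BAM")
    return problems
```

An empty list does not prove the index is intact; it only rules out the two problems tested here. If in doubt, delete the index and rebuild it.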

From Magda on 2016-02-03

Hi, Thanks for answering so quick.

1) To Sheila: I tried to use -U ALLOW_SEQ_DICT_INCOMPATIBILITY and, although it runs, it gives a warning:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp138.hg19sorted.vcf -I GATKBaseRecalibrator/Ind1Recalreordered.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregionsv1.2.bed -ERC GVCF

WARN 11:29:52,095 SequenceDictionaryUtils - Input files /dbsnp138.hg19sorted.vcf and reference have incompatible contigs: The contig order in /dbsnp138.hg19sorted.vcf and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files.. dbsnp138.hg19sorted.vcf contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY] reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]

It's weird, because precisely in this case I downloaded the reference (ucsc.hg19.fasta), the dbsnp (dbsnp138.hg19sorted.vcf ) from the bundle and the bam files were reordered using this reference from the bundle too (R= ../../ucsc.hg19.fasta).

2) To Geraldine: when I reindexed:

java -jar -Xmx4g picard.jar BuildBamIndex I= GATKBaseRecalibrator/Ind1Recalreordered.bam O= GATKBaseRecalibrator/Ind1Recalreorderedreindex.bai VALIDATION_STRINGENCY=LENIENT

I tried to rerun HaplotypeCaller:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp138.hg19.vcf -I GATKBaseRecalibrator/Ind1Recalreordered.bam -o GATKHaploCaller/Ind1raw.g2.vcf -L nexterarapidcaptureexometargetedregions_v1.2.bed -ERC GVCF

ERROR MESSAGE: GVCF output requires a specific indexing strategy. Please re-run including the arguments -variant_index_type LINEAR -variant_index_parameter 128000.

I think I'm just going to give up on using a dbsnp file. Thank you

From Magda on 2016-02-03

Hi,

I tried one last thing and it worked well. The only difference is that now I ran it in a Test directory that contained only the original Ind1.bam file, Ind1_reordered.bam, and Ind1_reordered.bai (and not the Ind1.bai file). So I think that the index of the old BAM file, even though its name is different, interferes with the command.

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp_138.hg19.vcf -I GATK_BaseRecalibrator/Test/Ind1_Recal_reordered.bam -o GATK_BaseRecalibrator/Test/Ind1_raw.g.vcf -L nexterarapidcapture_exome_targetedregions_v1.2.bed -ERC GVCF

Thank you.

Magda

From Magda on 2016-02-04

Hi again. How is it possible that for the next step I have the same complaints?

Yesterday I ran HaplotypeCaller after reordering the BAM files using the reference from the bundle, then using the dbsnp from the bundle and the BAM files.

1. ReorderSam using Picard, in a separate directory without any previous .idx file in the directory (otherwise it doesn't work):

java -Xmx6g -jar picard.jar ReorderSam I= GATKBaseRecalibrator/ReorderedBam/Ind1Recal.bam O= GATKBaseRecalibrator/ReorderedBam/Ind1Recalreordered.bam R= ucsc.hg19.fasta CREATE_INDEX=TRUE

2. Run HaplotypeCaller using the reference and dbsnp from the bundle and the reordered BAM files:

java -Xmx6g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta --dbsnp dbsnp138.hg19.vcf -I GATKBaseRecalibrator/ReorderedBam/Ind1Recalreordered.bam -o GATKHaploCaller/Ind1raw.g.vcf -L nexterarapidcaptureexometargetedregionsv1.2.bed -ERC GVCF

Today I want to use GenotypeGVCFs, so:

java -Xmx6g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta -D 1000Gphase1.snps.highconfidence.hg19.sites.vcf -V Ind1raw.g.vcf -V Ind24raw.g.vcf -o Output_Joint.vcf

ERROR MESSAGE: Input files dbsnp and reference have incompatible contigs: The contig order in dbsnp and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ERROR dbsnp contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16,

ERROR reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15,

Ok, I think of sorting the snps.vcf file

java -jar picard.jar SortVcf I=1000Gphase1.snps.highconfidence.hg19.sites.vcf O=1000Gphase1.snps.highconfidence.hg19.sitessorted.vcf SEQUENCE_DICTIONARY=ucsc.hg19.dict

Try again:

java -Xmx6g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta -D 1000Gphase1.snps.highconfidence.hg19.sitessorted.vcf -V Ind1raw.g.vcf -V Ind24raw.g.vcf -o OutputJoint.vcf

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in dbsnp.

ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.

ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.

ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam

ERROR dbsnp contigs = [chr1, chr10, chr11, chr11gl000202random, chr12, chr13, chr14, chr15, chr16, chr17, chr17ctg5hap1, chr17gl000203random, chr17gl000204random, chr17gl000205random, chr17gl000206random, ch

OK; Should I also sort again the raw.g.vcf files? I do a test

java -jar picard.jar SortVcf I=ReorderedBam/Ind1raw.g.vcf O=ReorderedBam/Ind1rawsorted.g.vcf SEQUENCE_DICTIONARY=ucsc.hg19.dict
java -jar picard.jar SortVcf I=ReorderedBam/Ind24raw.g.vcf O=ReorderedBam/Ind24rawsorted.g.vcf SEQUENCE_DICTIONARY=ucsc.hg19.dict

Try again

java -Xmx6g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R ucsc.hg19.fasta -D 1000Gphase1.snps.highconfidence.hg19.sitessorted.vcf -V ReorderedBam/Ind1rawsorted.g.vcf -V ReorderedBam/Ind24rawsorted.g.vcf -o OutputJoint.vcf

ERROR MESSAGE: Lexicographically sorted human genome sequence detected in variant.

ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.

ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.

ERROR You can use the ReorderSam utility to fix this problem: http://gatkforums.broadinstitute.org/discussion/58/companion-utilities-reordersam

ERROR variant contigs = [chr1, chr10, chr11, chr11gl000202random, chr12, chr13, chr14, chr15, chr16, chr17,

I just don't understand why, if I'm using the ucsc.hg19.fasta file from the bundle and the SNPs VCF from the bundle, and I reordered my BAM files using that same reference file, it still complains in the next step.

I am sorry, but I just don't get it.

Magda

From Geraldine_VdAuwera on 2016-02-04

Hi Magda,

The “Lexicographically sorted human genome sequence” error is our fault — we added a check that is not necessary, by mistake. We are going to fix this in the next version. For now, if you rerun your last command with `-U ALLOW_SEQ_DICT_INCOMPATIBILITY` it should work.

From Magda on 2016-02-04

Ok, it’s working. Thank you very much

Magda

From artitandon on 2016-04-11

I downloaded the file dbsnp_138.hg19.vcf from the resource bundle, and I got an error when running the BQSR step using this file, as follows:

ERROR MESSAGE: Input files /home/p2010-217-gpfs/Arti/RNASeqFFPE/Software/dbsnp138.hg19.vcf and reference have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigsfor more information. Error details: The contig order in /home/dbsnp138.hg19.vcf and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ERROR /home/dbsnp_138.hg19.vcf contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]

ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]

ERROR ------------------------------------------------------------------------------------------

To fix this I ran SortVcf, and now get the following error, which makes no sense, since isn't it supposed to reorder it?

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrM,length=16571,dict_index=0,assembly=hg19) was found when SAMSequenceRecord(name=chr1,length=249250621,dict_index=0,assembly=null) was expected.
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:112)
at picard.vcf.SortVcf.doWork(SortVcf.java:81)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=chrM,length=16571,dict_index=0,assembly=hg19) was found when SAMSequenceRecord(name=chr1,length=249250621,dict_index=0,assembly=null) was expected.
at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:165)
at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:110)
... 4 more

Any help with this will be appreciated, thanks!

From Sheila on 2016-04-12

@artitandon

Hi,

Are you using the latest version of Picard? Can you tell us the exact command you ran?

Thanks,

Sheila

From Sheila on 2016-04-13

@artitandon

Hi again,

You may also try running Picard’s [UpdateVcfSequenceDictionary](http://broadinstitute.github.io/picard/command-line-overview.html#UpdateVcfSequenceDictionary).

-Sheila

From artitandon on 2016-04-13

Hi Sheila, I am using picard-tools-1.140 version, and the exact command is : java -jar picard-tools-1.140/picard.jar SortVcf I=dbsnp_138.hg19.vcf O=dbsnp_138.hg19.sort.vcf SEQUENCE_DICTIONARY=hg19.dict

Thanks,

Arti

From Sheila on 2016-04-14

@artitandon

Hi Arti,

Okay. Can you confirm you get the same error with the latest version of Picard? Also, please try running Picard’s UpdateVcfSequenceDictionary as I pointed you to above.

Thanks,

Sheila

From sumedhagarg on 2016-07-05

Hello I finally managed to have my reference files in place and tried to run GATK tools, but got the error about contigs and reads not being in same order. I can see my sample file has much simpler and more ordered contigs. Can you please advise how can I extract selected contigs from the reference hg19 fasta file and use it as reference for many more sample files rather than trying to reorder bam files for them all?

C:\Users\sg587\BartsData>java -jar GATK.jar -T HaplotypeCaller -R ref/hg19.fa -I 115N/115N-30129117/SG-115NS1.bam -o results/SG-115NS1HCcalls.vcf
INFO 20:57:55,791 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:57:55,798 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 20:57:55,799 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 20:57:55,800 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 20:57:55,800 HelpFormatter - [Tue Jul 05 20:57:55 BST 2016] Executing on Windows 7 6.1 x86
INFO 20:57:55,801 HelpFormatter - Java HotSpot(TM) Client VM 1.8.091-b14 JdkDeflater
INFO 20:57:55,812 HelpFormatter - Program Args: -T HaplotypeCaller -R ref/hg19.fa -I 115N/115N-30129117/SG-115NS1.bam -o results/SG-115NS1HCcalls.vcf
INFO 20:58:01,530 HelpFormatter - Executing as sg587@cppc118 on Windows 7 6.1 x86; Java HotSpot(TM) Client VM 1.8.091-b14.
INFO 20:58:01,531 HelpFormatter - Date/Time: 2016/07/05 20:57:55
INFO 20:58:01,532 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:58:01,533 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:58:01,573 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:58:02,172 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 20:58:02,194 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
WARNING: BAM index file C:\Users\sg587\BartsData\115N\115N-30129117\SG-115NS1.bam.bai is older than BAM C:\Users\sg587\BartsData\115N\115N-30129117\SG-115N_S1.bam
INFO 20:58:02,269 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07
INFO 20:58:02,346 HCMappingQualityFilter - Filtering out reads with MAPQ < 20

ERROR ------------------------------------------------------------------------------------------

ERROR A USER ERROR has occurred (version 3.6-0-g89b7209):

ERROR

ERROR This means that one or more arguments or inputs in your command are incorrect.

ERROR The error message below tells you what is the problem.

ERROR

ERROR If the problem is an invalid argument, please check the online documentation guide

ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.

ERROR

ERROR Visit our website and forum for extensive documentation and answers to

ERROR commonly asked questions https://www.broadinstitute.org/gatk

ERROR

ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.

ERROR

ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://www.broadinstitute.org/gatk/guide/article?id=63 formore information. Error details: The contig order in reads and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ERROR reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]

ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6sstohap7, chr6mcfhap5, chr6coxhap2, chr6mannhap4, chr6apdhap1, chr6qblhap6, chr6dbbhap3, chr17ctg5hap1, chr4ctg9hap1, chr1gl000192random, chrUngl000225, chr4gl000194random, chr4gl000193random, chr9gl000200random, chrUngl000222, chrUngl000212, chr7gl000195random, chrUngl000223, chrUngl000224, chrUngl000219, chr17gl000205random, chrUngl000215, chrUngl000216, chrUngl000217, chr9gl000199random, chrUngl000211, chrUngl000213, chrUngl000220, chrUngl000218, chr19gl000209random, chrUngl000221, chrUngl000214, chrUngl000228, chrUngl000227, chr1gl000191random, chr19gl000208random, chr9gl000198random, chr17gl000204random, chrUngl000233, chrUngl000237, chrUngl000230, chrUngl000242, chrUngl000243, chrUngl000241, chrUngl000236, chrUngl000240, chr17gl000206random, chrUngl000232, chrUn_gl000234, chr11gl000202random, chrUngl000238, chrUngl000244, chrUngl000248, chr8gl000196random, chrUngl000249, chrUngl000246, chr17gl000203random, chr8gl000197random, chrUngl000245, chrUngl000247, chr9gl000201random, chrUngl000235, chrUngl000239, chr21gl000210random, chrUngl000231, chrUngl000229, chrM, chrUngl000226, chr18gl000207random]

ERROR ------------------------------------------------------------------------------------------

Many thanks Sumedha

From Will_Gilks on 2016-07-06

You should use the same reference genome for both read-mapping and SNP-calling.

Alternatively you can remove unmapped scaffolds from the current SNP-calling reference genome but I don’t recommend it.

From sumedhagarg on 2016-07-06

@Will_Gilks

Thanks! It is mapped onto hg19 and I am using the same for SNP calling too, but could it be that, since my data is exome sequencing from the Illumina NexteraExome assay, the analysis doesn't really need all these random contigs? I do think the issue is that the order of these contigs in the reference file doesn't match my sample files. Is there a way to reorder contigs in the reference FASTA file?

From Geraldine_VdAuwera on 2016-07-06

@sumedhagarg If the reference file does not contain the same contigs, it’s not the same reference, even if it has the same name. The extra contigs are included to improve the quality of the read mapping. It’s better to use a reference that includes them even if you’re only interested in exome regions. And if it’s not possible for you to remap your samples, then you should at least use the reference file that you used for mapping in your analysis, rather than mess around with the content of files. That is the safest way to proceed. If you don’t want to do it that way, we can’t help you if something goes wrong.
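To tell a genuinely different reference (contigs missing on one side) apart from a pure ordering problem, it can help to compare the two contig lists directly. The following is a minimal Python sketch and not part of GATK; the function name `compare_contigs` and the shortened example lists are hypothetical and only illustrate the idea.

```python
def compare_contigs(reads, reference):
    """Compare two contig name lists.

    Returns a tuple (missing, same_order):
      missing    - contigs present in `reads` but absent from `reference`
      same_order - True if the contigs shared by both lists appear in the
                   same relative order in each list
    """
    ref_set = set(reference)
    reads_set = set(reads)
    missing = [c for c in reads if c not in ref_set]
    # Restrict each list to the shared contigs, preserving its own order.
    shared_in_reads_order = [c for c in reads if c in ref_set]
    shared_in_ref_order = [c for c in reference if c in reads_set]
    return missing, shared_in_reads_order == shared_in_ref_order


# Shortened, illustrative lists modelled on the error message above.
reads = ["chrM", "chr1", "chr2", "chr3"]
reference = ["chr1", "chr2", "chr3", "chrM", "chrUn_gl000226"]

missing, same_order = compare_contigs(reads, reference)
print(missing, same_order)  # [] False
```

An empty `missing` list combined with `same_order == False` points to an ordering problem; a non-empty `missing` list suggests the files were not built against the same reference.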

From daverdin on 2016-07-15

Hi everyone, I am having difficulty getting a VCF from HaplotypeCaller (it would take weeks to complete), so I was wondering whether it is possible to apply the hard filters to a VCF obtained through Platypus or other software (the .bam files used to produce the VCF come from the GATK steps before HaplotypeCaller). When I try to use "SelectVariants" to apply a hard filter to that VCF, I get an error message (at the end of this post). I applied several tips and potential corrections that I found in different GATK discussions, such as ordering my VCF with SortVcf and ordering the BAM file with ReorderSam and SortSam (coordinate), but none were conclusive; the names of my contigs in the VCF and in the reference apparently still do not match. I'm confused because when I look at my reference and my VCF, the contigs seem to be in the same order (but the error message shows a different order for the sequence contigs). The error message says that the "variant contigs" and the "sequence contigs" do not match, not the "variant contigs" and the "reference contigs". Could the problem come from the .bam file order? Or is it just that a VCF from Platypus can't be used with GATK?

I have tried several approaches to solve this problem but I'm stuck so I would greatly appreciate any help! Thanks Guilaume

ERROR MESSAGE: Input files variant and sequence have incompatible contigs. Please see http://gatkforums.broadinstitute.org/discussion/63/input-files-have-incompatible-contigs for more information. Error details: The contig order in variant and sequence is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ERROR variant contigs = [scaffold1size161211, scaffold2size67455, scaffold3size80531, scaffold4size38253, scaffold5size36003, ==> all my contigs

ERROR sequence contigs = [scaffold100000size1030, scaffold100001size1030, scaffold100002size1030, scaffold100003size1030, ==> all my contigs

(The genome I'm working with has around 200,000 contigs; I can't attach any files because they are too big.)

From Sheila on 2016-07-15

@daverdin

Hi Guilaume,

I am a little confused. You were able to run the GATK pre-processing tools on your original BAM file, but the VCF from other tools is not compatible with GATK? Did you use the same reference you used in pre-processing when running the other tools?

-Sheila

From daverdin on 2016-07-15

Hi @Sheila,

Thank you for your response and sorry if this was confusing. I followed the "Best Practices" workflow starting from paired-end sequencing data (fastq) up to the MergeBamAlignment step (I successfully obtained a .bam file). I then tried to use HaplotypeCaller on this bam file but wasn't successful (I think the very high number of contigs in the genome I'm working with is the culprit here). So I decided to go back to the same bam file but this time use Platypus to get my VCF file. That worked well, and I wanted to see if I could continue with the "Best Practices" using this VCF by doing the hard filtering (SelectVariants). That's where things go wrong, with my contig names not matching.

I have used the same reference file and dictionary to obtain the bam file and the Platypus VCF file.

I have tried different things like SortSam (coordinate) and SortVcf (with the same reference dictionary), and also ReorderSam, but always got the same error message. If I don't run SortVcf, the error is slightly different:

ERROR /home/jimp/Downloads/GATKexamples/selfing/CNJ1466/batch9orderedtasselcorrected.vcf contigs = [scaffold100042size1029, scaffold10007size7458, scaffold100165size1026, etc.....]

ERROR reference contigs = [scaffold1size161211, scaffold2size67455, scaffold3size80531, scaffold4size38253, etc.....]

The command line I used: java -Xmx7G -jar /home/jimp/Desktop/GATK/GenomeAnalysisTK.jar -T SelectVariants -R /home/jimp/Downloads/GATKexamples/selfing/new/crGenomescaffolds.fasta -V VariantCallssamsortedvcfsorted.vcf -selectType SNP -o rawsnps.vcf &> error

Any idea what is going on here? Thanks a lot!

From Geraldine_VdAuwera on 2016-07-15

@daverdin No, if you use a completely different caller, you can’t get back to best practices. The Best Practices are formulated based on how tools work very specifically, and each tool in the toolchain has specific expectations about upstream processes.

From vectorio on 2017-01-03

Hello,

After sorting my dbSNP VCF file as you suggested, I am still having trouble with HaplotypeCaller, which throws this error message:

ERROR --

ERROR stack trace

java.lang.IllegalStateException: Rod span chr1:1-249250621 isn't contained within the data shard chr1:1-249250621, meaning we wouldn't get all of the data we need
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions$ActiveRegionIterator.<init>(TraverseActiveRegions.java:307)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:271)
at org.broadinstitute.gatk.engine.traversals.TraverseActiveRegions.traverse(TraverseActiveRegions.java:78)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------

ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):

ERROR

ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.

ERROR If not, please post the error message, with stack trace, to the GATK forum.

ERROR Visit our website and forum for extensive documentation and answers to

ERROR commonly asked questions https://software.broadinstitute.org/gatk

ERROR

ERROR MESSAGE: Rod span chr1:1-249250621 isn't contained within the data shard chr1:1-249250621, meaning we wouldn't get all of the data we need

ERROR -----------------------------------------------------------------------------------------

Here is the command line I run:

java -jar GenomeAnalysisTK/GenomeAnalysisTK.jar -R Homosapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -T HaplotypeCaller -I drugtolerantsinglecell1MDAMBA231RNAseq.bam --dbsnp All20151104.new.vcf -stand_call_conf 15 -o output.raw.snps.indels.vcf -U ALLOW_SEQ_DICT_INCOMPATIBILITY

Any help would be greatly appreciated.

David S.

From Sheila on 2017-01-03

@vectorio

Hi David,

What happens if you run without `-U ALLOW_SEQ_DICT_INCOMPATIBILITY`?

Thanks,

Sheila

From vectorio on 2017-01-03

Without -U ALLOW_SEQ_DICT_INCOMPATIBILITY I get:

`ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.`

I am running GATK 3.7-0-gcfedb67.

David S.
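The "Lexicographically sorted" message refers to contigs listed in alphabetical order (chr1, chr10, chr11, ..., chr2) rather than karyotypic order (chr1, chr2, ..., chr10). A rough way to spot this in a contig list is sketched below in Python. This is a simple heuristic written for illustration, not GATK's actual check, and the function name `looks_lexicographic` is hypothetical.

```python
def looks_lexicographic(contigs):
    """Return True if chromosome 10 is listed before chromosome 2.

    Handles both UCSC-style names ("chr10", "chr2") and bare names
    ("10", "2"). Returns False when neither pair is present.
    """
    position = {name: i for i, name in enumerate(contigs)}
    for ten, two in (("chr10", "chr2"), ("10", "2")):
        if ten in position and two in position:
            return position[ten] < position[two]
    return False


alphabetical = ["chr1", "chr10", "chr11", "chr2", "chr3"]
karyotypic = ["chr1", "chr2", "chr3", "chr10", "chr11"]

print(looks_lexicographic(alphabetical))  # True
print(looks_lexicographic(karyotypic))    # False
```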

From vectorio on 2017-01-03

I finally found what was wrong: I simply had not run ReorderSam on my BAM file. After I did that, GATK ran like a charm!

Thanks for the reply.

David S.

From Sheila on 2017-01-04

@vectorio

Hi David,

Thanks for sharing your solution!

-Sheila

From mjtiv on 2017-01-18

I have a question about the -U ALLOW_SEQ_DICT_INCOMPATIBILITY argument. I saw that you mentioned this argument will fix chromosome order issues. I am running the RNA-Seq pipeline (https://software.broadinstitute.org/gatk/guide/article?id=3891) using the latest build of the chicken genome and a VCF file from Ensembl. When I got to step 5, Base Recalibration, I got the error message “The contig order in knownSites and reference is not the same”. Please note that I aligned my RNA-seq data using the same reference, sorted the VCF file, and indexed the genome build. So is it appropriate to use the above argument, given that all the data should be aligned using the same source (Ensembl's builds for the genome and VCF), or am I doing something horribly wrong when running GATK’s pipeline? It also confuses me that the BAM file isn’t sorting correctly either.

Any feedback on this issue would be greatly appreciated.

From Geraldine_VdAuwera on 2017-01-19

We only use that -U argument to test whether order is actually the problem; we really don’t recommend using it to force analysis when getting a sorting error.

The most common stumbling block is that when you have sorted the vcf using Picard SortVcf, you need to delete the vcf index and regenerate it. This is because although the tool generates a new index, that index is actually not updated with the new sort order. There’s a ticket to get this fixed in Picard but it’s been hard to motivate anyone to fix it, as it’s so easy to work around. But it does cause so much confusion, it is worth fixing.
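The workaround described above (delete the stale index so that a fresh one is generated on the next run) can be sketched in Python as follows. This is only an illustration under the assumption that the index sits next to the VCF with `.idx` appended to the file name; the helper name `remove_stale_vcf_index` and the file name `sorted.vcf` are hypothetical, and the demo runs in a temporary directory so it touches no real data.

```python
import tempfile
from pathlib import Path


def remove_stale_vcf_index(vcf_path):
    """Delete '<vcf_path>.idx' if it exists; return True if a file was removed."""
    index_path = Path(str(vcf_path) + ".idx")
    if index_path.exists():
        index_path.unlink()
        return True
    return False


# Demo in a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    vcf = Path(tmp) / "sorted.vcf"
    vcf.touch()
    Path(str(vcf) + ".idx").touch()

    first = remove_stale_vcf_index(vcf)   # index present, gets removed
    second = remove_stale_vcf_index(vcf)  # nothing left to remove
    vcf_still_there = vcf.exists()

print(first, second, vcf_still_there)  # True False True
```

Only the index is removed; the sorted VCF itself is left untouched.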

From mjtiv on 2017-01-19

Thank you for that suggestion, solved the entire problem!

From escaon on 2017-01-23

Hi there,

I am encountering some problems with contigs not being sorted the same way between the reference and cosmic.vcf.

My reference = Homosapiensassembly38 (from GATK_bundle hg38). My COSMIC file = CosmicCodingMuts.vcf (built for GRCh38).

MuTect2 command: java -jar $GATK -T MuTect2 -R GATKbundleh38/Homosapiensassembly38.fasta -I:tumor BISCEm/VarCallingResults/R1411tcsreordered.4.bam --dbsnp GATKbundleh38/dbsnp146.hg38.vcf --cosmic Mutect2/CosmicCodingMutschrMsorted.vcf --artifact_detection_mode -o test.vcf

This yields the following error:

ERROR MESSAGE: Input files cosmic and reference have incompatible contigs. Please see https://software.broadinstitute.org/gatk/documentation/article?id=63 for more information. Error details: The contig order in cosmic and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..

ERROR cosmic contigs = [HLA-A*01:01:01:01, HLA-A*01:01:01:02N, ....

ERROR reference contigs = [chr1, chr2, ....

I checked the intersection between my reference and COSMIC contig names with a Python script. Both contain 3366 unique contig names, and both contain the same 3366 names. They are just not ordered the same way.
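The script itself was not posted. A minimal sketch of such a check is shown below, assuming the VCF header carries `##contig=<ID=...>` lines; the function name `vcf_contigs` and the two-contig example header are hypothetical and only stand in for the real files.

```python
import re

# Matches the contig ID in a VCF header line such as
# "##contig=<ID=chr1,length=248956422>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def vcf_contigs(header_lines):
    """Return contig IDs in the order they appear in the VCF header."""
    names = []
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Illustrative header and reference order (not the real COSMIC file).
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr2,length=242193529>",
    "##contig=<ID=chr1,length=248956422>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
reference = ["chr1", "chr2"]

names = vcf_contigs(header)
same_set = set(names) == set(reference)
same_order = names == reference
print(same_set, same_order)  # True False
```

`same_set` being true while `same_order` is false corresponds to the situation described here: identical contig names, different order.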

The thing is, I tried my best to avoid this: the input files in my MuTect2 command were both sorted against the same dictionary (Homosapiensassembly38.dict) with the following commands.

For the COSMIC.vcf: java -jar $PICARD SortVcf I=Mutect2/CosmicCodingMutschrM.vcf O=Mutect2/CosmicCodingMutschrMsorted.vcf SEQUENCE_DICTIONARY=GATKbundleh38/Homosapiensassembly38.dict

For the .bam: java -jar $PICARD ReorderSam I=BISCEm/VarCallingResults/R1411tcs.4.bam O=BISCEm/VarCallingResults/R1411tcsreordered.4.bam R=GATKbundleh38/Homosapiensassembly38.fasta CREATE_INDEX=TRUE

What's the next step to resolve this issue?

Best regards

From Geraldine_VdAuwera on 2017-01-24

@escaon Once you have sorted the files, be sure to regenerate all index files manually. There is a bug that causes the index to not be updated automatically.

From escaon on 2017-01-24

Right, removing the .idx associated with my sorted .vcf file did the trick.

(When I launched MuTect2 without any index for my COSMIC VCF file, one was generated, and it was a good one ;-))

From escaon on 2017-01-24

https://www.biostars.org/p/232994/#233135

From mcengeholm on 2017-04-19

Hi there,

I'm trying to follow your best practice guidelines for calling variants in RNAseq. At the "Split'N'Trim and reassign mapping qualities" step I get the following error message:

$ java -jar $GATK -T SplitNCigarReads -R $PTG/$GN -I test.dedupped.bam -o test.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
INFO 12:32:22,594 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:32:22,595 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 12:32:22,596 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:32:22,596 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 12:32:22,596 HelpFormatter - [Wed Apr 19 12:32:22 CEST 2017] Executing on Linux 3.10.0-327.el7.x86_64 amd64
INFO 12:32:22,596 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_121-b13
INFO 12:32:22,599 HelpFormatter - Program Args: -T SplitNCigarReads -R /home/mce/RME/resources/genomes/human/Homo_sapiens.GRCh37.dna.primary_assembly.ucsc_naming.fa -I test.dedupped.bam -o test.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
INFO 12:32:27,612 HelpFormatter - Executing as mce@dzne-go-cn03 on Linux 3.10.0-327.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b13.
INFO 12:32:27,612 HelpFormatter - Date/Time: 2017/04/19 12:32:22
INFO 12:32:27,612 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:32:27,613 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:32:27,680 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:32:27,859 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 12:32:27,865 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:32:27,889 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1328 for more information.
Error details: reads contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrMT, chrX, chrY, GL000192.1, GL000225.1, GL000194.1, GL000193.1, GL000200.1, GL000222.1, GL000212.1, GL000195.1, GL000223.1, GL000224.1, GL000219.1, GL000205.1, GL000215.1, GL000216.1, GL000217.1, GL000199.1, GL000211.1, GL000213.1, GL000220.1, GL000218.1, GL000209.1, GL000221.1, GL000214.1, GL000228.1, GL000227.1, GL000191.1, GL000208.1, GL000198.1, GL000204.1, GL000233.1, GL000237.1, GL000230.1, GL000242.1, GL000243.1, GL000241.1, GL000236.1, GL000240.1, GL000206.1, GL000232.1, GL000234.1, GL000202.1, GL000238.1, GL000244.1, GL000248.1, GL000196.1, GL000249.1, GL000246.1, GL000203.1, GL000197.1, GL000245.1, GL000247.1, GL000201.1, GL000235.1, GL000239.1, GL000210.1, GL000231.1, GL000229.1, GL000226.1, GL000207.1]
##### ERROR ------------------------------------------------------------------------------------------

I have reordered the bam as suggested in the post and also recreated the index "manually" but keep getting the same error:

$ java -jar $PIC ReorderSam I=test.dedupped.bam O=test.reordered.bam R=$PTG/$GN CREATE_INDEX=TRUE
$ samtools index test.reordered.bam
$ java -jar $GATK -T SplitNCigarReads -R $PTG/$GN -I test.reordered.bam -o test.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
INFO 12:43:17,850 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:43:17,853 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 12:43:17,853 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:43:17,853 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 12:43:17,853 HelpFormatter - [Wed Apr 19 12:43:17 CEST 2017] Executing on Linux 3.10.0-327.el7.x86_64 amd64
INFO 12:43:17,853 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_121-b13
INFO 12:43:17,856 HelpFormatter - Program Args: -T SplitNCigarReads -R /home/mce/RME/resources/genomes/human/Homo_sapiens.GRCh37.dna.primary_assembly.ucsc_naming.fa -I test.reordered.bam -o test.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
INFO 12:43:17,861 HelpFormatter - Executing as mce@dzne-go-cn03 on Linux 3.10.0-327.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b13.
INFO 12:43:17,862 HelpFormatter - Date/Time: 2017/04/19 12:43:17
INFO 12:43:17,862 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:43:17,862 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:43:17,925 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:43:18,085 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 12:43:18,091 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:43:18,116 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1328 for more information.
Error details: reads contigs = [chr1, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr2, chr20, chr21, chr22, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrMT, chrX, chrY, GL000192.1, GL000225.1, GL000194.1, GL000193.1, GL000200.1, GL000222.1, GL000212.1, GL000195.1, GL000223.1, GL000224.1, GL000219.1, GL000205.1, GL000215.1, GL000216.1, GL000217.1, GL000199.1, GL000211.1, GL000213.1, GL000220.1, GL000218.1, GL000209.1, GL000221.1, GL000214.1, GL000228.1, GL000227.1, GL000191.1, GL000208.1, GL000198.1, GL000204.1, GL000233.1, GL000237.1, GL000230.1, GL000242.1, GL000243.1, GL000241.1, GL000236.1, GL000240.1, GL000206.1, GL000232.1, GL000234.1, GL000202.1, GL000238.1, GL000244.1, GL000248.1, GL000196.1, GL000249.1, GL000246.1, GL000203.1, GL000197.1, GL000245.1, GL000247.1, GL000201.1, GL000235.1, GL000239.1, GL000210.1, GL000231.1, GL000229.1, GL000226.1, GL000207.1]
##### ERROR ------------------------------------------------------------------------------------------

I have also tried to add -U ALLOW_SEQ_DICT_INCOMPATIBILITY but then I get

$ java -jar $GATK -T SplitNCigarReads -R $PTG/$GN -I test.reordered.bam -o test.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS -U ALLOW_SEQ_DICT_INCOMPATIBILITY
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.7-0-gcfedb67):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Argument 'U' has too many values: [org.broadinstitute.gatk.utils.commandline.ArgumentMatchStringValue@4c79ca55, org.broadinstitute.gatk.utils.commandline.ArgumentMatchStringValue@3a63d248].
##### ERROR ------------------------------------------------------------------------------------------

Do you have any idea what the problem could be here and how to proceed?

Thanks a lot for your help in advance,

Maik

From Sheila on 2017-04-19

@mcengeholm

Hi Maik,

Can you please post the BAM header (specifically the @SQ lines) and FASTA .dict file here?

Thanks,

Sheila
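Comparing the `@SQ` lines of a BAM header with those of the reference `.dict` file can also be done programmatically. The Python sketch below assumes the header has already been exported to text (for example with `samtools view -H`) and that fields are tab-separated with the contig name in the `SN:` field, as in the SAM format. The function names and the three-contig example are hypothetical and serve only as an illustration.

```python
def sq_names(header_text):
    """Return contig names from the @SQ lines of a SAM header or .dict file."""
    names = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One list may simply be a prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))


# Illustrative snippets, not the attached files.
bam_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr10\tLN:135534747\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
)
dict_text = (
    "@HD\tVN:1.4\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr2\tLN:243199373\n"
    "@SQ\tSN:chr10\tLN:135534747\n"
)

bam_contigs = sq_names(bam_header)
dict_contigs = sq_names(dict_text)
print(first_difference(bam_contigs, dict_contigs))  # 1
```

A result of `None` means the two lists are identical in content and order; any integer is the position of the first contig that does not line up.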

From mcengeholm on 2017-04-20

@Sheila

Hi Sheila,

Thanks a lot for the quick reply. I have attached the header of the original BAM, the header of the reordered BAM, and the reference dict file below.

Thanks a lot for your help in advance,

Maik

test.dedupped.header.txt

test.reordered.header.txt

reference.dict.txt

From Sheila on 2017-05-07

@mcengeholm

Hi Maik,

Sorry for the delay. When you wrote that you reordered the BAM as suggested in the post and also recreated the index “manually” but kept getting the same error, what did you mean by “manually”?

-Sheila
