created by Geraldine_VdAuwera
on 2012-08-09
Objective
Run a basic analysis command on example data, parallelized with Queue.
Prerequisites
Steps
One very cool feature of Queue is that you can test your script by doing a "dry run". That means Queue will prepare the analysis and build the scatter commands, but not actually run them. This makes it easier to check the sanity of your script and command.
Here we're going to set up a dry run of a CountReads analysis. You should be familiar with the CountReads walker and the example files from the bundles, as used in the basic "GATK for the first time" tutorial. In addition, we're going to use the example QScript called ExampleCountReads.scala, provided in the Queue package download.
Action
Type the following command:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
where -S ExampleCountReads.scala specifies which QScript we want to run, -R exampleFASTA.fasta specifies the reference sequence, and -I exampleBAM.bam specifies the file of aligned reads we want to analyze.
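Before launching Queue, it can help to confirm that every file named on the command line is actually present, since that is one of the first things to check if the dry run fails. The sketch below is one possible way to do that in a POSIX shell. check_inputs is a hypothetical helper name introduced here, not part of Queue or the GATK.

```shell
#!/bin/sh
# check_inputs (hypothetical helper): report every argument that is not an
# existing regular file. Returns 0 if all files exist, 1 otherwise.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_inputs Queue.jar ExampleCountReads.scala exampleFASTA.fasta exampleBAM.bam prints nothing and returns 0 when all four files are in the working directory.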
Expected Result
After a few seconds you should see output that looks nearly identical to this:
INFO 00:30:45,527 QScriptManager - Compiling 1 QScript
INFO 00:30:52,869 QScriptManager - Compilation complete
INFO 00:30:53,284 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,284 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21
INFO 00:30:53,284 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 00:30:53,284 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk
INFO 00:30:53,285 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
INFO 00:30:53,285 HelpFormatter - Date/Time: 2012/08/09 00:30:53
INFO 00:30:53,285 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,285 HelpFormatter - ----------------------------------------------------------------------
INFO 00:30:53,290 QCommandLine - Scripting ExampleCountReads
INFO 00:30:53,364 QCommandLine - Added 1 functions
INFO 00:30:53,364 QGraph - Generating graph.
INFO 00:30:53,388 QGraph - -------
INFO 00:30:53,402 QGraph - Pending: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:30:53,403 QGraph - Log: /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads-1.out
INFO 00:30:53,403 QGraph - Dry run completed successfully!
INFO 00:30:53,404 QGraph - Re-run with "-run" to execute the functions.
INFO 00:30:53,409 QCommandLine - Script completed successfully with 1 total jobs
INFO 00:30:53,410 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt
If you don't see this, check your spelling (GATK commands are case-sensitive), check that the files are in your working directory, and if necessary, re-check that the GATK and Queue are properly installed.
If you do see this output, congratulations! You just successfully ran your first Queue dry run!
Once you have verified that the Queue functions have been generated successfully, you can execute the pipeline by appending -run to the command line.
Action
Instead of this command, which we used earlier:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
this time you type this:
java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run
See the difference?
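The only difference between the two commands is the trailing -run flag. As a small illustration, the following shell sketch prints one command or the other depending on its argument. queue_cmd is a hypothetical helper introduced here, and the file names are simply the ones used in this tutorial.

```shell
#!/bin/sh
# queue_cmd (hypothetical helper): print the Queue command line from this
# tutorial, appending -run only when the first argument is "run".
queue_cmd() {
    cmd="java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam"
    if [ "${1:-}" = "run" ]; then
        cmd="$cmd -run"
    fi
    printf '%s\n' "$cmd"
}

queue_cmd        # prints the dry-run command
queue_cmd run    # prints the same command with -run appended
```

Printing the command rather than executing it keeps the sketch safe to try; to actually launch Queue you would paste the printed line into your shell.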
Result
You should see output that looks nearly identical to this:
INFO 00:56:33,688 QScriptManager - Compiling 1 QScript
INFO 00:56:39,327 QScriptManager - Compilation complete
INFO 00:56:39,487 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,487 HelpFormatter - Queue v2.0-36-gf5c1c1a, Compiled 2012/08/08 20:18:21
INFO 00:56:39,488 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 00:56:39,488 HelpFormatter - Fro support and documentation go to http://www.broadinstitute.org/gatk
INFO 00:56:39,489 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run
INFO 00:56:39,490 HelpFormatter - Date/Time: 2012/08/09 00:56:39
INFO 00:56:39,490 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,491 HelpFormatter - ----------------------------------------------------------------------
INFO 00:56:39,498 QCommandLine - Scripting ExampleCountReads
INFO 00:56:39,569 QCommandLine - Added 1 functions
INFO 00:56:39,569 QGraph - Generating graph.
INFO 00:56:39,589 QGraph - Running jobs.
INFO 00:56:39,623 FunctionEdge - Starting: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:56:39,623 FunctionEdge - Output written to /Users/GG/codespace/GATK/Q2/resources/ExampleCountReads-1.out
INFO 00:56:50,301 QGraph - 0 Pend, 1 Run, 0 Fail, 0 Done
INFO 00:57:09,827 FunctionEdge - Done: 'java' '-Xmx1024m' '-Djava.io.tmpdir=/Users/vdauwera/sandbox/Q2/resources/tmp' '-cp' '/Users/vdauwera/sandbox/Q2/resources/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/vdauwera/sandbox/Q2/resources/exampleBAM.bam' '-R' '/Users/vdauwera/sandbox/Q2/resources/exampleFASTA.fasta'
INFO 00:57:09,828 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
INFO 00:57:09,835 QCommandLine - Script completed successfully with 1 total jobs
INFO 00:57:09,835 QCommandLine - Writing JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.txt
INFO 00:57:10,107 QCommandLine - Plotting JobLogging GATKReport to file /Users/vdauwera/sandbox/Q2/resources/ExampleCountReads.jobreport.pdf
WARN 00:57:18,597 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
Great! It works!
The results of the traversal are written to a file in the current directory. The name of that file is printed in the Queue output; in this example it is ExampleCountReads-1.out.
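The count itself appears in that output file on a line containing "Traversal result is:", as in the ExampleCountReads-1.out listing posted in the comments below. Here is a minimal shell sketch for pulling the number out, assuming that line format. traversal_result is a hypothetical helper, not a GATK tool.

```shell
#!/bin/sh
# traversal_result (hypothetical helper): print the number that follows
# "Traversal result is:" in the log file given as the first argument.
# Prints nothing if no such line is found.
traversal_result() {
    sed -n 's/.*Traversal result is: *\([0-9][0-9]*\).*/\1/p' "$1"
}
```

After a completed run, traversal_result ExampleCountReads-1.out would print the read count reported by CountReads.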
If for some reason the run was interrupted, in most cases you can resume by simply relaunching the same command. Queue will pick up where it left off without redoing the parts that ran successfully.
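The DEBUG log posted in the comments below lists a hidden marker file, .ExampleCountReads-1.out.done, alongside the job output. Assuming Queue uses such markers to recognize finished jobs when a run is resumed (an inference from that log, not something this tutorial states), a shell sketch like the following reports whether a given output has its marker. job_is_done is a hypothetical helper.

```shell
#!/bin/sh
# job_is_done (hypothetical helper): succeed if a ".<name>.done" marker file
# sits next to the output file given as the first argument.
# The marker naming is assumed from the DEBUG log in the comments.
job_is_done() {
    out=$1
    dir=$(dirname "$out")
    base=$(basename "$out")
    [ -f "$dir/.$base.done" ]
}
```

For instance, job_is_done ExampleCountReads-1.out && echo "already done" would print the message only if the marker exists.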
Run with -bsub to run on LSF, or for early Grid Engine support see Queue with Grid Engine.
See also QFunction and Command Line Options for more info on Queue options.
Updated on 2013-07-08
From oriol_senan on 2012-10-09
The link “how to use GATK for the first time” is not working
From Geraldine_VdAuwera on 2012-10-10
Prerequisites links are fixed, thanks for reporting this.
From lucdh on 2012-10-12
It looks like some more links need fixing:
- on this page the 3 links under paragraph 3. Running on a computing farm
- on http://gatkforums.broadinstitute.org/discussion/1285/parallelism-with-the-gatk: the GATK-Queue link
From Geraldine_VdAuwera on 2012-10-12
Thanks for reporting, we’ll fix these asap.
From omedvedeva on 2012-11-13
I can't perform a first dry run on Windows 7 with Queue 2.2.5. The installation seems to be correct since --help option works. It looks like it can't find the tmp directory that it creates at the correct location. The same problem occurs with QueueLite too. What am I missing? In the stack trace below fasta, bam and scala files were in the working directory:
C:\GATK\Queue-2.2-5-g3bf5e3f>java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
ERROR 10:17:34,493 QScriptManager - \GATK\Queue-2.2-5-g3bf5e3f\tmp\Q-Classes-8075780960630530304 does not exist or is not a directory
INFO 10:17:35,965 QScriptManager - Compiling 1 QScript
INFO 10:17:40,538 QScriptManager - Compilation complete
ERROR stack trace
org.broadinstitute.sting.commandline.InvalidArgumentException: Argument with name 'R' isn't defined.
    at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:303)
    at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:276)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:204)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
ERROR A GATK RUNTIME ERROR has occurred (version 2.2-5-g3bf5e3f):
...
ERROR MESSAGE: Argument with name 'R' isn't defined.
ERROR --------------------------------------------------------------------
Thank you, Olga.
From Geraldine_VdAuwera on 2012-11-13
I’m sorry Olga, we can’t provide support for running GATK or Queue on Windows. There are differences in I/O management that cause problems with filepaths, and we can’t shoulder the support burden of helping you figure that out. You should post this question in the Ask the Community section; perhaps others will be able to advise you on this point.
From elisa1507 on 2012-12-03
Hi,
So I’m trying to run this tutorial and the first script runs fine and looks exactly like step 1.
Once I put -run at the end of it, I’m getting an error that looks like this :
ERROR 15:07:45,844 FunctionEdge - Error: 'java' '-Xmx1024m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Users/jones/bin/Queue/tmp' '-cp' '/Users/jones/bin/Queue/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/jones/bin/Queue/resources/exampleBAM.bam' '-R' '/Users/jones/bin/Queue/resources/exampleFASTA.fasta'
ERROR 15:07:45,851 FunctionEdge - Contents of /Users/jones/bin/Queue/ExampleCountReads-1.out:
Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed
Could not create the Java virtual machine.
INFO 15:07:45,852 QGraph - Writing incremental jobs reports...
INFO 15:07:45,853 QJobsReporter - Writing JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.txt
INFO 15:07:45,884 QGraph - 0 Pend, 0 Run, 1 Fail, 0 Done
INFO 15:07:45,886 QCommandLine - Script failed with 1 total jobs
INFO 15:07:45,889 QCommandLine - Writing final jobs report...
INFO 15:07:45,889 QJobsReporter - Writing JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.txt
INFO 15:07:45,893 QJobsReporter - Plotting JobLogging GATKReport to file /Users/jones/bin/Queue/ExampleCountReads.jobreport.pdf
WARN 15:07:46,693 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info.
INFO 15:07:46,695 QCommandLine - Done with errors
INFO 15:07:46,697 QGraph - -------
INFO 15:07:46,699 QGraph - Failed: 'java' '-Xmx1024m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/Users/jones/bin/Queue/tmp' '-cp' '/Users/jones/bin/Queue/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/Users/jones/bin/Queue/resources/exampleBAM.bam' '-R' '/Users/jones/bin/Queue/resources/exampleFASTA.fasta'
INFO 15:07:46,700 QGraph - Log: /Users/jones/bin/Queue/ExampleCountReads-1.out
Do you know why this could be please? I’m new to this!
Thanks!
From grumblr on 2013-02-06
The QFunction and Command Line Options links point to this same page….
See also QFunction and Command Line Options for more info on Queue options.
From Geraldine_VdAuwera on 2013-02-06
Hi @grumblr, sorry about the dead links, I’ll fix them asap. The articles they refer to should be in the Developer Zone.
From Geraldine_VdAuwera on 2013-02-06
Hi @elisa1507, I just realized I never answered your question. Sorry about that, it must have slipped through my net. Did you find the solution to your problem or do you still need help with that?
From chukhman on 2013-02-06
Hi all, I ran the above tutorial and received the specified output but I’m not sure how to interpret it. The ExampleCountReads-1.out file seems error free but the ExampleCountReads.jobreport.txt file only contains the line “#:GATKReport.v1.1:0” and nothing else. Also, the ExampleCountReads.jobreport.pdf file is unreadable. The warning “RScriptExecutor – RScript exited with 1” bothers me and upon rerunning with -l DEBUG, it shows several issues with R packages having functions masked (not sure what that means) and the exit status 1 seems to be caused by some “argument 1 is not a vector”. Is this all the correct behavior or are these issues really problems that I need to worry about? Thanks for your help!
Morris Chukhman, MS
UIC Bioinformatics
From Geraldine_VdAuwera on 2013-02-08
Hi Morris,
It sounds like your analysis run went fine but it’s the peripheral reporting that screwed up. Can you post the contents of the ExampleCountReads-1.out file to be sure? Also, do you know if you have gsalib installed?
From chukhman on 2013-02-11
Thanks Geraldine for your reply!
Here are the contents of ExampleCountReads-1.out:
`
INFO 15:24:31,080 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:24:31,083 ReferenceDataSource - Dict file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.dict does not exist. Trying to create it now.
[Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta OUTPUT=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/dict3620772975149938405.tmp TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Feb 05 15:24:31 CST 2013] Executing as pkanabar@nike.structure.uic.edu on Linux 2.6.32-279.1.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_17-b04; Picard version: null
[Tue Feb 05 15:24:31 CST 2013] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=244187136
INFO 15:24:31,406 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 15:24:31,415 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 15:24:31,428 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
INFO 15:24:31,461 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 15:24:31,461 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 15:24:31,517 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 15:24:31,521 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 15:24:31,540 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 15:24:31,549 Walker - [REDUCE RESULT] Traversal result is: 33
INFO 15:24:31,551 ProgressMeter - done 3.30e+01 0.1 s 44.9 m 97.3% 0.1 s 0.0 s
INFO 15:24:31,552 ProgressMeter - Total runtime 0.09 secs, 0.00 min, 0.00 hours
INFO 15:24:31,669 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
INFO 15:24:32,547 GATKRunReport - Uploaded run statistics report to AWS S3
~
`
It seems to be working properly since that is exactly what the sample output in the GATK tutorial looks like.
Here is the output when I run the whole Queue.jar job and the command that I used:
`java -Djava.io.tmpdir=tmp -jar /data1/rhel60/gatk_git20130205/dist/Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG`
INFO 11:06:34,442 QScriptManager - Compiling 1 QScript
DEBUG 11:06:34,446 QScriptManager - Compilation directory: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Q-Classes-5894914949320226077
INFO 11:06:38,335 QScriptManager - Compilation complete
INFO 11:06:38,578 HelpFormatter - ----------------------------------------------------------------------
INFO 11:06:38,578 HelpFormatter - Queue vexported, Compiled 2013/02/06 15:30:41
INFO 11:06:38,578 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 11:06:38,578 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
DEBUG 11:06:38,578 HelpFormatter - Current directory: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk
INFO 11:06:38,579 HelpFormatter - Program Args: -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run -l DEBUG
INFO 11:06:38,579 HelpFormatter - Date/Time: 2013/02/11 11:06:38
INFO 11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
INFO 11:06:38,579 HelpFormatter - ----------------------------------------------------------------------
INFO 11:06:38,587 QCommandLine - Scripting ExampleCountReads
DEBUG 11:06:38,635 QGraph - adding QNode: 0
INFO 11:06:38,644 QCommandLine - Added 1 functions
INFO 11:06:38,645 QGraph - Generating graph.
INFO 11:06:38,659 QGraph - Running jobs.
INFO 11:06:38,663 QGraph - -------
INFO 11:06:38,676 QGraph - Done: 'java' '-Xmx1024m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp' '-cp' '/data1/rhel60/gatk_git20130205/dist/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam' '-R' '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta'
DEBUG 11:06:38,676 QGraph - Inputs: List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bai, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleBAM.bam.bai, /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/exampleFASTA.fasta)
DEBUG 11:06:38,676 QGraph - Outputs: List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads-1.out)
DEBUG 11:06:38,677 QGraph - Done+: List(/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/.ExampleCountReads-1.out.done)
DEBUG 11:06:38,677 QGraph - Done-: List()
DEBUG 11:06:38,677 QGraph - CmdDir: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk
DEBUG 11:06:38,677 QGraph - Temp?: false
DEBUG 11:06:38,678 QGraph - Prev: none (reset = false)
INFO 11:06:38,678 QGraph - Log: /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads-1.out
INFO 11:06:38,685 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
INFO 11:06:38,687 QCommandLine - Script failed with 1 total jobs
INFO 11:06:38,687 QCommandLine - Writing final jobs report...
INFO 11:06:38,687 QJobsReporter - Writing JobLogging GATKReport to file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt
INFO 11:06:38,698 QJobsReporter - Plotting JobLogging GATKReport to file /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.pdf
DEBUG 11:06:38,709 RScriptExecutor - Executing:
DEBUG 11:06:38,709 RScriptExecutor - Rscript
DEBUG 11:06:38,709 RScriptExecutor - -e
DEBUG 11:06:38,709 RScriptExecutor - tempLibDir = '/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Rlib.4673101731368374405';install.packages(pkgs=c('/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/gsalib.tar.6822779882938808174.gz'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/queueJobReport.6906968465526462577.R');
DEBUG 11:06:38,710 RScriptExecutor - /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt
DEBUG 11:06:38,710 RScriptExecutor - /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.pdf
* installing source package 'gsalib' ...
** Creating default NAMESPACE file
** R
** preparing package for lazy loading
** building package indices
** testing if installed package can be loaded
* DONE (gsalib)
Loading required package: methods
Loading required package: gtools
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
Attaching package: 'gdata'
The following object(s) are masked from 'package:stats': nobs
The following object(s) are masked from 'package:utils': object.size
Loading required package: caTools
Loading required package: grid
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Loading required package: MASS
Attaching package: 'gplots'
The following object(s) are masked from 'package:stats': lowess
Loading required package: plyr
Attaching package: 'reshape'
The following object(s) are masked from 'package:plyr': rename, round_any
[1] "Report"
[1] "Project : /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/ExampleCountReads.jobreport.txt"
Error in order(allJobs$analysisName, allJobs$startTime, decreasing = T) : argument 1 is not a vector
Calls: source ... withVisible -> eval -> eval -> plotJobsGantt -> order
Execution halted
DEBUG 11:06:42,668 RScriptExecutor - Result: 1
WARN 11:06:42,668 RScriptExecutor - RScript exited with 1
DEBUG 11:06:42,674 IOUtils - Deleted /mnt/pinal/pinal/sgreen/genotype_11_samples/dry_run_gatk/tmp/Q-Classes-5894914949320226077
It doesn’t seem to be complaining about ‘gsalib’ in particular, but the objects masked from the packages seem a bit odd. The failure seems to be in plotJobsGantt, but I’m not sure if it’s the app itself or something upstream that is causing the failure.
Thanks so much for helping us debug this!
Cheers!
Morris
From Geraldine_VdAuwera on 2013-02-11
Hi Morris,
OK, your analysis job definitely executed correctly. What is screwing up is just Queue’s reporting about the job(s) that it ran, which is annoying but not of real importance. I think the failure may be linked to a bug in the reporting system which we’ve fixed in our development version. You can safely ignore this error for now; if it persists in the next version (2.4, estimated for release next week) let us know in this thread.
From chukhman on 2013-02-11
The same error occurs both with the 2.3.9 tarball and with the version on github. Is the dev version different than the github version?
From Geraldine_VdAuwera on 2013-02-12
That’s correct, the dev version is different and is currently not available to the public. The github version is the last stable version we released, and is the same thing as the tarball. We’re in the process of changing our release workflow and may in the near future start providing nightly builds of the dev source; but right now that’s just not possible, sorry.
From chukhman on 2013-02-22
Has 2.4 been released yet? The downloads page still links to 2.3.9.
From Geraldine_VdAuwera on 2013-02-22
Not yet — we’re planning on releasing it Monday if all goes well.
From Alessandro on 2013-04-17
Hi Geraldine, I have a problem running the "dry run" pre-analysis that you suggest. I've read the comments above, but none seemed to help in my case, so I'm posting the command line that I used and the error. Thanks in advance!
java -Xmx10g -jar /path/directory/2.4-9/Queue.jar --tempdirectory /path/directory/tmpprocesses/ -S ExampleCountReads.scala -R /path/directory/referencesortednormalized.fasta -I input.bam
INFO 11:41:27,268 QScriptManager - Compiling 1 QScript
ERROR 11:41:27,274 QScriptManager - IO error while decoding ExampleCountReads.scala with UTF-8
Please try specifying another one using the -encoding option
ERROR 11:41:27,275 QScriptManager - one error found
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
org.broadinstitute.sting.queue.QException: Compile of ExampleCountReads.scala failed with 1 error
    at org.broadinstitute.sting.queue.QScriptManager.loadScripts(QScriptManager.scala:71)
    at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:95)
    at org.broadinstitute.sting.queue.QCommandLine.getArgumentSources(QCommandLine.scala:227)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:202)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.4-9-g532efad):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Compile of ExampleCountReads.scala failed with 1 error
ERROR ------------------------------------------------------------------------------------------
INFO 11:41:27,348 QCommandLine - Shutting down jobs. Please wait...
From Geraldine_VdAuwera on 2013-04-17
Hi Alessandro, this error actually just means that Queue didn’t find the Scala script you specified. Unlike regular GATK commands, where you just give the tool name, Queue needs the path to the script file, either absolute or relative to your working directory.
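A minimal sketch of a pre-flight check for this situation, assuming a POSIX shell. The function name `check_qscript` is hypothetical and not part of Queue; it only confirms that the path you intend to pass to `-S` points at a readable file before Queue is launched.

```shell
#!/bin/sh
# check_qscript is a hypothetical helper, not part of Queue.
# It returns 0 if the given path is a readable regular file, 1 otherwise.
check_qscript() {
    script_path="$1"
    if [ -f "$script_path" ] && [ -r "$script_path" ]; then
        return 0
    fi
    echo "QScript not found or not readable: $script_path" >&2
    return 1
}

# Example use (all paths are placeholders):
#   check_qscript /path/to/ExampleCountReads.scala && \
#     java -jar Queue.jar -S /path/to/ExampleCountReads.scala -R ref.fasta -I input.bam
```

If the check fails, fix the path given to `-S` before looking into the "IO error while decoding" message any further.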
From Alessandro on 2013-04-18
Fantastic, dry run successfully executed! Thank you so much!
From blueskypy on 2013-05-06
hi, I’m getting the following error. Could someone help me? Thanks a lot!
[usnee1-lph001-n066 42] ~ $ ls
R script seqs test
[14:04 0.04] [usnee1-lph001-n066 43] ~ $ java -Djava.io.tmpdir=tmp -jar $queue_jar -S ./seqs/softwares/Queue-2.5-2/resources/ExampleCountReads.scala -R /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta -I /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam -l DEBUG -run
INFO 14:05:29,297 QScriptManager - Compiling 1 QScript
DEBUG 14:05:29,298 QScriptManager - Compilation directory: /site/ne/home/cuiji01/tmp/Q-Classes-568453805836268123
INFO 14:05:32,089 QScriptManager - Compilation complete
INFO 14:05:32,193 HelpFormatter - ----------------------------------------------------------------------
INFO 14:05:32,193 HelpFormatter - Queue v2.5-2-gf57256b, Compiled 2013/05/01 09:29:04
INFO 14:05:32,193 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 14:05:32,193 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
DEBUG 14:05:32,194 HelpFormatter - Current directory: /site/ne/home/cuiji01
INFO 14:05:32,194 HelpFormatter - Program Args: -S ./seqs/softwares/Queue-2.5-2/resources/ExampleCountReads.scala -R /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta -I /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam -l DEBUG -run
INFO 14:05:32,194 HelpFormatter - Date/Time: 2013/05/06 14:05:32
INFO 14:05:32,194 HelpFormatter - ----------------------------------------------------------------------
INFO 14:05:32,194 HelpFormatter - ----------------------------------------------------------------------
INFO 14:05:32,202 QCommandLine - Scripting ExampleCountReads
DEBUG 14:05:32,259 QGraph - adding QNode: 0
INFO 14:05:32,268 QCommandLine - Added 1 functions
INFO 14:05:32,269 QGraph - Generating graph.
INFO 14:05:32,279 QGraph - Running jobs.
INFO 14:05:32,281 QGraph - -------
INFO 14:05:32,289 QGraph - Done: 'java' '-Xmx1024m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/site/ne/home/cuiji01/tmp' '-cp' '/site/ne/home/cuiji01/seqs/softwares/Queue-2.5-2/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'CountReads' '-I' '/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam' '-R' '/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta'
DEBUG 14:05:32,289 QGraph - Inputs: List(/site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bai, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleBAM.bam.bai, /site/ne/app/x86_64/gatk/v2.4.9/resources/exampleFASTA.fasta)
DEBUG 14:05:32,289 QGraph - Outputs: List(/site/ne/home/cuiji01/ExampleCountReads-1.out)
DEBUG 14:05:32,290 QGraph - Done+: List(/site/ne/home/cuiji01/.ExampleCountReads-1.out.done)
DEBUG 14:05:32,290 QGraph - Done-: List()
DEBUG 14:05:32,290 QGraph - CmdDir: /site/ne/home/cuiji01
DEBUG 14:05:32,290 QGraph - Temp?: false
DEBUG 14:05:32,290 QGraph - Prev: none (reset = false)
INFO 14:05:32,290 QGraph - Log: /site/ne/home/cuiji01/ExampleCountReads-1.out
INFO 14:05:32,295 QGraph - 0 Pend, 0 Run, 0 Fail, 1 Done
INFO 14:05:32,296 QCommandLine - Writing final jobs report...
INFO 14:05:32,296 QJobsReporter - Writing JobLogging GATKReport to file /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt
INFO 14:05:32,310 QJobsReporter - Plotting JobLogging GATKReport to file /site/ne/home/cuiji01/ExampleCountReads.jobreport.pdf
DEBUG 14:05:32,344 RScriptExecutor - Executing:
DEBUG 14:05:32,345 RScriptExecutor - Rscript
DEBUG 14:05:32,345 RScriptExecutor - -e
DEBUG 14:05:32,345 RScriptExecutor - tempLibDir = '/site/ne/home/cuiji01/tmp/Rlib.8304889352133617132';install.packages(pkgs=c('/site/ne/home/cuiji01/tmp/RlibSources.3490673511527363174/gsalib'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/site/ne/home/cuiji01/tmp/queueJobReport.2164983308823639078.R');
DEBUG 14:05:32,345 RScriptExecutor - /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt
DEBUG 14:05:32,345 RScriptExecutor - /site/ne/home/cuiji01/ExampleCountReads.jobreport.pdf
* installing source package ‘gsalib’ ...
** Creating default NAMESPACE file
** R
** preparing package for lazy loading
** building package indices ...
** testing if installed package can be loaded
* DONE (gsalib)
Loading required package: methods
Loading required package: gtools
Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
Attaching package: ‘gdata’
The following object(s) are masked from ‘package:stats’:
nobs
The following object(s) are masked from ‘package:utils’:
object.size
Loading required package: caTools
Loading required package: bitops
Loading required package: grid
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Attaching package: ‘gplots’
The following object(s) are masked from ‘package:stats’:
lowess
Loading required package: plyr
Attaching package: ‘reshape’
The following object(s) are masked from ‘package:plyr’:
rename, round_any
[1] "Report"
[1] "Project : /site/ne/home/cuiji01/ExampleCountReads.jobreport.txt"
Error in order(allJobs$analysisName, allJobs$startTime, decreasing = T) :
argument 1 is not a vector
Calls: source ... eval.with.vis -> eval.with.vis -> plotJobsGantt -> order
Execution halted
DEBUG 14:05:39,893 RScriptExecutor - Result: 1
WARN 14:05:39,894 RScriptExecutor - RScript exited with 1
INFO 14:05:39,930 QCommandLine - Script completed successfully with 1 total jobs
DEBUG 14:05:39,953 IOUtils - Deleted /site/ne/home/cuiji01/tmp/Q-Classes-568453805836268123
[14:05 0.32] [usnee1-lph001-n066 44] ~ $ ls
ExampleCountReads.jobreport.pdf ExampleCountReads.jobreport.txt R script seqs test tmp
From Geraldine_VdAuwera on 2013-05-06
Hi @blueskypy,
We’ve seen this error from another user recently — it looks like there’s a software version issue that is affecting the generation of the job report plots. Unfortunately we don’t have the resources to track down the exact issue right now, sorry. On the bright side you can ignore the rscript error, since it’s not an issue with the Queue run, it’s just the plot that summarizes the run info.
From blueskypy on 2013-05-07
hi, Geraldine,
Thanks for the help! In another thread, a user suggested the error was caused by an outdated version of ggplot2. So I updated ggplot2, but I still get the error. The file ExampleCountReads-1.out was not produced either. Could you help me find the reason?
Thanks a lot!
From Geraldine_VdAuwera on 2013-05-07
Hi @blueskypy,
Based on the output you posted earlier, the file should be there: `/site/ne/home/cuiji01/ExampleCountReads-1.out`. Is that not the case? Do you get a different “Outputs” line in your second run than in your first? Any error messages?
From blueskypy on 2013-05-07
before I run the Queue:
[usnee1-lph001-n066 42] ~ $ ls
R script seqs test
After
[usnee1-lph001-n066 44] ~ $ ls
ExampleCountReads.jobreport.pdf ExampleCountReads.jobreport.txt R script seqs test tmp
From Geraldine_VdAuwera on 2013-05-07
Can you list hidden files to see if there is a `.ExampleCountReads-1.out.done` file there?
From blueskypy on 2013-05-07
that’s right! it’s there but it’s empty!
From blueskypy on 2013-05-07
the ExampleCountReads.jobreport.pdf cannot be opened either, the error says there is no page. Also very little content in the 3rd file:
[usnee1-lph001-n066 74] ~ $ more ExampleCountReads.jobreport.txt
#:GATKReport.v1.1:0
From Geraldine_VdAuwera on 2013-05-07
That file tells Queue that the job has already been successfully completed, and it doesn’t need to do it again. This is useful for bigger jobs, to be able to resume after a failure without redoing all the work that has already successfully completed. You can either delete the .done file, or add `-startFromScratch` to the Queue command line to override it.
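As a minimal sketch of the first option, assuming a POSIX shell and that the marker files follow the hidden `.<output name>.done` pattern seen in the log above (for example `.ExampleCountReads-1.out.done`). The helper names `list_done_markers` and `clear_done_markers` are hypothetical and not part of Queue.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Queue.
# list_done_markers prints the hidden "*.done" marker files in a directory
# (default: the current directory).
list_done_markers() {
    dir="${1:-.}"
    for marker in "$dir"/.*.done; do
        # Without a match the pattern stays literal, so test for a real file.
        [ -f "$marker" ] && printf '%s\n' "$marker"
    done
    return 0
}

# clear_done_markers deletes those marker files so that Queue runs the jobs again.
clear_done_markers() {
    dir="${1:-.}"
    list_done_markers "$dir" | while IFS= read -r marker; do
        rm -f -- "$marker"
    done
}
```

Running `list_done_markers .` first shows what would be removed. Adding `-startFromScratch` to the Queue command line has the same practical effect without touching the files by hand.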
From blueskypy on 2013-05-07
hi, Geraldine,
Good news! I deleted the .ExampleCountReads-1.out.done file and re-ran Queue. This time everything works fine and the output looks correct as well.
So I think the error may indeed have been due to the outdated ggplot2. But in my previous runs, even after I updated ggplot2, I hadn’t deleted the old ‘done’ file, so Queue didn’t really run and I still got the same error message. Is my understanding right?
From Geraldine_VdAuwera on 2013-05-07
Great, I’m glad to hear that! Good to know about the ggplot2 version, thanks for reporting your solution.
Yes, I believe that’s correct — the “failure” of your second run was due to the leftover .done file telling Queue not to do anything. This generated an empty table in the job report (since nothing was done) so you got the same error (rscript couldn’t run) for a slightly different reason.
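A quick way to spot this situation from the shell, as a minimal sketch. It assumes that the trailing number in the report header (the `0` in `#:GATKReport.v1.1:0` posted earlier in this thread) is the number of tables in the report; that reading is an assumption and is not confirmed here. `report_table_count` is a hypothetical helper name, not part of Queue or GATK.

```shell
#!/bin/sh
# Hypothetical helper, not part of Queue or GATK.
# Prints the trailing field of a GATKReport header line, which is ASSUMED
# to be the number of tables in the report.
report_table_count() {
    header=$(head -n 1 "$1")
    case "$header" in
        '#:GATKReport.'*:*)
            # Strip everything up to and including the last colon.
            printf '%s\n' "${header##*:}"
            ;;
        *)
            echo "not a GATKReport header: $header" >&2
            return 1
            ;;
    esac
}

# Example use (file name taken from this thread):
#   report_table_count ExampleCountReads.jobreport.txt
```

Under that assumption, a result of 0 would mean Queue had nothing to report, which fits a run where the leftover .done file caused every job to be skipped.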
From blueskypy on 2013-05-07
Thanks Geraldine for your help! You may want to provide the solution to this thread: http://gatk.vanillaforums.com/discussion/2467/install-gsalib
I was going to post the suggestion myself, but somehow I have a problem logging in with Google on that page.
From Geraldine_VdAuwera on 2013-05-07
Done, thanks for pointing it out! FYI the problem you encountered on that page is that it uses an older URL format for the forum, which affects some of our older articles; you should be able to access it normally by changing “https” to “http” in the link.
From blueskypy on 2013-05-07
hi, Geraldine,
I wonder if I can ask another question. Is the `-jobRunner GridEngine` option the same as using the following?
`bsub java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -run`
From Geraldine_VdAuwera on 2013-05-07
Hi @blueskypy,
That option is used to specify which job runner your cluster/server uses for job management. I can’t tell you the details of the syntax used with GridEngine as that’s not what we use in-house, but we do have other users around who use it — hopefully they will jump in to contribute their experience.
From blueskypy on 2013-05-31
how to specify an output dir for CountReads?
From Geraldine_VdAuwera on 2013-05-31
If you’re running CountReads on its own, it will always output the result to stdout. If you’re running it via Queue, it depends on how the scala script is set up. In the example script it’s predetermined, but you can either change the hardcoded default, or add an argument to the script to set it from the command line.
From adouble2 on 2013-07-04
Hi,
I think the link at the bottom “Queue with Grid Engine” should point to:
[http://www.broadinstitute.org/gatk/guide/article?id=1313](http://www.broadinstitute.org/gatk/guide/article?id=1313)
and I think the QFunction link and Command Line options should both now point to:
[broadinstitute.org/gatk/guide/article?id=1311](http://www.broadinstitute.org/gatk/guide/article?id=1311)
From Geraldine_VdAuwera on 2013-07-08
@adouble2, that’s correct. Thanks for reporting the missing links, I’ve added them to the document.
From AdrianVeres on 2013-07-10
I’m getting an error that is preventing me from submitting LSF jobs using Queue.
ERROR 17:15:19,132 FunctionEdge - Error: echo hello world
scala.MatchError: M (of class java.lang.String)
    at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner$.unitDivisor(Lsf706JobRunner.scala:409)
    at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner$.org$broadinstitute$sting$queue$engine$lsf$Lsf706JobRunner$$convertUnits(Lsf706JobRunner.scala:420)
    at org.broadinstitute.sting.queue.engine.lsf.Lsf706JobRunner.start(Lsf706JobRunner.scala:99)
    at org.broadinstitute.sting.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
    at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:434)
    at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:156)
    at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:171)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
This happens using [HelloWorld.scala from the GitHub repo](https://github.com/broadgsa/gatk/blob/master/public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/HelloWorld.scala), but also running any other script with the LSF JobRunner.
This error does not occur during dry runs, or in runs using the shell JobRunner. Using -bsub, this error occurs whether or not I specify reasonable -jobQueue, -memLimit, -resMemLimit, -resMemReq values. I am on an LSF 8.0.1 cluster, as shown by `lsid`:
Platform LSF 8.0.1, Jun 13 2011
Copyright 1992-2011 Platform Computing Corporation
My cluster name is cmucluster
My master name is cmulsf
I am using Queue 2.6-4. This is the context prior to the error:
[x-removed]$ java -jar Queue-2.6-4-g3e5ff60/Queue.jar -S queue/HelloWorld.scala -l DEBUG -jobQueue short -bsub -startFromScratch -run
INFO 17:15:15,244 QScriptManager - Compiling 1 QScript
DEBUG 17:15:15,245 QScriptManager - Compilation directory: /tmp/Q-Classes-2975930027453708515
INFO 17:15:18,723 QScriptManager - Compilation complete
INFO 17:15:18,778 HelpFormatter - ----------------------------------------------------------------------
INFO 17:15:18,779 HelpFormatter - Queue v2.6-4-g3e5ff60, Compiled 2013/06/24 14:50:50
INFO 17:15:18,779 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 17:15:18,779 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
DEBUG 17:15:18,779 HelpFormatter - Current directory: /x/x
INFO 17:15:18,779 HelpFormatter - Program Args: -S queue/HelloWorld.scala -l DEBUG -jobQueue short -bsub -startFromScratch -run
INFO 17:15:18,779 HelpFormatter - Date/Time: 2013/07/10 17:15:18
INFO 17:15:18,779 HelpFormatter - ----------------------------------------------------------------------
INFO 17:15:18,780 HelpFormatter - ----------------------------------------------------------------------
INFO 17:15:18,785 QCommandLine - Scripting HelloWorld
DEBUG 17:15:18,796 QGraph - adding QNode: 0
INFO 17:15:18,802 QCommandLine - Added 1 functions
INFO 17:15:18,803 QGraph - Generating graph.
INFO 17:15:18,809 QGraph - Running jobs.
INFO 17:15:18,810 QGraph - Removing outputs from previous runs.
DEBUG 17:15:18,817 IOUtils - Deleted /x/x/.HelloWorld-1.out.fail
DEBUG 17:15:18,967 FunctionEdge - Starting: /x/x > echo hello world
INFO 17:15:18,968 FunctionEdge - Output written to /x/x/HelloWorld-1.out
DEBUG 17:15:19,089 IOUtils - Deleted /x/x/HelloWorld-1.out
DEBUG 17:15:19,124 IOUtils - Deleted /x/x/.queue/tmp/.exec7638084941875402342
ERROR 17:15:19,132 FunctionEdge - Error: echo hello world
From galeano on 2013-07-10
Hi, running this tutorial I had the same problem that Olga reported. Has anyone found a solution?
thanks, Carlos
@omedvedeva said: I can't perform a first dry run on Windows 7 with Queue 2.2.5. The installation seems to be correct since --help option works. It looks like it can't find the tmp directory that it creates at the correct location. The same problem occurs with QueueLite too. What am I missing? In the stack trace below fasta, bam and scala files were in the working directory:
C:\GATK\Queue-2.2-5-g3bf5e3f>java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
ERROR 10:17:34,493 QScriptManager - \GATK\Queue-2.2-5-g3bf5e3f\tmp\Q-Classes-8075780960630530304 does not exist or is not a directory
INFO 10:17:35,965 QScriptManager - Compiling 1 QScript
INFO 10:17:40,538 QScriptManager - Compilation complete
ERROR stack trace
org.broadinstitute.sting.commandline.InvalidArgumentException: Argument with name 'R' isn't defined.
    at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:303)
    at org.broadinstitute.sting.commandline.ParsingEngine.validate(ParsingEngine.java:276)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:204)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
ERROR A GATK RUNTIME ERROR has occurred (version 2.2-5-g3bf5e3f):
...
ERROR MESSAGE: Argument with name 'R' isn't defined.
ERROR --------------------------------------------------------------------
Thank you, Olga.
From Geraldine_VdAuwera on 2013-07-17
Hi @AdrianVeres and @galeano, sorry to respond so late. Unfortunately we’re not able to provide support for Queue issues at the moment. The software is provided as is, and if your system configuration differs from ours, it’s up to you to get it to work. You may need to ask your IT department for help. Good luck!
From Philipp79 on 2013-10-22
Hi Geraldine, I am trying to run GenomeStrip but got stuck at the second step, the "SVPreprocess" Queue script. The error prevents the compilation of the local SVQScript.q, SVPreprocess.q. I assume this is an R issue: running R version 3.0.2 (hence, an older one). It may have to do with the address specified in my script: $java -Xmx8g -cp $home/Queue/Queue.jar:$home/svtoolkit/SVToolkit.jar:$gatk_dir/GenomeAnalysisTK.jar org.broadinstitute.sting.queue.QCommandLine \ ...
While the error refers to: org.broadinstitute.sv.queue.ComputeVCFPartitions
I added the R-package "coin" already as suggested by a colleague of yours regarding a different compilation issue.
Thanks for your help.
The whole error is as follows:
ERROR ------------------------------------------------------------------------------------------
ERROR stack trace
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.7-4-g6f46d11):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Compile of /home/user/NGS2013exp/svtoolkit/qscript/SVPreprocess.q, /home/user/NGS2013exp/svtoolkit/qscript/SVQScript.q failed with 3 errors
ERROR ------------------------------------------------------------------------------------------
From Geraldine_VdAuwera on 2013-10-23
Hi @Philipp79,
This looks more like a GATK/Queue/GenomeStrip version issue (you may be using versions of the different jars that are not compatible together) than anything to do with R. But this intro-level article’s comments section is really not the right place to discuss this problem. Please post this question as a separate discussion, preferably in the GenomeStrip section of the forum.
From wallysb01 on 2013-10-25
Hi everyone,
I am trying to add the best practices options to the haplotypecaller scala script with no luck. I have zero scala experience but from basic pattern matching I’ve tried these additions:
> package org.broadinstitute.sting.queue.qscripts.examples
>
> import org.broadinstitute.sting.queue.QScript
> import org.broadinstitute.sting.queue.extensions.gatk._
>
> /**
>  * An example building on the intro ExampleCountReads.scala.
>  * Runs an INCOMPLETE variant calling pipeline with just the UnifiedGenotyper, VariantEval and optional VariantFiltration.
>  * For a complete description of the suggested for a variant calling pipeline see the latest version of the Best Practice Variant Detection document
>  */
> class ExampleHaplotypeCaller extends QScript {
>   // Create an alias 'qscript' to be able to access variables
>   // in the ExampleHaplotypeCaller.
>   // 'qscript' is now the same as 'ExampleHaplotypeCaller.this'
>   qscript =>
>
>   // Required arguments. All initialized to empty values.
>
>   @Input(doc="The reference file for the bam files.", shortName="R")
>   var referenceFile: File = _ // _ is scala shorthand for null
>
>   @Input(doc="Bam file to genotype.", shortName="I")
>   var bamFile: File = _
>
>   // The following arguments are all optional.
>
>   @Input(doc="An optional file with a list of intervals to proccess.", shortName="L", required=false)
>   var intervals: File = _
>
>   @Argument(doc="A optional list of filter names.", shortName="filter", required=false)
>   var filterNames: List[String] = Nil // Nil is an empty List, versus null which means a non-existent List.
>
>   @Argument(doc="An optional list of filter expressions.", shortName="filterExpression", required=false)
>   var filterExpressions: List[String] = Nil
>
>   @Argument(doc="The minimum phred-scaled confidence threshold at which variants should be called", fullName="standard_min_confidence_threshold_for_emitting", shortName="stand_call_conf", required=false)
>   var standCallConf: Int = _
>
>   @Argument(doc="The minimum phred-scaled confidence threshold at which variants should be emitted", fullName="standard_min_confidence_threshold_for_calling", shortName="stand_emit_conf", required=false)
>   var standEmitConf: Int = _
>
>   @Argument(doc="Specifies how to determine the alternate alleles to use for genotyping (DISCOVERY|GENOTYPE_GIVEN_ALLELES)", fullName="genotyping_mode", shortName="gt_mode", required=false)
>   var gtMode: List[String] = Nil
>
>   // This trait allows us set the variables below in one place,
>   // and then reuse this trait on each CommandLineGATK function below.
>   trait UnifiedGenotyperArguments extends CommandLineGATK {
>     this.reference_sequence = qscript.referenceFile
>     this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
>     this.standCallConf = Int(qscript.standEmitConf)
>     this.standCallConf = Int(qscript.standCallConf)
>     this.gtMode = List(qscript.gtMode)
>     // Set the memory limit to 8 gigabytes on each command.
>     this.memoryLimit = 8
>   }
Then I use the following command:
> java -Xmx12g -jar ~/tools/Queue-2.7-4-g6f46d11/Queue.jar -S ../../Queue-2.7-4-g6f46d11/resources/ExampleHaplotypeCaller.scala -R exampleFASTA.fasta -I exampleBAM.bam -stand_emit_conf 10 -stand_call_conf 30 -gt_mode DISCOVERY -jobRunner PbsEngine -startFromScratch -jobQueue batch -memLimit 4
And get the following error:
> INFO 02:00:26,722 QScriptManager - Compiling 1 QScript
> DEBUG 02:00:26,723 QScriptManager - Compilation directory: /tmp/Q-Classes-2057664968471242235
> ERROR 02:00:27,856 QScriptManager - ExampleHaplotypeCaller.scala:86: value standard_min_confidence_threshold_for_emitting is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
> ERROR 02:00:27,859 QScriptManager - this.standard_min_confidence_threshold_for_emitting = qscript.standEmitConf
> ERROR 02:00:27,859 QScriptManager - ^
> ERROR 02:00:27,868 QScriptManager - ExampleHaplotypeCaller.scala:87: value standard_min_confidence_threshold_for_calling is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
> ERROR 02:00:27,870 QScriptManager - this.standard_min_confidence_threshold_for_calling = qscript.standCallConf
> ERROR 02:00:27,870 QScriptManager - ^
> ERROR 02:00:27,878 QScriptManager - ExampleHaplotypeCaller.scala:88: value genotypeing_mode is not a member of ExampleHaplotypeCaller.this.UnifiedGenotyperArguments
> ERROR 02:00:27,880 QScriptManager - this.genotypeing_mode = qscript.gtMode
> ERROR 02:00:27,880 QScriptManager - ^
> ERROR 02:00:28,227 QScriptManager - three errors found
Any ideas?
Thanks for any help.
From Geraldine_VdAuwera on 2013-10-25
Hi there, you seem to have errors in the argument names (e.g. genotyp**e**ing_mode is misspelled). Are you using an IDE to develop your script? A good IDE will enable you to look up available argument names easily and reduce the chance of making such errors.
From wallysb01 on 2013-10-25
Ok, I fixed that spelling mistake and reran, but I get the same message. I am looking into Scala IDEs. Do you have a favorite?
Also, do you know if this should work if I get the syntax right? I’m just trying to speed up HaplotypeCaller, as it’s estimating a 17-day run time with just one 30x genome. For now I suppose 17 days isn’t so bad, but I need to eventually add several more genomes.
Thanks for the help
From Geraldine_VdAuwera on 2013-10-25
We use IntelliJ IDEA, it’s convenient for developing Java & scala together.
The script looks fine overall, assuming it includes the def script() section (which you didn’t include in what you posted).
Have a look at the slides of the workshop we held earlier this week; you can find a link in the announcements section, and we’ll have a full info page up later today. The presentations will walk you through understanding the important parts of a QScript and how to modify it to suit your needs.
From wallysb01 on 2013-10-25
It does have the def script(). I also tried working off another example that was very similar, but already had some modifications for the HaplotypeCaller. I’ve attached the full thing thus far if you want to check the def script() part. I get the same errors with this script as well, so it’s got to be something I’m missing.
I’ll take a look at those slides and keep checking back for the info page. Thanks for pointing those out.
From lennartkester on 2017-07-21
Hi Everyone,
I have an issue getting Queue to run on our server. I’m using Queue 3.7.
I can run the first test: java -Djava.io.tmpdir=tmp -jar Queue.jar -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam
However when I try to run it with -run it gives a java error.
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/gatk/engine/CommandLineGATK : Unsupported major.minor version 52.0
From what I’ve seen this means I don’t have the right java version and that I should have Java SE 8. My Java version is:
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
This is Java SE 8 right?
Do you have any idea what the problem might be?
Thanks a lot!
Lennart
From shlee on 2017-07-21
Hi @lennartkester,
I’m not familiar with Queue but let me see if I can help you. Do you get standard out lines that show which version of Java the tool itself is using, e.g.
```
INFO 12:16:45,163 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 12:16:45,163 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:16:45,163 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 12:16:45,163 HelpFormatter - [Fri Jul 21 12:16:45 EDT 2017] Executing on Mac OS X 10.11.6 x86_64
INFO 12:16:45,163 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14
INFO 12:16:45,166 HelpFormatter - Program Args: -T RealignerTargetCreator -R /Users/shlee/Documents/ref/hg37_g1k_sans_hs37d5/human_g1k_v37.fasta -I 6484_snippet.bam -o rtc.intervals -U ALLOW_SEQ_DICT_INCOMPATIBILITY
INFO 12:16:45,170 HelpFormatter - Executing as shlee@WMCF9-CB5 on Mac OS X 10.11.6 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14.
```
Does this match the java version you get from `java -version`?
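For reference, the "52.0" in that exception is the class-file major version, and 52 corresponds to Java 8, so whichever JVM raised the error is older than Java 8. A minimal sketch of that mapping in shell follows; `class_major_to_java` is a hypothetical helper name, not part of GATK or the JDK.

```shell
#!/bin/sh
# Hypothetical helper: map a class-file major version to the Java release
# that produces it. Class-file major versions start at 45, and from Java 5
# onward Java N writes major version N + 44.
class_major_to_java() {
    major="$1"
    if [ "$major" -ge 49 ] 2>/dev/null; then
        echo $(( major - 44 ))
    else
        echo "unsupported or non-numeric major version: $major" >&2
        return 1
    fi
}

# Example use:
#   class_major_to_java 52
```

Comparing the result with the version reported by `java -version` in the same shell and environment that launches the failing job can help show whether a different, older `java` is being picked up there.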