created by FireCloud_Team
on 2018-08-30
In this document we'll go over some basic strategies for investigating failed workflows on FireCloud. This isn't a guide to solving every error. It shows new users how to diagnose failed submissions. If you run into a more complicated error, describe it on the FireCloud Forum, where our team is happy to help.
At this point we assume that you have set up a workspace with a data model and a method configuration loaded. You've launched your method, but your submission has failed. Don't despair! You can gather some information that will help you get your method up and running. In your workspace, go to the Monitor tab and click View to open the submission that failed. Then click View again in the data entity column to see details about the submission.
Click Show, to the right of Failures, to display any error messages. The message listed under Failures isn't always short and sweet, and if you misinterpret it, it can lead you down the wrong debugging path. Instead, use the message to identify which task failed, then investigate that task. In this case, the failed task is HelloGATK.HaplotypeCallerGVCF.
Click Show on the task that failed to get access to three very useful files for debugging: stderr, stdout, and the JES log. Cromwell generates these files when executing any task and places them in the task's folder along with its output. In FireCloud we add quick links to these files in the Monitor tab to make troubleshooting easier.
- Standard Out (stdout): A file containing log output generated by the commands in the task. Not all commands generate log output, so this file may be empty.
- Standard Error (stderr): A file containing error messages produced by the commands executed in the task. A good place to start for a failed task.
- JES log: A log file tracking the events that occurred while performing the task, such as downloading the Docker image and localizing files.
Occasionally a workflow will fail without producing stderr and stdout files, leaving you with only a JES log. More on this in the next section.
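The triage order described above (start with stderr, fall back to stdout, and turn to the JES log only when the other two are missing or empty) can be sketched in a few lines of Python. This is a minimal sketch that assumes you have already downloaded the three log files to a local directory; the function names and file names are hypothetical, not FireCloud or Cromwell conventions.

```python
from pathlib import Path
from typing import Optional


def has_content(path: Path) -> bool:
    """Return True if the file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def pick_log_to_inspect(stderr: Path, stdout: Path, jes_log: Path) -> Optional[Path]:
    """Return the first non-empty log in triage order: stderr, then stdout,
    then the JES log as a last resort. Return None if all are missing or empty."""
    for candidate in (stderr, stdout, jes_log):
        if has_content(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical local directory holding the three downloaded log files.
    log_dir = Path("downloaded_logs")
    chosen = pick_log_to_inspect(
        log_dir / "task-stderr.log",
        log_dir / "task-stdout.log",
        log_dir / "task-jes.log",
    )
    print(f"Start with: {chosen}" if chosen else "No non-empty log files found.")
```

An empty stdout is skipped rather than treated as a clue, since, as noted above, many commands write nothing to it.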
Many common task-level errors show up in the stderr file. Click the link to the stderr log (in this example, Haplotypecaller_GVCF-stderr.log) and a window will appear with a preview of the file.
Here we see an error produced by the HaplotypeCaller command in our task. The message indicates that the index file for the FASTA reference does not exist. Hmm, it seems there's something wrong with the FASTA index we provided. Click Done to go back to the previous screen. Now we'll check the inputs that were provided to the task by clicking Show for the Inputs.
Ah-ha, the reference index file we provided (InputBamIndex) is the index file for our sample (NA12878.bai), but it should be the index for the hg38 reference (Homo_sapiens_assembly38.fasta.fai). After we correct the method configuration to use the right index file, the workflow succeeds!
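A quick name check can catch this kind of mix-up before you launch. Under the usual samtools faidx convention, a FASTA index is named after its FASTA with `.fai` appended (for example, `Homo_sapiens_assembly38.fasta.fai`), whereas a sample's BAM index ends in `.bai`. The Python sketch below assumes your reference and index inputs are local paths or gs:// URLs whose file names follow that convention; `check_reference_index` and the bucket name are hypothetical, not part of FireCloud.

```python
from typing import List


def file_name(path: str) -> str:
    """Return the last path component; works for local paths and gs:// URLs."""
    return path.rstrip("/").split("/")[-1]


def expected_fai_name(fasta_path: str) -> str:
    """Index name expected for a FASTA under the samtools faidx convention."""
    return file_name(fasta_path) + ".fai"


def check_reference_index(fasta_path: str, index_path: str) -> List[str]:
    """Return a list of problems with the reference index input (empty if it looks fine)."""
    problems = []
    index_name = file_name(index_path)
    if index_name.endswith(".bai"):
        problems.append(f"{index_name} looks like a BAM index, not a FASTA index")
    expected = expected_fai_name(fasta_path)
    if index_name != expected:
        problems.append(f"expected an index named {expected}, got {index_name}")
    return problems


if __name__ == "__main__":
    ref = "gs://example-bucket/Homo_sapiens_assembly38.fasta"  # hypothetical bucket
    for index in ("gs://example-bucket/NA12878.bai",
                  "gs://example-bucket/Homo_sapiens_assembly38.fasta.fai"):
        print(index, check_reference_index(ref, index) or "OK")
```

This only inspects names, so it cannot tell you whether the index file actually exists in the bucket; it simply flags the sample-index-for-reference-index swap seen in this example.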
The JES log is often difficult to decipher, so it's better to start with the other log files. However, in some cases your submitted job will fail without any stderr or stdout files. In these cases you'll have to suck it up and unravel the meaning behind the JES log messages. Below are some common JES log errors and their possible meanings. We don't have a solution for all of them; feel free to post your error on the FireCloud Forum and the team can help you work through the message.
- PAPI (JES) Error 10 Message 15: The memory or disk space provided isn't sufficient to load your input files and Docker container, so the task instance exited abruptly. Increasing the disk space should avoid this error.
- PAPI (JES) Error 10 Message 14: This error is associated with preemptible VMs, but we've observed that it occasionally appears when non-preemptible machines fail. Retry the workflow. If you see this error on the same task repeatedly, we recommend using the runtime attribute maxRetries to retry transient failures.
- PAPI Error 10 Message 13
- PAPI Error 5: Message 10
- PAPI Error 2
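When the JES log is all you have, it can help to pull the error code and message number out of the log text and look them up against the list above. The Python sketch below assumes the log contains text in one of the forms shown in the list (for example "PAPI (JES) Error 10 Message 15" or "PAPI Error 5: Message 10"); the exact wording in real JES logs may differ, so treat the regular expression as a starting point. Only the two errors described above have entries in the lookup table, and everything else falls back to the FireCloud Forum. `parse_papi_error`, `explain_papi_error`, and `KNOWN_PAPI_ERRORS` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "PAPI (JES) Error 10 Message 15", "PAPI Error 5: Message 10",
# "PAPI Error 2" (assumed formats; adjust to the wording in your own logs).
PAPI_PATTERN = re.compile(
    r"PAPI(?:\s*\(JES\))?\s+error(?:\s+code)?\s+(\d+)(?:\.?:?\s*message:?\s*(\d+))?",
    re.IGNORECASE,
)

# Only the errors described in this article; the advice mirrors the list above.
KNOWN_PAPI_ERRORS = {
    (10, 15): "Not enough memory or disk space for the inputs and Docker container; "
              "increase the disk space.",
    (10, 14): "Often tied to preemptible VMs; retry the workflow, and consider the "
              "maxRetries runtime attribute if it keeps happening on the same task.",
}


def parse_papi_error(log_text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (error code, message number) for the first PAPI error found, or None."""
    match = PAPI_PATTERN.search(log_text)
    if match is None:
        return None
    code = int(match.group(1))
    message = int(match.group(2)) if match.group(2) is not None else None
    return code, message


def explain_papi_error(log_text: str) -> str:
    """Map a PAPI error in the log text to the advice in this article, if any."""
    parsed = parse_papi_error(log_text)
    if parsed is None:
        return "No PAPI error found in this log."
    code, message = parsed
    advice = KNOWN_PAPI_ERRORS.get((code, message))
    if advice is None:
        label = f"PAPI error {code}" + (f", message {message}" if message is not None else "")
        return f"{label}: not described in this article; consider posting it on the FireCloud Forum."
    return advice


if __name__ == "__main__":
    print(explain_papi_error("... PAPI (JES) Error 10 Message 14 ..."))
```

The search stops at the first match, which is usually enough for a single failed task's log; scanning a whole submission's logs would need `finditer` instead.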
Cannot find credentials for RawlsUser(RawlsUserSubjectId(******),RawlsUserEmail(******))
Refresh your browser window and you will see a yellow banner at the top of the page that says "Your offline credentials are missing or out-of-date. Your workflows may not run correctly until they have been refreshed. Refresh now...". Click the "Refresh now..." link and follow the prompts. This updates your credentials in the system and should make this error message go away. The prompt is needed to maintain compliance with certain security standards required for hosting sensitive data.
Additional links: FireCloud Forum, Quick Start Guide, FireCloud Tutorials, FireCloud FAQ
Updated on 2018-08-31