created by Geraldine_VdAuwera
on 2017-03-31
Call caching, a feature of the Cromwell execution engine (also sometimes called job avoidance), allows FireCloud to reuse outputs from tasks that have already been run instead of running them again. That is, the system can recognize that a task call in a workflow submission has already been run using the same command, Docker image and inputs. If these conditions are met, the output files from that previous run simply get copied over to the new submission folder in the workspace bucket, saving you time and money by skipping that computation. Note that this does produce a real copy (duplicate) of the output files in question, on which you will incur storage costs.
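To make the matching idea concrete, here is a minimal Python sketch of a cache lookup keyed on the three things mentioned above (command, Docker image and inputs). It is only an illustration: the function names `call_fingerprint` and `run_or_reuse` are made up, the hashing scheme is an assumption, and it is not how Cromwell is actually implemented. In particular, it returns the stored outputs directly rather than copying files to a new submission folder.

```python
import hashlib
import json


def call_fingerprint(command, docker_digest, inputs):
    """Combine the three things that must match into a single hash (illustrative only)."""
    payload = json.dumps(
        {"command": command, "docker": docker_digest, "inputs": inputs},
        sort_keys=True,  # make the result independent of dict ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_or_reuse(cache, command, docker_digest, inputs, run_task):
    """Return (outputs, was_cache_hit); run_task is only called on a cache miss."""
    key = call_fingerprint(command, docker_digest, inputs)
    if key in cache:
        return cache[key], True
    outputs = run_task()
    cache[key] = outputs
    return outputs, False
```

With an empty dict as the cache, calling `run_or_reuse` twice with identical arguments runs the task once and reuses the result the second time; changing any of the command, image digest or inputs produces a different key and triggers a fresh run.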
1. Call caching is turned on by default in FireCloud
There is a checkbox in the Launch dialog that allows you to disable this feature if you want to force the system to re-run the computation.
2. Call caching identifies Docker images by their digest (aka hash), NOT their tag
This protects you from call caching against results produced by a different image version that was given the same tag (e.g. "latest"). The good news is that the system is able to check this under the hood, so you don't need to specify the hash explicitly in your workflow script. You can just use the human-readable tag, which is the tag bit in my_repo/my_image:tag.
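The difference between keying on a tag and keying on a digest can be sketched in a few lines of Python. Everything here is hypothetical: the `registry_before` and `registry_after` dicts stand in for a Docker registry at two points in time, the digest strings are made up, and `cache_key` is just an illustrative helper, not part of any real API.

```python
def resolve_digest(registry, image_ref):
    """Look up the digest a tag currently points to (registry is a plain dict here)."""
    return registry[image_ref]


def cache_key(command, image_id):
    """Illustrative cache key: the command plus whatever identifies the image."""
    return (command, image_id)


image_ref = "my_repo/my_image:latest"
registry_before = {image_ref: "sha256:1111"}  # made-up digest
registry_after = {image_ref: "sha256:2222"}   # same tag, but a new image was pushed

# Keyed by tag: both runs look identical, so a stale result could be reused.
tag_keys_equal = cache_key("run_tool", image_ref) == cache_key("run_tool", image_ref)

# Keyed by digest: the new image yields a different key, so the task re-runs.
digest_keys_equal = (
    cache_key("run_tool", resolve_digest(registry_before, image_ref))
    == cache_key("run_tool", resolve_digest(registry_after, image_ref))
)
```

In this toy model `tag_keys_equal` is true while `digest_keys_equal` is false, which is the protection described above.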
3. Using NIO file streaming breaks call caching
NIO file streaming is a data transfer protocol that allows you to run analysis tasks on sections of files that are in cloud storage without having to copy the entire file to the local disk of the machine that's running the computation. This can save you quite a bit of money, as discussed here.
However, right now (June '18) there is a limitation of the system where cached calls are not recognized as such for tasks that use NIO. This is because when you use NIO in a pipeline task, you provide its input file(s) as a String type rather than a File type; that's what tells the system to stream the file by its URL rather than localize it to disk. This means that the String path will be different from what the localized path would be, despite the contents being the same. And unfortunately, having different input file paths causes the system to deny call caching for that task.
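The reasoning above (same contents, different path strings, therefore no match) can be illustrated with a small Python sketch. This is a simplified model and not Cromwell's actual logic: the two fingerprint functions, the bucket URL and the local path are all assumptions chosen for illustration.

```python
import hashlib


def fingerprint_by_path(path):
    """Fingerprint an input by the literal text of its path."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


def fingerprint_by_content(data):
    """Fingerprint an input by the bytes it contains."""
    return hashlib.sha256(data).hexdigest()


contents = b"same bytes either way"
streamed_url = "gs://example-bucket/sample.bam"            # hypothetical URL passed as a String
localized_path = "/local_disk/example-bucket/sample.bam"   # hypothetical localized copy

# The two path strings differ, so a comparison based on paths sees two different inputs.
paths_match = fingerprint_by_path(streamed_url) == fingerprint_by_path(localized_path)

# The underlying bytes are identical, so a comparison based on content would match.
contents_match = fingerprint_by_content(contents) == fingerprint_by_content(contents)
```

Here `paths_match` is false even though `contents_match` is true, which mirrors why a task given its input as a String path is not recognized as a repeat of an earlier call.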
This is going to be fixed soon in the Cromwell engine, which runs the pipelines inside FireCloud. The solution is that it will be possible to use File type inputs for tasks that use NIO, instead of a String type. As a result the paths will be the same, call caching will work, and as an added bonus the WDLs will be more portable.
Updated on 2018-06-08
From NawarDalila on 2018-11-14
Can one use the call-caching feature on HPC clusters?