Two convenience facilities exist for declaring Hail jobs in Loam. These require valid loamstream.googlecloud and loamstream.googlecloud.hail sections in loamstream.conf. (See Configuration.)
Hail is driven by Python scripts. If you have a Python driver script already, you can use the hail"..." interpolator:
```scala
hail"some-driver-script.py"
```

As usual, stores and other variables can be interpolated:
```scala
val projectId = "foo"
val inputVcf = store.at(uri("gs://foo/bar/baz.vcf"))
val outputVds = store.at(uri("gs://foo/bar/baz.vds"))

hail"""some_driver_file.py --vcf-in $projectId ${inputVcf} --vds-out ${outputVds}"""
```

This Tool invokes the gcloud program (part of the Google Cloud SDK) to submit a Hail job to Google Cloud Dataproc. Boilerplate parameters are prepended to the contents of the hail"..." string, and $-variables are interpolated. The command line that runs looks something like:

```
gcloud dataproc jobs submit pyspark \
  --cluster=<cluster id> \
  --files=<URI of Hail jar> \
  --py-files=<URI of Hail zip file> \
  --properties="spark.driver.extraClassPath=<URI of Hail jar>,spark.executor.extraClassPath=<URI of Hail jar>" \
  some_driver_file.py \
  --vcf-in foo gs://foo/bar/baz.vcf \
  --vds-out gs://foo/bar/baz.vds
```

The cluster ID, Hail jar URI, and Hail zip URI come from loamstream.conf. (See Configuration.)
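To make the assembly step concrete, here is a minimal Scala sketch of how the config-derived boilerplate might be combined with the already-interpolated hail"..." string. It is not LoamStream's implementation: HailConfig, HailCommandSketch, and commandLine are hypothetical names, the config values are made up, and plain Scala s-interpolation stands in for Loam's interpolator.

```scala
// Hypothetical stand-in for values read from the loamstream.googlecloud
// and loamstream.googlecloud.hail sections of loamstream.conf.
final case class HailConfig(clusterId: String, hailJarUri: String, hailZipUri: String)

object HailCommandSketch {
  // Prepends gcloud boilerplate to the already-interpolated contents of a
  // hail"..." string. A sketch only; not LoamStream's actual implementation.
  def commandLine(config: HailConfig, interpolated: String): String = {
    val jar = config.hailJarUri
    val quote = "\""
    val classPath = s"spark.driver.extraClassPath=$jar,spark.executor.extraClassPath=$jar"
    val boilerplate = Seq(
      "gcloud", "dataproc", "jobs", "submit", "pyspark",
      s"--cluster=${config.clusterId}",
      s"--files=$jar",
      s"--py-files=${config.hailZipUri}",
      "--properties=" + quote + classPath + quote
    )
    // The user-supplied part goes last, after all boilerplate flags.
    (boilerplate :+ interpolated).mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    // Made-up config values, for illustration only.
    val config = HailConfig("my-cluster", "gs://my-bucket/hail.jar", "gs://my-bucket/hail.zip")
    val projectId = "foo"
    val inputVcf = "gs://foo/bar/baz.vcf"
    val outputVds = "gs://foo/bar/baz.vds"
    // Plain Scala s-interpolation stands in for Loam's hail"..." interpolator.
    val userPart = s"some_driver_file.py --vcf-in $projectId $inputVcf --vds-out $outputVds"
    println(commandLine(config, userPart))
  }
}
```

Keeping the boilerplate in one sequence means the config-derived flags are defined in a single place, so the user-written part of the string never has to repeat them.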
If you don't have a driver script in a separate file, it's possible to specify one inline:
```scala
pyhail"""#driver script code
#...
#more Python code"""
```

This results in a command line similar to the one produced by hail"...", except that the manually specified some_driver_file.py is replaced by a driver script with a machine-generated name (containing the interpolated contents of the pyhail"..." string).
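One way such a machine-generated script could be produced is sketched below in Scala. The naming scheme (a truncated SHA-256 hash of the script text) is an assumption made for illustration, as are the names PyHailScriptSketch, scriptName, and writeScript; LoamStream may name and store the file differently.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.security.MessageDigest

object PyHailScriptSketch {
  // Hypothetical naming scheme: hash the interpolated script text so that
  // identical scripts get identical names. LoamStream's real scheme may differ.
  def scriptName(script: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
      .digest(script.getBytes(StandardCharsets.UTF_8))
    // First 8 bytes of the digest, rendered as 16 lowercase hex characters.
    "pyhail-" + digest.take(8).map(b => f"${b & 0xff}%02x").mkString + ".py"
  }

  // Writes the script into `dir` and returns its path; that path would then
  // take the place of some_driver_file.py on the gcloud command line.
  def writeScript(dir: Path, script: String): Path =
    Files.write(dir.resolve(scriptName(script)), script.getBytes(StandardCharsets.UTF_8))
}
```

Deriving the name from the contents means a given inline script always maps to the same file name, so repeated runs do not produce a pile of differently named copies.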
File inputs needed by Hail jobs running at Google must live in Google Cloud Storage buckets. Similarly, Hail jobs running at Google can only write output to files in Google Cloud Storage buckets, not to files on the Broad FS (or anywhere else that is remote from the perspective of Google Cloud). Currently, copying files to and from Google must be encoded manually. Fortunately, Loam provides some helpers:
```scala
val vcfOnLocalFs = store.at(path("/foo/bar/baz.vcf"))
val vcfInGoogleBucket = store.at(uri("gs://some-bucket/bar/baz.vcf"))

//Copy vcf *to* Google
googleCopy(vcfOnLocalFs, vcfInGoogleBucket)

//Copy vcf *from* Google
googleCopy(vcfInGoogleBucket, vcfOnLocalFs)
```

googleCopy registers a command-line Tool that invokes gsutil (from the Google Cloud SDK) to copy a file, using the first store parameter as the Tool's input and the second as its output.
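As a rough model of the Tool that googleCopy registers, the following Scala sketch builds a gsutil cp command from its two arguments. GoogleCopySketch and gsutilCopyCommand are hypothetical names, and the check that at least one side is a gs:// URI is this sketch's own assumption, not documented googleCopy behavior.

```scala
object GoogleCopySketch {
  // Hypothetical model of the command googleCopy might run: copy `src`
  // (the Tool's input store) to `dst` (its output store) with gsutil cp.
  def gsutilCopyCommand(src: String, dst: String): Seq[String] = {
    // Sketch-only assumption: a copy to or from Google involves a gs:// URI.
    require(src.startsWith("gs://") || dst.startsWith("gs://"),
      "at least one side of the copy should be a Google Cloud Storage URI")
    Seq("gsutil", "cp", src, dst)
  }
}
```

Argument order alone decides the direction of the copy, mirroring googleCopy's input/output convention. Assuming gsutil is on the PATH, the resulting sequence could be run with scala.sys.process, e.g. `scala.sys.process.Process(cmd).!`, which returns the exit code.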