created by Tiffany_at_Broad
on 2017-11-07
You can import metadata into your workspace's data model by either copying from an existing workspace or importing a file.
- Copying from an existing workspace
- Importing a file
You import metadata corresponding to an entity type -- Participant, Sample, or Pair -- by uploading load files in tab-separated values (TSV) format, a type of text file (.tsv or .txt). A separate file must be used for each entity type. The first line of each file must contain the appropriate field names as column headers. See the individual entity entries for examples of load files.
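For instance, a minimal participant load file might look like the sketch below. The first column header follows the `entity:participant_id` convention; the participant IDs and attribute columns here are purely illustrative:

```
entity:participant_id	age	gender
HCC1143	52	female
HCC1954	61	female
```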
Note that for each of the basic entity types, the data model also supports set entities (participant set, sample set, and pair set), which are essentially lists of the basic entity type.
In set load files, each line lists the membership of a non-set entity (e.g., participant) in a set (e.g., participant set). The first column contains the identifier of the set entity and the second column contains a key referencing a member of that set. For example, a load file for a participant set looks like this:
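As a sketch, a participant set load file could look like the following, using the `membership:participant_set_id` header convention; the participant IDs are illustrative:

```
membership:participant_set_id	participant
TCGA_COAD	TCGA-AA-3518
TCGA_COAD	TCGA-AA-3525
TCGA_COAD	TCGA-AA-3532
```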
Note that multiple rows in a set load file may have the same set entity ID (e.g., TCGA_COAD).
Order for uploading load files
Load files must be imported in a strict order due to references to other entities.
The order is as follows ("A > B" means entity type A must precede B in order of upload): participant > sample > pair, and each basic entity type must precede its corresponding set (e.g., sample > sample set).
Uploading an Array of files or strings
You may be in the situation where you have multiple files or strings of metadata that belong to one participant, sample, pair, or sets of these. For example, say you have been given genotyping files in VCF format for a collection of samples, for a total of twenty-two files per sample set. For each sample set, you don’t want to create a new column in the data table for each file because that's time-consuming. You would also have to launch the analysis in FireCloud repeatedly to run on each file. Instead, you want to build a WDL that inputs an array of VCF files because you’d like your tools to run on each item in the array without manual intervention.
To get the array into your data model, you can write WDL code that outputs a file of file paths or strings as an array. This requires a file containing the list of file paths or strings as input. A task in your WDL can read the lines of the file and output them to your data model as an array; you can then use the method configuration to assign the array to a workspace attribute (“workspace.X”) or to an attribute of the participant, sample, pair, or set that you are running on (“this.X”).
Here are two examples that can be adapted to your use case. In both examples, the input is a file containing a list of VCF file paths, one per line, in “gs://” format.
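For instance, such a file of file paths might look like this (the bucket and file names are illustrative):

```
gs://my-bucket/my-sample-set/chr1.vcf
gs://my-bucket/my-sample-set/chr2.vcf
gs://my-bucket/my-sample-set/chr3.vcf
```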
Example 1 has a command portion left blank so that you can manipulate the array if you desire. This WDL copies your files to the virtual machine the task spins up, which makes sense if you intend to manipulate the array of files further. The 50 GB disk size accounts for the files being copied to the virtual machine and should be adjusted for your use case. If you do not need to manipulate the array, see Example 2.
Example 1:
Example 1’s Method and Method Configuration are published in the Methods Repository.
```
workflow fof_usage_wf {
  File file_of_files

  call fof_usage_task {
    input:
      fof = file_of_files
  }

  output {
    Array[File] array_output = fof_usage_task.array_of_files
  }
}

task fof_usage_task {
  File fof
  Array[File] my_files = read_lines(fof)

  command {
    # do stuff with arrays below
    # ...
  }
  runtime {
    docker: "ubuntu:16.04"
    disks: "local-disk 50 HDD"
    memory: "2 GB"
  }
  output {
    Array[File] array_of_files = my_files
  }
}
```
Example 2:
```
workflow fof_usage_wf {
  File file_of_files
  Array[File] array_of_files = read_lines(file_of_files)

  output {
    Array[File] array_output = array_of_files
  }
}
```
Importing arrays into your data model directly with a TSV is not currently available. We are working on functionality to make this easier to do in the web interface.
Updated on 2018-04-27
From tmajarian on 2017-11-08
If you are not manipulating the files, can you avoid copying them to the VM? The WDL would be:
```
workflow fof_usage_wf {
  File file_of_files
  Array[String] array_of_files = read_lines(file_of_files)

  output {
    Array[String] array_output = array_of_files
  }
}
```
Since an Array[String] can be coerced to Array[File] in any subsequent method call, this should work and avoid using extra disk space and runtime for copying.
From dheiman on 2018-06-12
I feel like this tutorial is missing a lot of the metadata-loading information covered in the [FireCloud Basics page](https://gatkforums.broadinstitute.org/firecloud/discussion/6822/firecloud-basics "FireCloud Basics page") (e.g. update loadfiles for loading custom set-level attributes), while as a tutorial I expected it to go into greater depth and walk me through the entire process.
From Tiffany_at_Broad on 2018-06-26
Hi @dheiman are you talking about linking or incorporating info also covered in this tutorial here? https://gatkforums.broadinstitute.org/firecloud/discussion/10892/howto-overwrite-and-delete-data-from-the-data-model#latest