JGI Data Portal API Tutorial

Stay in touch

Contact Us

Please let us know if you have further questions, if we can improve this section, or if you have requests/suggestions by contacting us at jdp@lbl.gov.

Join our Mailing Lists

We offer two mailing lists to keep you up to date on new features and documentation.

Browser Interface Mailing List
- Announcements regarding enhancements that will be reflected in (or documentation related to) the browser.
API Mailing List
- For those who write code, announcements regarding enhancements that will be reflected in (or documentation related to) the API.

How to join our mailing lists.

JGI Data Portal Application Programming Interface

With the JGI Data Portal (JDP) Application Programming Interface (API), you can automate your downloads or batch them programmatically however you want.

The purpose of this tutorial is to provide a starting point for those who are new to the API. In this tutorial, you will find an overview of several use cases, as well as specific examples. We hope this saves you time and answers most of your initial questions.

Experiment with the API in the Interactive Environment

If you want to dive right into the API, our API Documentation provides an interactive environment for constructing and testing API calls:

JGI Data Portal API Documentation and Interactive Environment

However, before jumping directly into the above interactive documentation, please take a few moments to peruse the rest of this tutorial to learn the basics of how to search, filter, and download data with our API.

Primary Use Cases for Using the Search Endpoint

We have designed our search endpoint to cover two primary use cases:

You want to download many organisms and don't want to interact directly with pages in the browser interface.
You want to run automated searches.

Use Case for the Download Files Endpoint

The Download API has been designed primarily to cover the case where you want to download data to a remote server.

Option 1: You can use our interactive API Documentation page to construct your download API call.

Option 2: You can use the browser interface to construct your list of files to download (search --> add to cart --> view cart), and select "Command Line Download" while in the cart. Copy this command and use this to download your files to a remote host.

Introduction to the JDP API

Our search engine is backed by Elasticsearch, and we index a large number of metadata fields for our files. However, since our data is over 13 petabytes in size, the files themselves are archived for long-term storage and need to be queued for retrieval before they can be downloaded. Retrieval from our archive typically takes less than an hour (but can take up to one night), depending on the number of concurrent requests (from all users).

What data is available via the JDP API?

The JDP API makes JGI's public data available for download.

Currently, data must belong to one of the following scientific programs: microbial, metagenome, fungal, algal or plant.

Follow the links to find public data from the following scientific programs: secondary metabolites, metabolomics, synthetic biology.

When do I need Authorization?

To restore or download files, you will need to provide your authorized session token. You can get this by logging in to JDP in your browser and clicking on "Copy My Session Token" in the avatar dropdown menu of the browser application.

Important Note: Filtering Search Results

Before specifying any file filters or organism filters, you must first conduct an initial search (e.g., for an organism) and then examine the results to determine the available filter values.

This is because our filter values are dynamic and based on what you search for. Although our filter categories (parameters) are always the same, the available values in a filter category will vary based on the metadata in the files that match the initial search term.

Therefore, an initial search should be performed with no filters specified. The server response should then be examined to determine which filter values are available. Finally, the search can be repeated with file filters and/or organism filters applied.

For a more detailed procedure, please refer to the Using the API: A High-Level Workflow section below.
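The two-pass search described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name build_search_url is hypothetical, and the only parameters used are q and ff, taken from the example URLs in this tutorial, with ff assumed to carry the file filters as a JSON object.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "https://files.jgi.doe.gov/search/"

def build_search_url(query, file_filters=None):
    """Return a search URL; file_filters is an optional dict such as {"file_type": ["fasta"]}."""
    params = {"q": query}
    if file_filters:
        # Assumption: ff carries the file filters as a JSON object,
        # as in the example URL copied from the browser interface.
        params["ff"] = json.dumps(file_filters, separators=(",", ":"))
    return SEARCH_URL + "?" + urlencode(params)

# Pass 1: no filters, used to discover which filter values are available.
initial_url = build_search_url("e coli")
# Pass 2: repeat the search with a filter value seen in the first response.
filtered_url = build_search_url("e coli", {"file_type": ["fasta"]})
print(initial_url)
print(filtered_url)
```

The filter value in the second call should come from the response to the first call, since the available values depend on the search term.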

Using the API: A High-Level Workflow

1. Construct your initial search query, e.g., search for an organism (e coli).
  - Option 1: Browser Interface
    1. Submit your initial query in the browser interface
    2. Filter your data to focus on the files you care about
    3. Copy the Search API call by pressing "API Search Query Link"
  - Option 2: Interactive API Environment
    1. Select the endpoint you want to use:
      - search (for general search)
      - img_file_list (search for IMG (microbial & metagenome) files only)
      - mycocosm_file_list (search for Mycocosm (fungal) files only)
      - phytozome_file_list (search for Phytozome (algal) files only)
    2. Construct and submit your initial query using the Interactive Environment
    3. Review the available filters (keys) and values in the "facets" section of the JSON output
    4. Update your filters in the interactive environment as necessary.
2. Parse the JSON output to retrieve your file IDs.
  - See the Parse the JSON Payload section (directly below) for useful information.
3. (If necessary) Request that your files be restored using the request_archived_files endpoint.
4. Check the status of your restoration request(s) using the request_archived_files/requests endpoint.
5. Download your data with the download_files endpoint, using the collection of file IDs.

We strongly recommend that you use the Interactive Environment to construct and test your API calls. This is the best way to ensure that your queries will work.

You can view the set of available filter categories by visiting our API Documentation, where you can explore the API interactively.

Parse the JSON Payload

The JDP front-end and API users receive the same JSON payload. As a result, the payload includes front-end information that is unlikely to be useful to API users. This section points out the important sections of the JSON payload, as well as sections that may be less useful (or could cause confusion).

Example: https://files.jgi.doe.gov/search/?q=e+coli+BW25113+JBEI-FM002

Example with highlighted fields of interest: search payload

Pagination

The JSON payload provides pages of data. By default:

the first page of search results is returned
10 datasets are returned on that first page

You will need to either iterate through all of the pages to see all of the data that has been returned, or update the datasets_per_page parameter (x in the query string) to view all of the data on one page.

In your search query:

x controls the number of datasets per page
p controls the page number that is returned

Example: https://files.jgi.doe.gov/search/?q=e+coli&x=40&p=2

This will return the 2nd page with 40 datasets per page, i.e., items 41 through 80.
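Iterating through the pages might look like the sketch below. It uses the x and p parameters described above. The helper names are hypothetical, and the stopping rule (an empty organisms array means there are no more results) is an assumption, as is the max_pages safety cap.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://files.jgi.doe.gov/search/"

def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network call)."""
    with urlopen(url) as response:
        return json.load(response)

def iter_organisms(query, per_page=40, fetch=fetch_json, max_pages=100):
    """Yield every entry of the organisms array, one page at a time."""
    for page in range(1, max_pages + 1):
        url = SEARCH_URL + "?" + urlencode({"q": query, "x": per_page, "p": page})
        organisms = fetch(url).get("organisms", [])
        if not organisms:
            # Assumption: an empty organisms array marks the end of the results.
            return
        yield from organisms
```

A caller would loop over iter_organisms("e coli") and read each entry as it arrives. The fetch argument can be swapped for a stub when trying the loop without network access.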

Organisms Array

The organisms array element of the JSON payload represents the datasets and files that have been returned.

The organisms[x].agg and organisms[x].agg_id keys indicate how the dataset is grouped and by what value.

The organisms[x].id key is a concatenation of these 2 values.

Organism Top Hit

The organisms[x].top_hit section provides a summary of the files in that dataset/organism. This section is used to populate the dataset row on the browser interface. For most API users, this section can be ignored.

Values for keys such as proposal, proposal PI, GOLD, NCBI taxon, and FD Project Name should be consistent across all files in the dataset/organism.

The organisms[x].top_hit.file_name and organisms[x].top_hit._id in this section can be ignored.

Organism Files

The organisms[x].files[y] items are the list of files for a particular organism.

The organisms[x].files[y]._id item is the ID that will be submitted to the data restoration or data download endpoints.

The organisms[x].files[y].metadata section contains the metadata for each file.

The organisms[x].files[y].file_status item indicates whether the file is on tape (PURGED) or disk (RESTORED).
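As a rough illustration of how these keys fit together, the sketch below groups file IDs by organisms[x].id and separates files that are already on disk from those that are not. The function name collect_file_ids is hypothetical, the sample payload is hand-written and contains only the keys described above (the statuses are invented for the example), and treating every status other than RESTORED as needing restoration is an assumption.

```python
def collect_file_ids(payload):
    """Group file IDs by organisms[x].id, split by file_status."""
    restored, purged = {}, {}
    for organism in payload.get("organisms", []):
        for file_entry in organism.get("files", []):
            # Assumption: anything that is not RESTORED must be restored first.
            on_disk = file_entry.get("file_status") == "RESTORED"
            target = restored if on_disk else purged
            target.setdefault(organism["id"], []).append(file_entry["_id"])
    return restored, purged

# Hand-written sample; a real payload contains many more keys.
sample = {
    "organisms": [
        {
            "id": "IMG_SP-1060116",
            "files": [
                {"_id": "5503e95a0d878525404e38d7", "file_status": "RESTORED"},
                {"_id": "5f8b782e47675a20c850eda3", "file_status": "PURGED"},
            ],
        },
        {
            "id": "IMG_SP-1060115",
            "files": [
                {"_id": "55044bb00d878525404e3a3f", "file_status": "PURGED"},
            ],
        },
    ]
}

restored, purged = collect_file_ids(sample)
print("ready to download:", restored)
print("needs restoration:", purged)
```

The grouped dictionaries have the same shape as the organism-ID-to-file-ID mapping expected by the download endpoint.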

Facets

Facets are keys in the JSON payload that are used as filters.

Facets can be found near the end of the JSON payload.

The values shown are the values by which you can filter your initial query.

Retrieve Your Session Token

To restore or download files via the API, you will need to provide your session token. You can get this by clicking on Copy My Session Token in the avatar dropdown menu of the browser application (after you log in).

Restore Files via API

Description

The request_archived_files POST endpoint will request that your files of interest be restored to disk (from tape).

https://files.jgi.doe.gov/request_archived_files/

Can the file be downloaded immediately?

The files in your JSON payload will have a file_status with one of the following values:

RESTORED
- This means that the file is currently available for download.
- Skip to the Download Files via API section below
PURGED
- This means that the file needs to be restored from our archive (tape system) to disk before it can be downloaded.

NOTE: You can play it safe and always send a request to restore files before downloading your files. JGI will not restore files if they are already available for download.

Arguments

"ids":
- Current version: a dictionary of dictionaries holding the "id"s, "file_ids", "top_hit"s and (when necessary) "mycocosm_portal_id"s, with the file IDs collected from organisms[x].files[y]._id
"send_mail"
- true/false
- Do you want to be notified by email when the files are ready to be downloaded, i.e., after they have been restored to disk?
"api_version"
- Required; must be set to 2.

Character Limit

- You cannot submit more than 4094 characters to our back-end endpoint.
- If you need to submit a payload greater than 4094 characters, submit a file to the "-d" argument in the curl command (examples)

Return Values

The endpoint will return request_status_url, a URL that you can use to check the status of your restoration request.
- Example: https://files.jgi.doe.gov/request_archived_files/requests/473580

Restore: Payload Examples

Valid Restore Request Payload (One File) - March 2025

{
  "ids": {
    "Mycocosm_AP-1184792": {
      "file_ids": ["51d4c073067c014cd6ea7469", "51d4c1cf067c014cd6ea859e"],
      "top_hit": "59cad2a27ded5e2f1869132c",
      "mycocosm_portal_id": "Aspni7"
    }
  },
  "send_mail": true,
  "api_version": "2"
}

Valid Restore Request Payload (Multiple Files) - March 2025

{
  "ids": {
    "Mycocosm_AP-1184792": {
      "file_ids": ["51d4c073067c014cd6ea7469", "51d4c1cf067c014cd6ea859e"],
      "top_hit": "59cad2a27ded5e2f1869132c",
      "mycocosm_portal_id": "Aspni7"
    },
    "IMG_AP-1146261": {
      "file_ids": ["595b9afd7ded5e5270eef127"],
      "top_hit": "595a83767ded5e5270eeebc8"
    },
    "Phytozome-167": {
      "file_ids": ["6643ff0653447aa389b9c859", "6643ff0753447aa389b9c865"],
      "top_hit": "67b5004267ef7b237e865486"
    }
  },
  "send_mail": true,
  "api_version": "2"
}

Old Version: Valid Restore Request Payload (One File)

This version is old, but still valid.

{
  "ids": [
    "5503e95a0d878525404e38d7"
  ],
  "send_mail": true,
  "api_version": "2"
}

Old Version: Valid Restore Request Payload (Multiple Files)

This version is old, but still valid.

{
  "ids": [
    "5503e95a0d878525404e38d7", "55044bb00d878525404e3a3f"
  ],
  "send_mail": true,
  "api_version": "2"
}

Example of valid curl command

curl -X POST "https://files.jgi.doe.gov/request_archived_files/" -H "accept: application/json" -H "Authorization: {paste copied session token here}" -H "Content-Type: application/json" -d "{ \"ids\": [ \"5503e95a0d878525404e38d7\" ], \"send_mail\": true, \"api_version\": \"2\"}"
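For those who prefer Python to curl, the same restore request could be assembled roughly as shown below. This is a sketch under stated assumptions: the function names are hypothetical, the payload uses the older list-of-IDs form shown above (described as still valid), and request_status_url is assumed to be a top-level key of the JSON response.

```python
import json
from urllib.request import Request, urlopen

RESTORE_URL = "https://files.jgi.doe.gov/request_archived_files/"

def build_restore_request(file_ids, session_token, send_mail=True):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"ids": list(file_ids), "send_mail": send_mail, "api_version": "2"}
    return Request(
        RESTORE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": session_token,  # the copied session token
            "Content-Type": "application/json",
        },
        method="POST",
    )

def submit_restore_request(request):
    """Send the request and return the status URL (performs a network call)."""
    with urlopen(request) as response:
        # Assumption: request_status_url is a top-level key of the response.
        return json.load(response)["request_status_url"]
```

Keeping the request construction separate from the network call makes it possible to inspect the payload before anything is sent.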

Check File Restoration Status

Are my files ready?

You can check the status of your files by visiting (in your browser or via curl) the link that was returned to you when you submitted your file restoration request.

https://files.jgi.doe.gov/request_archived_files/requests/#######

Example:

Non-Globus File Restoration Request: https://files.jgi.doe.gov/request_archived_files/requests/473580

File Restoration Status for Globus Download: https://files.jgi.doe.gov/request_archived_files/requests/488421

- This payload will return information about the restoration request
  - status:
    - NEW
      - The request has been collected.
    - STAGING
      - For Globus downloads...the data is being moved to the Globus download endpoint.
    - PENDING
      - The file restoration request has been made. Some files may be ready, while others have not been transferred to disk.
    - READY
      - All files are available on disk and are available to be downloaded.
    - EXPIRED
      - Some or all of the files have been purged from disk. You will need to submit another file restoration request.
  - expiration_date
    - The date that files will be purged from disk (unless one or more files are requested again).
  - file_ids
    - The IDs of the files in your request.
  - globus_download_url
    - The URL that can be used to start downloading files through Globus.
    - This field will be provided regardless of the status of the request.
If you set send_mail = true, you will be notified via email when your files have all been restored.
- You will be given a link to download your files through the browser.
- This link will direct you to a page that provides a bit more information than the request_status_url that was returned when the restoration request was made.

Example: https://data.jgi.doe.gov/search?restoreid=473580
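A polling loop over the status values listed above could be sketched as follows. The function wait_until_ready is hypothetical, as are its defaults (one poll per minute, 120 polls). It expects a caller-supplied get_status callable that returns the current status string, for example by fetching the status URL and reading its status field.

```python
import time

def wait_until_ready(get_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll until the status is READY and return the number of polls used.

    Raises RuntimeError on EXPIRED and TimeoutError if max_polls is exhausted.
    """
    for attempt in range(1, max_polls + 1):
        status = get_status()
        if status == "READY":
            return attempt
        if status == "EXPIRED":
            raise RuntimeError("Request expired; submit a new restoration request.")
        # NEW, STAGING and PENDING all mean the files are not ready yet.
        sleep(poll_seconds)
    raise TimeoutError("Files were not ready after %d polls." % max_polls)
```

Because retrieval usually takes less than an hour but can take up to a night, a long polling interval is likely sufficient. Setting send_mail to true avoids polling altogether.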

Download Files via the API

NOTE: The download endpoint exists on a different host than the search and restore endpoints.

Download host: files-download.jgi.doe.gov

Description

After files have been restored (from tape to disk), the download_files POST endpoint will download files in the list you provide.

https://files-download.jgi.doe.gov/download_files/

Arguments:

"ids"
- This is a dictionary that maps each organism ID to a list of 24-character file IDs collected from organisms[x].files[y]._id
- Example format:

"ids" : {
data_portal_organism_id_1 : [24_char_file_id_1-1, 24_char_file_id_1-2, ... , 24_char_file_id_1-n],
data_portal_organism_id_2 : [24_char_file_id_2-1, 24_char_file_id_2-2, ... , 24_char_file_id_2-n]
}

- Example format (with fields from search endpoint payload):

"ids" : {
organisms[0].id : [organisms[0].files[0]._id, organisms[0].files[1]._id],
organisms[1].id : [organisms[1].files[0]._id, organisms[1].files[1]._id]
}

"api_version"
- Required; must be set to 2.

Character Limit:

- You cannot submit more than 4094 characters to our back-end endpoint.
- If you need to submit a payload greater than 4094 characters, submit a file to the "-d" argument in the curl command (examples)

Return Values:

Output zip file

Download: Payload Examples

Valid Download Request Payload (One File) - March 2025

{
  "ids": {
    "Mycocosm_AP-1184792": {
      "file_ids": ["51d4c073067c014cd6ea7469", "51d4c1cf067c014cd6ea859e"],
      "top_hit": "59cad2a27ded5e2f1869132c",
      "mycocosm_portal_id": "Aspni7"
    }
  },
  "send_mail": false,
  "api_version": "2"
}

Valid Download Request Payload (Multiple Files) - March 2025

{
  "ids": {
    "Mycocosm_AP-1184792": {
      "file_ids": ["51d4c073067c014cd6ea7469", "51d4c1cf067c014cd6ea859e"],
      "top_hit": "59cad2a27ded5e2f1869132c",
      "mycocosm_portal_id": "Aspni7"
    },
    "IMG_AP-1146261": {
      "file_ids": ["595b9afd7ded5e5270eef127"],
      "top_hit": "595a83767ded5e5270eeebc8"
    },
    "Phytozome-167": {
      "file_ids": ["6643ff0653447aa389b9c859", "6643ff0753447aa389b9c865"],
      "top_hit": "67b5004267ef7b237e865486"
    }
  },
  "send_mail": false,
  "api_version": "2"
}

Old Version: Valid Download Payload (One File)

This version is old, but still valid.

{
  "ids": {
    "IMG_SP-1060116": ["5503e95a0d878525404e38d7"]
  },
  "api_version": "2"
}

Old Version: Valid Download Payload (Multiple Files, One Dataset)

This version is old, but still valid.

{
  "ids": {
    "IMG_SP-1060116": ["5503e95a0d878525404e38d7", "5f8b782e47675a20c850eda3"]
  },
  "api_version": "2"
}

Old Version: Valid Download Payload (Multiple Files, Multiple Datasets)

This version is old, but still valid.

{
  "ids": {
    "IMG_SP-1060116": ["5503e95a0d878525404e38d7", "5f8b782e47675a20c850eda3"],
    "IMG_SP-1060115": ["55044bb00d878525404e3a3f"]
  },
  "api_version": "2"
}

Example of valid curl command

curl -X POST "https://files-download.jgi.doe.gov/download_files/" -H "accept: application/json" -H "Authorization: {paste copied session token here}" -H "Content-Type: application/json" -d "{ \"ids\": { \"IMG_SP-1060116\":[ \"5503e95a0d878525404e38d7\", \"5f8b782e47675a20c850eda3\"], \"IMG_SP-1060115\":[\"55044bb00d878525404e3a3f\"]}, \"api_version\": \"2\"}" --output {enter your zip filename here}.zip
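A Python counterpart to the curl command might look like the sketch below. The function names are hypothetical, and the payload follows the older organism-ID-to-file-ID form shown above (described as still valid). The response body is assumed to be the zip file itself, in line with the --output option of the curl example.

```python
import json
import shutil
from urllib.request import Request, urlopen

DOWNLOAD_URL = "https://files-download.jgi.doe.gov/download_files/"

def build_download_request(ids_by_organism, session_token):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"ids": ids_by_organism, "api_version": "2"}
    return Request(
        DOWNLOAD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Authorization": session_token,  # the copied session token
            "Content-Type": "application/json",
        },
        method="POST",
    )

def download_zip(request, zip_path):
    """Send the request and stream the response to zip_path (network call)."""
    with urlopen(request) as response, open(zip_path, "wb") as out:
        # Assumption: the response body is the zip archive itself.
        shutil.copyfileobj(response, out)
```

Note that the request goes to the download host, not to the host used for search and restore.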

Example: Copy an API Call from the Browser

Scenario: You want to get an API call to find Fasta files for e coli:

Step 1: Enter e coli into the search bar on the JDP homepage

Step 2: Select the Fasta file type

Step 3: Press the API icon to copy the search query

Step 4: Paste the API call into your browser or terminal to view the JSON payload.

You should see the following: https://files.jgi.doe.gov/search/?q=e+coli&ff=%7B%22file_type%22:[%22fasta%22]%7D

Step 5: Parse the JSON payload to collect the file IDs you want to download.

Step 6: Construct your file restoration and download commands (we recommend using the Interactive Environment) and run the command in your terminal.

Advanced Queries: Search for values in a specific field

As of July 14, 2023, you can use the API to search for values within certain specific fields.

Use the f parameter (for "fields") to indicate a specific field you would like to search in conjunction with q (for "query").
- Allowed values for f are:
    - srr
    - biosample
    - project_id
      - this is really more of a jgi_entity_id search parameter
      - this will search pre-2012 JGI project IDs (legacy project IDs), current generation JGI project IDs (Final Deliverable, Sequencing Project, Analysis Project, and Proposals)
    - library
      - this will allow you to search by JGI library names
    - img_taxon_oid
      - this will allow you to search by IMG Taxon OIDs

Example: https://files.jgi.doe.gov/search/?q=BWZZN&f=library&a=false&h=false&d=asc&p=1&x=10&api_version=2

Example: https://files.jgi.doe.gov/search/?q=1310325&f=project_id&a=false&h=false&d=asc&p=1&x=10&api_version=2

- Experiment with these parameters in the Interactive Environment.
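A field-specific search URL could be assembled as in the sketch below. The helper build_field_search_url is hypothetical. It checks f against the allowed values listed above and sets only the q, f, p, x and api_version parameters seen in the example URLs. The a, h and d parameters from those examples are left out because this tutorial does not describe them.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://files.jgi.doe.gov/search/"
ALLOWED_FIELDS = {"srr", "biosample", "project_id", "library", "img_taxon_oid"}

def build_field_search_url(value, field, page=1, per_page=10):
    """Return a search URL that looks for value within a single field."""
    if field not in ALLOWED_FIELDS:
        raise ValueError("f must be one of: " + ", ".join(sorted(ALLOWED_FIELDS)))
    params = {"q": value, "f": field, "p": page, "x": per_page, "api_version": 2}
    return SEARCH_URL + "?" + urlencode(params)

print(build_field_search_url("BWZZN", "library"))
print(build_field_search_url("1310325", "project_id"))
```

Rejecting unknown field names locally is a convenience of this sketch. How the server responds to an unsupported f value is not covered here.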
