Quick Start Part 5: Workspace > Notebooks

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019.

Created by KateN on 2018-03-12

This feature is currently released in Beta so that you can try it while we continue to develop it. Tell us on the forum what you like or struggle with.

From the Notebooks tab of your workspace, you can launch an interactive analysis environment based on Jupyter (formerly IPython) notebooks, Spark, and Hail. Jupyter notebooks are an increasingly popular way to create reproducible bioinformatics analyses. They combine familiar, powerful programming languages such as R and Python with the ability to create and share documents containing code, results, and narrative text.

Jupyter integrates well with Spark for large-scale data processing and analysis, and it provides an excellent environment for running leading genomic analysis software such as Hail.

When you create or upload a notebook in the Notebooks tab, a new notebook file is created in your workspace bucket. You then have the option to spin up a Dataproc (managed Spark) cluster in your Google Cloud Platform (GCP) project. This gives you secure access to a Jupyter Notebook server on which you can run your notebook. From within a notebook or Spark job, you can access any FireCloud-managed GCP resource (such as a bucket) that you have permission to use, without having to authenticate again. You can also use the Python FireCloud client (FISS) to access FireCloud objects such as workspaces or the entity data model. Changes made to your notebook are automatically saved back to your FireCloud workspace.
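As a rough illustration of using FISS from inside a notebook, the sketch below looks up a workspace's bucket and builds a gs:// path to a notebook file. It rests on several assumptions that this page does not confirm: that the workspace metadata returned by FISS nests the bucket under "workspace" and "bucketName", and that notebook files sit in a "notebooks" folder of the bucket. The helper names, and the project and workspace names in the usage note, are hypothetical.

```python
def bucket_name_from_workspace(workspace_json):
    """Return the bucket name from a workspace metadata dict.

    Assumes the shape {"workspace": {"bucketName": ...}}; adjust if the
    response you get back is laid out differently.
    """
    return workspace_json["workspace"]["bucketName"]


def notebook_uri(bucket, filename, folder="notebooks"):
    """Build a gs:// URI for a notebook file in the workspace bucket.

    The default "notebooks" folder is an assumption, not something this
    document states.
    """
    return "gs://{}/{}/{}".format(bucket, folder, filename)


def fetch_workspace_bucket(namespace, name):
    """Look up a workspace's bucket through FISS (needs network access)."""
    # Imported lazily so the pure helpers above work without FISS installed.
    from firecloud import api as fapi

    response = fapi.get_workspace(namespace, name)
    response.raise_for_status()  # fail loudly on an HTTP error
    return bucket_name_from_workspace(response.json())
```

Inside a running notebook, something like `notebook_uri(fetch_workspace_bucket("my-billing-project", "my-workspace"), "analysis.ipynb")` would then give a path you could pass to other tools. Keeping the network call separate from the two pure helpers makes the path logic easy to check without a cluster.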

Updated on 2018-09-12
