Genome Browser

SGD's instance of JBrowse provides a lightweight and fast web visualization environment for published genomic datasets. Genome-wide data in SGD's JBrowse is generally represented in one of two ways: feature tracks, which map biological data to specific chromosomal regions, or coverage tracks, which quantify biological data across every chromosomal position.

Navigating JBrowse

JBrowse is initially populated with an annotation track representing all sequence features and a selection of genomic data tracks, or with a previously cached user view. The navigation bar at the top contains a red box highlighting the portion of the S288C reference genome for which annotation and genomic track data are displayed. You can select a region of the chromosome by dragging out a selection on the same bar as the red box, or by moving the box itself. Alternatively, you can specify a region or feature by entering genomic coordinates or a feature name in the text query box, and you can switch to a different chromosome either in the query or in the neighboring drop-down menu. Adjacent to the chromosome drop-down menu are buttons that pan the viewing region left or right and zoom out or in to varying degrees. You can also pan by scrolling your mouse horizontally, and zoom in by double-clicking a point in the track display window. Lastly, clicking and dragging your mouse to highlight a region within the track display window will zoom into it.

Uploading Data

To visualize your own experimental data, select "Open track file or URL" under the Track option in the top left menu. The pop-up that opens allows you to load local track files or files hosted remotely at a URL. SGD's current instance of JBrowse accepts GFF3, GTF, BigWig, BAM, FASTA, VCF, and Tabix as input files. For each track file or URL uploaded, the New Tracks box presents a 'Display' drop-down menu for selecting a viewing style for that track, as well as an 'Edit Configuration' button for customizing the display further using JSON. More instructions on how to configure tracks, and even set up your own browser instance, are available in the official JBrowse documentation.

Use case: 2-micron plasmid

To visualize the 2-micron plasmid in SGD's JBrowse, you must first upload its sequence and annotations, which can be found in two files from SGD: the sequence of the plasmid as a FASTA file, and the annotations as a GFF.

The sequence of the plasmid can be downloaded from the 2-micron page.  The gene annotations in GFF format can be downloaded from the SGD Downloads site.

For the uploaded files to display correctly in JBrowse, the chromosome designation in the FASTA and the GFF must match. If you download the two files linked above, you must edit the FASTA file so that its first line reads ">2-micron". Alternatively, you could change the GFF to match the FASTA, but changing the FASTA involves only a single edit instead of several.
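The header edit described above can also be scripted. The following is a minimal Python sketch, not an SGD tool: the function names `rename_fasta_header` and `gff_seqids` and the example sequence, header, and GFF line are made up for illustration. It rewrites the first line of a single-sequence FASTA and lists the sequence names used in the first column of a GFF so the two can be compared.

```python
def rename_fasta_header(fasta_text: str, new_name: str) -> str:
    """Return FASTA text whose first header line is replaced by '>new_name'."""
    lines = fasta_text.splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header line")
    lines[0] = ">" + new_name
    return "\n".join(lines) + "\n"


def gff_seqids(gff_text: str) -> set:
    """Collect the sequence names found in column 1 of GFF feature lines."""
    ids = set()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t")[0])
    return ids


# Hypothetical inputs, for illustration only.
example_fasta = ">hypothetical_original_header some description\nACGT\nTTGA\n"
example_gff = (
    "##gff-version 3\n"
    "2-micron\tSGD\tgene\t1\t8\t.\t+\t.\tID=example_gene\n"
)

fixed_fasta = rename_fasta_header(example_fasta, "2-micron")
names_match = gff_seqids(example_gff) == {fixed_fasta.splitlines()[0][1:]}
```

In practice you would read the downloaded files from disk, write the fixed FASTA back out, and upload the result.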

Using the 'Genome' pulldown at top left, choose 'Open Sequence File' to upload the FASTA via drag-and-drop, and select 'FASTA' from the file options. Then upload the GFF file using the 'Track' pulldown. Make sure that both the FASTA and the GFF are selected on the left side.

Track Selector and Metadata

Clicking the "Select tracks" button in the top left allows you to selectively display the genomic datasets that SGD hosts for visualization in JBrowse. At the top, under "My Tracks", are facets that list the tracks currently or recently displayed in your viewing window. You can filter for tracks of interest by using the facets for category of data, type of assay used, strain background, first author, principal investigator, PubMed ID, or year of publication. You can select multiple facets, and reset a selection by clicking the X next to a facet header. The metadata table lists details for all tracks matching the selected facets, and clicking the checkbox next to the PMID column adds the track to your current JBrowse display. You can also use the search box above the table to filter for tracks with matching text in their metadata.

Use Case: S288C Transcriptome

SGD's S288C transcriptome dataset provides examples of both discrete feature data and continuous coverage data. To generate JBrowse track representations, SGD revisited the transcript isoform sequencing (TIF-seq) dataset of Pelechano et al. (PMID 23615609), which simultaneously profiled the 5' capped mRNA transcription start sites and the 3' polyadenylation sites of transcripts. Separate profiling experiments were conducted for WT cells grown in glucose (ypd) media and in galactose (gal) media. The original text files containing raw abundance counts and transcript isoform chromosomal locations were downloaded from the Gene Expression Omnibus (GEO accession GSE39128).

Transcript isoforms that fully overlapped the known gene coding sequence in the S288C reference genome were identified and parsed into the following feature files:

2. longest_full-ORF_transcripts_ypd.gff3: This track contains the longest transcript overlapping each individual ORF completely for WT cells grown in glucose (ypd) media.

3. most_abundant_full-ORF_transcripts_ypd.gff3: This track contains the most abundant transcript overlapping each individual ORF completely for WT cells grown in glucose (ypd) media.

4. longest_full-ORF_transcripts_gal.gff3: This track contains the longest transcript overlapping each individual ORF completely for WT cells grown in galactose (gal) media.

5. most_abundant_full-ORF_transcripts_gal.gff3: This track contains the most abundant transcript overlapping each individual ORF completely for WT cells grown in galactose (gal) media.
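The selection behind these tracks can be sketched in Python. This is an illustrative reconstruction, not SGD's actual pipeline: the `Interval` type, the function names, the example coordinates, and the requirement that a transcript lie on the same strand as the ORF are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    name: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+" or "-"
    abundance: int = 0  # raw count; unused for ORFs


def fully_covers(transcript: Interval, orf: Interval) -> bool:
    """True if the transcript spans the whole ORF (same strand assumed)."""
    return (
        transcript.strand == orf.strand
        and transcript.start <= orf.start
        and transcript.end >= orf.end
    )


def pick_representatives(
    orf: Interval, transcripts: List[Interval]
) -> Tuple[Optional[Interval], Optional[Interval]]:
    """Return (longest, most abundant) among transcripts fully covering the ORF."""
    full = [t for t in transcripts if fully_covers(t, orf)]
    if not full:
        return None, None
    longest = max(full, key=lambda t: t.end - t.start + 1)
    most_abundant = max(full, key=lambda t: t.abundance)
    return longest, most_abundant


# Hypothetical ORF and isoforms, for illustration only.
orf = Interval("ORF_A", 100, 200, "+")
isoforms = [
    Interval("t1", 90, 210, "+", 5),    # covers the ORF
    Interval("t2", 50, 250, "+", 2),    # covers the ORF, longest
    Interval("t3", 120, 300, "+", 40),  # starts inside the ORF, excluded
    Interval("t4", 80, 220, "-", 9),    # opposite strand, excluded
]
longest, most_abundant = pick_representatives(orf, isoforms)
```

Running the selection once per growth condition would yield the longest and most abundant tracks for ypd and gal respectively.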

Full-ORF transcripts were given identifiers according to the ‘unfiltered_full-ORF_transcripts.gff3’ feature file. Each transcript isoform’s ‘ID’ in the gff3 file contains the systematic name of the transcript’s associated ORF and a suffix labeling transcripts by length of the transcript in descending order. If transcripts are of equal length, transcripts further upstream from the associated ORF are labeled with a higher suffix. Clicking on the glyph for a transcript isoform opens a popup giving the exact chromosomal location, the raw abundance in glucose (ypd) and/or galactose (gal) media, and the portion of the reference sequence covered by the isoform (computationally generated by JBrowse). Pink shading of the glyph also reflects abundance.
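The ordering rule for identifiers can be illustrated with a short Python sketch. The exact suffix format is not stated here, so the `_1`, `_2`, ... numbering, the function name `label_isoforms`, and the example ORF name and coordinates are hypothetical. "Upstream" is interpreted relative to the ORF's strand: a smaller start coordinate on the plus strand, a larger end coordinate on the minus strand.

```python
from typing import List, Tuple


def label_isoforms(
    orf_name: str, strand: str, isoforms: List[Tuple[int, int]]
) -> List[Tuple[str, int, int]]:
    """Order isoforms longest first; among equal lengths, the isoform whose
    5' end lies further upstream is placed later (higher suffix)."""

    def sort_key(iso: Tuple[int, int]):
        start, end = iso
        length = end - start + 1
        # Plus strand: a smaller start is further upstream, so sort by -start.
        # Minus strand: a larger end is further upstream, so sort by end.
        upstream_rank = -start if strand == "+" else end
        return (-length, upstream_rank)

    ordered = sorted(isoforms, key=sort_key)
    # Hypothetical ID format: systematic name plus a numeric suffix.
    return [(f"{orf_name}_{i}", s, e) for i, (s, e) in enumerate(ordered, start=1)]


# Hypothetical plus-strand isoforms; two of them have equal length.
plus_ids = label_isoforms("ORF_A", "+", [(100, 300), (90, 290), (50, 400)])

# Hypothetical minus-strand isoforms of equal length.
minus_ids = label_isoforms("ORF_B", "-", [(110, 310), (100, 300)])
```

In the plus-strand example the longest isoform (50 to 400) comes first, and of the two equal-length isoforms the one starting at 90 lies further upstream and so receives the higher suffix.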

All transcript isoforms were taken into account, regardless of their length or whether they fully covered an ORF, to generate the following coverage files:

6. plus_strand_coverage_ypd.bw: For WT cells grown in glucose media (ypd), the number of transcripts covering each position on the plus strand is represented in this track.

7. minus_strand_coverage_ypd.bw: For WT cells grown in glucose media (ypd), the number of transcripts covering each position on the minus strand is represented in this track.

8. plus_strand_coverage_gal.bw: For WT cells grown in galactose media (gal), the number of transcripts covering each position on the plus strand is represented in this track.

9. minus_strand_coverage_gal.bw: For WT cells grown in galactose media (gal), the number of transcripts covering each position on the minus strand is represented in this track.
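Per-position, per-strand coverage of this kind can be computed with a simple difference array. The sketch below is a generic Python illustration, not the procedure SGD used: the function name `strand_coverage` and the example coordinates are assumptions, and each isoform is counted once. Weighting by raw abundance would mean adding the isoform's count instead of 1.

```python
from typing import Iterable, List, Tuple


def strand_coverage(
    chrom_length: int,
    transcripts: Iterable[Tuple[int, int, str]],
    strand: str,
) -> List[int]:
    """Count how many transcripts on `strand` cover each position.

    Transcripts are (start, end, strand) with 1-based inclusive coordinates.
    The returned list holds the coverage of position 1 at index 0.
    """
    diff = [0] * (chrom_length + 2)  # extra slot so end + 1 always fits
    for start, end, t_strand in transcripts:
        if t_strand != strand:
            continue
        diff[start] += 1     # coverage rises at the first covered base
        diff[end + 1] -= 1   # and falls just past the last covered base
    coverage = []
    running = 0
    for pos in range(1, chrom_length + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage


# Hypothetical 10-bp chromosome with three isoforms.
example = [(2, 5, "+"), (4, 8, "+"), (3, 6, "-")]
plus_cov = strand_coverage(10, example, "+")
minus_cov = strand_coverage(10, example, "-")
```

The resulting per-position values are what a coverage format such as BigWig stores; converting them to a .bw file requires a separate tool and is not shown here.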

The S288C transcriptome coverage tracks reflect the heterogeneity of transcription noted in the original study. One caveat, however, is that transcripts longer than five kilobases were not included in the study because of the nature of the experimental procedure. YeastMine contains a compiled transcript table that integrates this transcriptomic data with 11 other publications. We encourage you to explore, compare, and contrast the transcriptional dynamics of your own experiments in SGD's JBrowse instance as well.