SGD Help: Gene Ontology (GO)

The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. The project began as a collaboration between three model organism databases: the Saccharomyces Genome Database (SGD), FlyBase (for Drosophila), and Mouse Genome Informatics (MGI). The GO Consortium has since expanded considerably to include many additional model organism databases and annotation groups. Depending on the nature of its affiliation, each member contributes to the development of the ontologies, the generation of GO annotation files, or the development of software tools that use GO.

Within SGD, GO annotations are used to describe what gene products do and where they are located. GO annotations therefore appear directly on the Locus Summary pages for both protein-coding and non-coding RNA genes. More detail about the GO annotations and the GO terms is available on additional pages. GO tools such as the GO Term Finder and the GO Slim Mapper use the GO annotations to analyze sets of genes and identify common functions, processes, or locations.

The latest gene association file (GAF) is available from SGD's Downloads site.

All Gene Ontology data in SGD are from the latest snapshot.

GO Annotation
1. Basic parts of a GO Annotation
2. Optional additional parts of a GO Annotation
3. Annotation Methods
  - Manually curated
  - High-throughput
  - Computational
Searching and Accessing GO Annotations in SGD
GO Slim Mapping Tool
GO Term Finder Tool
Microarray Data and GO
Gene Ontology Causal Activity Models (GO-CAMs)

GO Annotation

The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. The name and definition for each GO term and the parent-child relationships between terms are defined by the members of the GO Consortium. This combination of a controlled vocabulary of defined terms with a structure of relationships between items is referred to as an ontology. See the GO Consortium's About the GO for a basic introduction to the Gene Ontologies.

This diagram shows a small portion of the Biological Process ontology. Terms at the top represent broader, more general concepts, while terms lower down represent more specific concepts. When referring to the structure between terms, a term that has terms below it is referred to as a parent term, while those terms below it are referred to as child terms. Note that each term will be a parent with respect to the terms below it and a child with respect to terms above it. There are two different relationship types between terms: is_a and part_of. Note that the Gene Ontologies themselves contain only information about terms in the ontology and their relationships to other terms. They do not contain gene products of any specific organism.
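The parent-child structure described above can be sketched as a small directed graph. This is a minimal illustration, not SGD's or the GO Consortium's implementation: the term names in `TOY_ONTOLOGY` and the `ancestors` helper are hypothetical, and only the two relationship types named above (is_a and part_of) are taken from the text.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its (relationship, parent) pairs.
# The term names are placeholders, not real GO terms.
TOY_ONTOLOGY = {
    "root": [],
    "general_term": [("is_a", "root")],
    "other_general_term": [("is_a", "root")],
    "specific_term": [
        ("is_a", "general_term"),
        ("part_of", "other_general_term"),
    ],
}


def ancestors(term, ontology):
    """Return every term reachable by following parent links upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in ontology.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


specific_ancestors = ancestors("specific_term", TOY_ONTOLOGY)
```

Because a term can have more than one parent, the traversal tracks visited terms so that a shared ancestor such as `root` is reported only once.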

Basic parts of a GO Annotation

To provide specific information about gene products, a GO term, e.g. actomyosin contractile ring contraction, is associated with a gene or gene product, e.g. ACT1 or Act1p, to form a GO Annotation. In addition to the association between a gene product and a GO term, a GO annotation must also be associated with a specific reference, an evidence code, and the date on which the annotation was made. Thus a basic GO annotation includes these pieces of information:

Gene (or gene product)

e.g., ACT1

GO term

e.g., actomyosin contractile ring contraction

Reference

The reference contains data or statements which support the annotation, or a description of the method by which the annotation was assigned.

Evidence Code

The Evidence Code gives a basic indication of the type of data or statement that supports the annotation. More information about the GO Evidence Codes is found in the Guide to GO Evidence Codes.

Date

The date on which the annotation was assigned or reviewed.

These basic, essential parts of a GO annotation are all displayed on SGD's GO Evidence and References pages; see, for example, this one for the ACT1 GO Annotations.

For a more complete guide to GO practice in the use of GO terms for the annotation of gene products, please see the GO Consortium's Introduction to GO Annotations and Guide to GO Evidence Codes.
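As a rough sketch, the five required parts of a basic annotation could be modeled as a small record type. The class name `GOAnnotation` and its field names are assumptions made for illustration, and the reference, evidence code, and date in the example are placeholders rather than an actual ACT1 annotation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GOAnnotation:
    """The five required parts of a basic GO annotation (hypothetical model)."""

    gene: str
    go_term: str
    reference: str
    evidence_code: str
    annotated_on: date

    def __post_init__(self):
        # Every text field is required, so reject empty values early.
        for name in ("gene", "go_term", "reference", "evidence_code"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required for a GO annotation")


example = GOAnnotation(
    gene="ACT1",
    go_term="actomyosin contractile ring contraction",
    reference="PLACEHOLDER_REFERENCE",  # placeholder, not a real citation
    evidence_code="IDA",  # placeholder evidence code
    annotated_on=date(2000, 1, 1),  # placeholder date
)
```

Making the record immutable (`frozen=True`) reflects the idea that an annotation is a fixed statement tied to its reference and date; a revised annotation would be a new record.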

Optional additional parts of a GO Annotation

In addition to the basic, essential components of a GO annotation, there are some optional pieces of information that may be associated with the GO annotation when appropriate. These include a Qualifier, the With/From field and the Annotation Extension field:

Qualifiers

Qualifiers modify the interpretation of an annotation. The three qualifiers currently allowed are NOT, contributes_to, and colocalizes_with. For a detailed explanation of the qualifiers, please see the Annotation Qualifiers section of the Introduction to GO Annotations.

With/From Field

For some evidence codes, it is useful to specify a second object that the gene being annotated interacted with or was compared to. For example, for Inferred from Genetic Interaction (IGI), it is useful to specify which other genes were involved in a genetic interaction with the gene being annotated. Similarly for Inferred from Physical Interaction (IPI), the "with field" specifies the gene products with which the gene product being annotated interacted. When used for Inferred from Sequence or Structural Similarity (ISS), the "with field" indicates what the gene being annotated was compared to in a sequence-based analysis. For the evidence code Inferred by Curator (IC), this column contains the GOID of the GO term(s) used as the basis of the curator's inference.

Annotation Extension

The Annotation Extension field adds more specificity to the GO term. This field is used to capture, for example, the substrates of a function term such as protein kinase, the targets of transcription factors, or the cell type in which a gene product has a particular localization. An annotation extension has two parts: a relation that connects the 'primary' GO term to the entity represented by the identifier, and an entity identifier for the object that is used to increase the specificity of the annotation (e.g. identifiers for a gene, gene product, GO term, or a term from an external ontology such as a cell type or anatomy ontology). More information can be found on the Annotation Extensions page.
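The two-part structure of an annotation extension can be illustrated with a small parser. This sketch assumes the textual form `relation(identifier)`, which this page does not itself specify; the `Extension` type, the `parse_extension` helper, and the identifier `EXAMPLE:0000001` are hypothetical.

```python
import re
from typing import NamedTuple


class Extension(NamedTuple):
    relation: str   # connects the primary GO term to the entity
    entity_id: str  # identifier of the gene, gene product, or ontology term


# Assumed textual form: relation(identifier), with no nested parentheses.
_EXTENSION_PATTERN = re.compile(r"^\s*([A-Za-z_]+)\(([^()\s]+)\)\s*$")


def parse_extension(text: str) -> Extension:
    """Split one extension expression into its relation and entity identifier."""
    match = _EXTENSION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a relation(identifier) expression: {text!r}")
    return Extension(relation=match.group(1), entity_id=match.group(2))


parsed = parse_extension("has_input(EXAMPLE:0000001)")
```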

Annotation Methods

To differentiate annotations made from published small scale experiments, genome-wide or high-throughput experiments and computational predictions, we have separated GO annotations at SGD into three sets:

Manually curated GO annotations

Manually curated GO annotations reflect our best understanding of the basic molecular function, biological process, and cellular component for a gene product. Manually curated annotations are assigned by SGD curators, who read the literature for each gene and make annotations from published papers when available. Such annotations may be based on experiments, sequence similarity, or other computational analyses described in the paper, or on statements made by the authors. Curators periodically review all Manually curated GO annotations for accuracy and completeness and update them as necessary, adding new annotations to reflect advances in knowledge and removing any annotations that are no longer supported by the literature. The Last Reviewed on: date on the GO evidence and references page for a gene indicates the date when an SGD curator reviewed all of the Manually curated GO annotations for that gene. In addition, SGD reviews and incorporates manual GO annotations for S. cerevisiae proteins from the GO Annotation (GOA) project at UniProt. These annotations can be identified at SGD by the source, e.g., 'Uniprot', 'MGI', 'HGNC' (GO consortium members), displayed in the 'Assigned By' column of the GO evidence and references page.

High-throughput GO Annotations

GO annotations from high-throughput experiments are assigned based on a variety of large-scale experiments, including genome-wide experiments. Many of these annotations are made based on GO annotations (or mappings to GO annotations) assigned by the authors, rather than by SGD curators. While SGD curators read these publications and often work closely with authors to incorporate the information, each individual annotation is not necessarily reviewed by a curator. GO annotations from high-throughput experiments are assigned only when this type of data is available, and thus may not be assigned in all three aspects of the Gene Ontologies.

Computational GO Annotations

Computational GO annotations are made by a variety of computational methods, such as sequence similarity methods, including protein domain motifs, and keyword mapping files. When annotations based on computational methods are NOT reviewed by a curator, they are placed in the Computational GO annotations section. Note that when annotations supported by a computational method, such as sequence analysis, are reviewed by a curator, they may be found in the Manually curated section.

At SGD, curators read the research literature and associate specific GO terms with the appropriate gene products to provide information about the state of knowledge of the yeast genome. We are constantly updating our GO annotations and always welcome suggestions for improvement or corrections when the understanding about a gene has changed since the last time we reviewed the literature for a given gene.

Searching and Accessing GO Annotations in SGD

Users can search for GO terms in any of the three Gene Ontologies that match a text query, e.g. "bud", using the Search box located at the top of SGD pages. The search result is a list of matches for the query term. Clicking on the "Gene product activities (GO Molecular Function)", "Cellular roles or processes (GO Biological Process)", or "Protein complexes and locations (GO Cellular Component)" links from the results page will provide lists of GO terms containing the query string.

Users can search for GO terms whose GOIDs (minus the "GO:" prefix and leading zeroes) match a purely numerical query, e.g. "5685", using the Search box located at the top of SGD pages. The search result will usually be the GO term whose GOID matches the query. Occasionally, the search result will be a list of matches for the query term, where clicking on the Gene Ontology ID link will take you to the associated GO Term page.
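The relationship between a numerical query and a full GOID can be sketched as follows. The helpers `to_goid` and `to_numeric_query` are hypothetical names, and the sketch assumes the usual seven-digit, zero-padded GOID form; it does not describe how the SGD search box is implemented.

```python
def to_goid(query: str) -> str:
    """Expand a numerical query such as "5685" into a zero-padded GOID."""
    digits = query.strip()
    if digits.upper().startswith("GO:"):
        digits = digits[3:]
    # Assumes GOIDs carry exactly seven digits after the "GO:" prefix.
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 7:
        raise ValueError(f"not a numerical GO query: {query!r}")
    return "GO:" + digits.zfill(7)


def to_numeric_query(goid: str) -> str:
    """Strip the "GO:" prefix and leading zeroes from a GOID."""
    return str(int(to_goid(goid)[3:]))


expanded = to_goid("5685")
```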

Accessing GO annotations for specific genes or GO terms

At SGD, you can find GO annotations displayed at various levels of detail in three locations as described below.

Locus Summary page

Each Locus Summary page, like this one for RCL1, lists the GO terms, with associated evidence codes, that SGD curators have used to annotate the gene of interest. From the Locus Summary page, clicking on the GO evidence and references link takes you to the GO Annotations page for that gene, while clicking on a GO Term name will take you to the corresponding GO Term page.

GO Annotations page

This page, for example the RCL1 GO Annotations page, lists all the GO terms that have been used to annotate the particular gene, along with the specific reference(s) used to make each annotation and the evidence code(s) describing the type of evidence or statement found in that reference. Annotations are separated by the three annotation methods: Manually curated, High-throughput, and Computational. Within each section, annotations from each of the three aspects of the Gene Ontology (Molecular Function, Biological Process, and Cellular Component) are found in individual sections. In addition, the network diagram allows you to visualize connections between different genes, in the form of the GO Biological Process terms they share. The diagram displays GO Biological Process terms (green squares) that are shared between the given gene (yellow circle) and other genes (grey circles), based on the number of Process terms shared (the slider at the bottom can be used to vary this number). The diagram is also interactive: the gene names and GO terms are clickable from within the diagram and lead you to the respective Locus Summary page or GO Term page.

GO Term page

Clicking on a term name, from either of the pages described above, takes you to the GO Term page for that term, for example this one for rRNA processing. The GO Term page provides specific information about the GO term, listing any synonyms or alternative phrases for the term name, the definition for the term, the aspect of the gene ontology (biological process, molecular function, or cellular component) to which it belongs along with its GOID number (a unique numerical identifier), and a graphical view showing the relationship between this term and others in the ontology. Annotations of genes within SGD are summarized in a table, along with the relevant reference and evidence code for each annotation. More details about the GO term at AmiGO can be found by clicking on the GOID link, while annotations to this term in other species can be accessed by clicking on the 'View GO Annotations in other species in AmiGO' link.

Downloads Site

The latest gene association file (GAF) is available from SGD's Downloads site.
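A downloaded GAF can be read with a few lines of code. The sketch below assumes the tab-delimited, 17-column GAF 2.x layout with comment lines starting with `!`; the column labels in `GAF_COLUMNS` are names chosen here for readability, the `read_gaf` helper is hypothetical, and the sample row contains placeholder values rather than real SGD data.

```python
import io

# Assumed GAF 2.x column order; the labels themselves are chosen for this sketch.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def read_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blank lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Pad short rows so that every column label has a value.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))


# Placeholder row for illustration only; none of these values are real annotations.
SAMPLE_GAF = "! illustrative header comment\n" + "\t".join([
    "SGD", "S000000000", "GENE1", "", "GO:0000000", "PMID:0000000", "IDA",
    "", "P", "", "", "gene", "taxon:559292", "20000101", "SGD", "", "",
]) + "\n"

rows = list(read_gaf(io.StringIO(SAMPLE_GAF)))
```

For a real file, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, in place of the `io.StringIO` object.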

GO Slim Mapping Tool at SGD

This tool identifies the major branches of the ontologies common to a list of genes or ORFs, based on their GO annotations. The GO terms that represent the major branches of the ontology are higher level terms, also known as the GO slim terms. This is possible with GO because there are parent-child relationships recorded between the granular terms and the high level GO slim terms. See the GO Slim Mapper SGD help page for more information.
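The idea behind slim mapping can be sketched as a walk up the parent-child relationships from each granular term until slim terms are reached. This is an illustration under simplifying assumptions, not the GO Slim Mapper's actual code: the term names, gene names, and the helpers `slim_terms_for` and `map_to_slim` are all hypothetical.

```python
def slim_terms_for(term, parents, slim):
    """Return the slim terms found at or above a term in the ontology."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        stack.extend(parents.get(current, ()))
    return found


def map_to_slim(gene_annotations, parents, slim):
    """Group genes under every slim term that their annotations roll up to."""
    result = {}
    for gene, terms in gene_annotations.items():
        for term in terms:
            for slim_term in slim_terms_for(term, parents, slim):
                result.setdefault(slim_term, set()).add(gene)
    return result


# Hypothetical toy data: term -> parent terms, a slim set, and gene annotations.
PARENTS = {
    "granular_a": ["slim_x"],
    "granular_b": ["mid"],
    "mid": ["slim_x", "slim_y"],
    "slim_x": ["root"],
    "slim_y": ["root"],
    "root": [],
}
SLIM = {"slim_x", "slim_y"}
GENES = {"GENE1": ["granular_a"], "GENE2": ["granular_b"]}

slim_map = map_to_slim(GENES, PARENTS, SLIM)
```

A gene can appear under more than one slim term when its granular term has several slim ancestors, as `GENE2` does here.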

GO Term Finder Tool at SGD

This tool searches for significant shared GO terms or parents of GO terms used to describe your set of genes or ORFs. This tool helps you understand what is common among the genes/ORFs you are studying. Results from this search are displayed in a graphic and table form. The graphic view shows the parent-child relationships (DAG view) of the GO terms that are used to annotate the genes/ORFs. See the GO Term Finder SGD help page for more information on this tool.
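One common way to judge whether a GO term is significantly shared within a gene set is a hypergeometric test. This page does not state which statistic the tool uses, so the sketch below is an assumption for illustration; the function name `enrichment_p_value` and its parameters are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, population_hits, sample, sample_hits):
    """Probability of seeing at least `sample_hits` annotated genes by chance.

    population      -- number of genes in the background set
    population_hits -- background genes annotated to the term
    sample          -- number of genes in the query list
    sample_hits     -- query genes annotated to the term
    """
    upper = min(sample, population_hits)
    total = comb(population, sample)
    tail = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Toy numbers: 3 of 10 background genes carry the term, and all 3 query genes do.
p_value = enrichment_p_value(population=10, population_hits=3, sample=3, sample_hits=3)
```

A small p-value suggests that the term is shared by more genes in the list than would be expected from a random draw of the same size.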

Microarray Data and GO at SGD

SPELL (Serial Pattern of Expression Levels Locator) is an analysis tool for microarray data that facilitates the rapid identification of the most informative datasets and co-expressed genes based on patterns of expression shared with a query gene or genes. Search results also display GO term enrichment for the genes of interest and other genes that have similar expression patterns. This helps identify relationships between a large number of genes with similar expression profiles.

Gene Ontology Causal Activity Models (GO-CAMs)

What are GO-CAMs?

GO-CAMs (Gene Ontology Causal Activity Models) are a framework developed by the Gene Ontology (GO) Consortium to represent complex biological pathways in a structured, computable format. Unlike traditional GO annotations that associate individual gene products with single GO terms, GO-CAMs connect multiple annotations into integrated network models that capture the causal relationships and dependencies between molecular activities in biological processes.

How GO-CAMs differ from standard GO annotations

Standard GO annotations consist of a gene product associated with a GO term from one of three aspects (Molecular Function, Biological Process, or Cellular Component), an evidence code (inferred from direct assay, inferred from mutant phenotype, etc.), and the published reference supporting the information.

GO-CAMs extend standard annotations by:

Linking multiple gene product activities into pathway networks
Representing causal relationships between molecular functions
Capturing the flow of biological processes
Including regulatory relationships and temporal sequences
Providing integrated views of how gene products work together

Benefits of GO-CAMs

GO-CAMs provide several advantages over traditional annotations. By integrating data from multiple publications into unified models, they enhance mechanistic understanding by showing how molecular activities connect to produce biological outcomes. GO-CAMs provide pathway context by placing individual gene functions within broader biological processes, thereby facilitating causal reasoning and enabling computational queries on upstream and downstream relationships. GO-CAMs also support comparative analysis of pathways across species and aid in hypothesis generation by identifying gaps in knowledge which can help suggest experiments.

Researchers can use GO-CAMs in several ways:

Explore gene function: Understand how a gene product fits into biological pathways
Identify relationships: Discover functional connections between genes
Plan experiments: Identify gaps in pathway knowledge
Compare across species: Leverage orthology to compare pathway mechanisms
Generate hypotheses: Infer potential roles for uncharacterized genes
Analyze high-throughput data: Provide pathway context for genomics data

All GO-CAM models are manually curated by expert biologists, based on published experimental evidence, supported by literature references with evidence codes, reviewed and updated as new data emerge, and traceable to primary research articles.

Structure of GO-CAM pathways

Each GO-CAM pathway provides a structured view of curated biological knowledge connecting gene products to activities using causal relationships while providing contextual information.

Activities are "Molecular Functions"

Activities are core molecular functions performed by gene products and represented using GO Molecular Function terms, which serve as nodes in the pathway network.

Gene Products are "Enablers"

Proteins, complexes, or RNAs that perform, or "enable", the activities are connected to their respective molecular functions.

Causal Relationships

Causal relationships are connections showing how one activity leads to another. These can also include regulatory relationships such as activation or inhibition. Causal relationships capture temporal sequences and dependencies, and are represented using defined relation types from the Relation Ontology.

Contextual Information

Contextual information describes where activities occur (Cellular Component) within the broader biological context (Biological Process), using additional qualifiers describing conditions, substrates, etc.
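The four components described above can be sketched as a small data model. This is a simplified illustration rather than the GO-CAM data format: the `Activity` and `CausalModel` classes, the activity identifiers, the gene product names, and the relation labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    molecular_function: str  # GO Molecular Function term (the node)
    enabled_by: str          # gene product that enables the activity
    occurs_in: str = ""      # Cellular Component context
    part_of: str = ""        # Biological Process context


@dataclass
class CausalModel:
    activities: dict = field(default_factory=dict)  # activity id -> Activity
    edges: list = field(default_factory=list)       # (upstream, relation, downstream)

    def add_activity(self, activity_id, activity):
        self.activities[activity_id] = activity

    def add_causal_edge(self, upstream, relation, downstream):
        if upstream not in self.activities or downstream not in self.activities:
            raise KeyError("both activities must be added before linking them")
        self.edges.append((upstream, relation, downstream))

    def downstream_of(self, activity_id):
        """Return the ids of all activities reachable by following causal edges."""
        reached, stack = set(), [activity_id]
        while stack:
            current = stack.pop()
            for upstream, _relation, downstream in self.edges:
                if upstream == current and downstream not in reached:
                    reached.add(downstream)
                    stack.append(downstream)
        return reached


# Hypothetical three-step pathway with placeholder names.
model = CausalModel()
model.add_activity("a1", Activity("function_1", "GeneA", "compartment_1", "process_1"))
model.add_activity("a2", Activity("function_2", "GeneB", "compartment_1", "process_1"))
model.add_activity("a3", Activity("function_3", "GeneC", "compartment_2", "process_1"))
model.add_causal_edge("a1", "positively regulates", "a2")
model.add_causal_edge("a2", "provides input for", "a3")

downstream = model.downstream_of("a1")
```

Storing causal relationships as directed edges is what makes upstream and downstream queries straightforward: a traversal from one activity collects everything it can influence.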

Viewing GO-CAM pathways in SGD (coming soon!)

How to access GO-CAM pathway pages from gene pages
Available pathway visualizations
How to browse pathway lists
Search capabilities for GO-CAMs
Links to external GO-CAM browsers

Interpreting GO-CAM visualizations

GO-CAM pathways are typically displayed as network diagrams where:

Rounded boxes represent molecular activities (GO Molecular Functions)
Gene product names appear as enablers of activities
Colors or shapes may distinguish different types of relationships
Arrows indicate causal relationships between activities
Subcellular locations may be indicated by compartments or labels

Contributing to GO-CAMs

Creating a comprehensive functional representation of the eukaryotic cell is a large project. We welcome community input to help improve, expand, or correct the current models. If you have subject area knowledge on a specific pathway, please consider contacting us to provide feedback on accuracy and completeness. The GO-CAM framework is your community resource.

Please contact the SGD curators to contribute if you have:

Published data that could enhance existing models
Expertise in pathways not yet represented
Suggestions for pathway improvements

Additional resources

Gene Ontology Consortium GO-CAM documentation at https://geneontology.org/docs/gocam-overview/
GO-CAM Browser at https://go-cam-browser.geneontology.org
