SGD Help: Gene Ontology (GO)
The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. The GO project started as a collaboration between three model organism databases: the Saccharomyces Genome Database (SGD), FlyBase (for Drosophila), and Mouse Genome Informatics (MGI). The GO Consortium has since expanded considerably to include many additional model organism databases and annotation groups. Depending on the nature of its affiliation, each member contributes to the development of the ontologies, the generation of GO annotation files, or the development of software tools that use GO.
Within SGD, GO annotations describe what gene products do and where they are located. GO annotations therefore appear directly on the Locus Summary pages for both protein-coding and non-coding RNA genes, and more detail about the annotations and the GO terms is available on additional pages. GO tools such as the GO Term Finder and the GO Slim Mapper use the GO annotations to analyze sets of genes and identify common functions, processes, or locations.
- GO Annotation
- Basic parts of a GO Annotation
- Optional additional parts of a GO Annotation
- Annotation Methods
- Manually curated
- Searching and Accessing GO Annotations in SGD
- GO Slim Mapping Tool
- GO Term Finder Tool
- Microarray Data and GO
The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process, and cellular component of gene products. The name and definition for each GO term and the parent-child relationships between terms are defined by the members of the GO Consortium. This combination of a controlled vocabulary of defined terms with a structure of relationships between items is referred to as an ontology. See the GO Consortium's An Introduction to the Gene Ontology for a basic introduction to the Gene Ontologies.
This diagram shows a small portion of the Biological Process ontology. Terms at the top represent broader, more general concepts, while terms lower down represent more specific concepts. When referring to the structure between terms, a term that has terms below it is called a parent term, and the terms below it are its child terms. A term can therefore be a parent with respect to the terms below it and a child with respect to the terms above it. The relationships shown between terms are of two types: is_a and part_of. Note that the Gene Ontologies themselves contain only terms and the relationships between them; they do not contain the gene products of any specific organism.
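Because a GO term can have more than one parent, the ontology is a directed acyclic graph rather than a simple tree. The minimal sketch below stores a toy fragment as a map from each child term to its direct parents and walks upward to collect every ancestor. The term names, edges, and function names are hypothetical placeholders, not taken from the real Biological Process ontology.

```python
from collections import deque

# Toy ontology fragment. Term names and edges are hypothetical placeholders.
# Each child term maps to its direct parents as (parent, relationship) pairs.
PARENTS = {
    "specific_process": [("mid_process_a", "is_a"), ("mid_process_b", "part_of")],
    "mid_process_a": [("broad_process", "is_a")],
    "mid_process_b": [("broad_process", "is_a")],
    "broad_process": [("biological_process", "is_a")],
    "biological_process": [],
}


def direct_children(term, parents=PARENTS):
    """Terms that list `term` as one of their direct parents."""
    return {child for child, edges in parents.items()
            if any(parent == term for parent, _ in edges)}


def ancestors(term, parents=PARENTS):
    """All terms above `term`, following is_a and part_of edges breadth-first."""
    found = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent, _relationship in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found
```

Here "specific_process" is a child of two different terms, and `ancestors("specific_process")` collects all four terms above it, reaching "broad_process" through both paths but listing it only once.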
Basic parts of a GO Annotation
To provide specific information about gene products, a GO term (e.g., actomyosin contractile ring contraction) is associated with a gene or gene product (e.g., ACT1 or Act1p) to form a GO annotation. In addition to the association between a gene product and a GO term, a GO annotation must also be associated with a specific reference, an evidence code, and the date on which the annotation was made. Thus a basic GO annotation includes these pieces of information:
- gene (or gene product)
- GO term
e.g., actomyosin contractile ring contraction
- Reference
The reference contains data or statements which support the annotation, or a description of the method by which the annotation was assigned.
- Evidence Code
The Evidence Code gives a basic indication of the type of data or statement that supports the annotation. More information about the GO Evidence Codes is found in the Guide to GO Evidence Codes.
- Date
The date on which the annotation was assigned or reviewed.
These basic, essential parts of a GO annotation are all displayed on SGD's GO Evidence and References pages; see, for example, the page for the ACT1 GO annotations.
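The five required parts map naturally onto a small record type. The sketch below is a hypothetical illustration, not SGD's data model: the class and field names are assumptions, the evidence-code set is only a subset of the GO codes used to catch typos, and the example values are placeholders rather than a real SGD annotation.

```python
from dataclasses import dataclass
from datetime import date

# A subset of GO evidence codes, used here only to reject obvious typos.
KNOWN_EVIDENCE_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",   # experimental
    "HTP", "HDA", "HMP", "HGI", "HEP",          # high-throughput
    "ISS", "ISO", "ISA", "ISM", "IBA", "RCA",   # computational analysis
    "TAS", "NAS", "IC", "ND", "IEA",            # author, curator, electronic
}


@dataclass(frozen=True)
class GOAnnotation:
    gene: str            # gene or gene product, e.g. "ACT1" or "Act1p"
    go_term: str         # GO term name (a GOID could be stored instead)
    reference: str       # citation that supports the annotation
    evidence_code: str   # type of supporting data or statement
    assigned_on: date    # date the annotation was assigned or reviewed

    def __post_init__(self):
        if self.evidence_code not in KNOWN_EVIDENCE_CODES:
            raise ValueError(f"unrecognized evidence code: {self.evidence_code!r}")


# Placeholder values for illustration only, not a real SGD annotation.
example = GOAnnotation(
    gene="ACT1",
    go_term="actomyosin contractile ring contraction",
    reference="(hypothetical reference)",
    evidence_code="IMP",
    assigned_on=date(2020, 1, 1),
)
```

Making the record frozen reflects that an annotation is a fixed statement; a revised annotation would be created as a new record with a new date.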
Optional additional parts of a GO Annotation
In addition to the basic, essential components of a GO annotation, there are some optional pieces of information that may be associated with the GO annotation when appropriate. These include a Qualifier, the With/From field and the Annotation Extension field:
- Qualifier
Qualifiers modify the interpretation of an annotation. Three qualifiers are currently allowed: NOT, contributes_to, and colocalizes_with. For a detailed explanation of the qualifiers, please see the Using the Qualifier column section of the GO Annotation Conventions guidelines.
- With/From Field
For some evidence codes, it is useful to specify a second object that the gene being annotated interacted with or was compared to. For Inferred from Genetic Interaction (IGI), the With/From field specifies which other genes were involved in a genetic interaction with the gene being annotated. Similarly, for Inferred from Physical Interaction (IPI), it specifies the gene products with which the annotated gene product interacted. When used with Inferred from Sequence or Structural Similarity (ISS), it indicates what the annotated gene was compared to in a sequence-based analysis. For Inferred by Curator (IC), it contains the GOID of the GO term(s) used as the basis of the curator's inference.
- Annotation Extension
The Annotation Extension field adds more specificity to the GO term. It is used, for example, to capture the substrates of a function term such as protein kinase, the targets of transcription factors, or the cell type in which a gene product has a particular localization. An annotation extension has two parts: a relation that connects the ‘primary’ GO term to the entity, and an identifier for the entity that increases the specificity of the annotation (e.g., an identifier for a gene, gene product, GO term, or a term from an external ontology such as a cell type or anatomy ontology). More information can be found in the Annotation Extension Guide.
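The optional parts can be checked in code. The sketch below assumes that an extension is written as comma-separated relation(identifier) units, as in GO annotation files; the relation names has_input and occurs_in are used only as examples, and the identifiers and function names are hypothetical. It also treats NOT separately, because a NOT annotation states that the gene product does not have the property.

```python
import re

ALLOWED_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}

# One extension unit: relation(identifier). The exact textual format is an assumption.
_UNIT = re.compile(r"([A-Za-z_]+)\(([^()]+)\)")


def parse_extension(text):
    """Split a comma-separated extension string into (relation, entity_id) pairs."""
    pairs = []
    for unit in filter(None, (part.strip() for part in text.split(","))):
        match = _UNIT.fullmatch(unit)
        if match is None:
            raise ValueError(f"malformed extension unit: {unit!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def check_qualifier(qualifier):
    """Accept an empty qualifier or one of the three listed in this guide."""
    if qualifier and qualifier not in ALLOWED_QUALIFIERS:
        raise ValueError(f"unexpected qualifier: {qualifier!r}")
    return qualifier or None


def is_negated(qualifier):
    """True when the annotation says the gene product lacks the property."""
    return qualifier == "NOT"
```

For example, `parse_extension("has_input(SGD:S000000000),occurs_in(CELLTYPE:0000001)")` yields two (relation, identifier) pairs, one narrowing the term to a hypothetical substrate and one to a hypothetical cell type.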
Annotation Methods
To differentiate annotations made from published small-scale experiments, genome-wide or high-throughput experiments, and computational predictions, we have separated GO annotations at SGD into three sets:
- Manually curated GO annotations
Manually curated GO annotations reflect our best understanding of the basic molecular function, biological process, and cellular component for a gene product. SGD curators assign these annotations by reading the literature for each gene and annotating from published papers when available. Such annotations may be based on experiments, sequence similarity, or other computational analyses described in a paper, or on statements made by its authors. Curators periodically review all Manually curated GO annotations for accuracy and completeness and update them as necessary, adding new annotations to reflect advances in knowledge and removing any that are no longer supported by the literature. The "Last Reviewed on:" date on the GO evidence and references page for a gene indicates when an SGD curator last reviewed all of the Manually curated GO annotations for that gene. SGD also reviews and incorporates manual GO annotations for S. cerevisiae proteins from the GO Annotation (GOA) project at UniProt. These annotations can be identified by their source, e.g., 'UniProt', 'MGI', or 'HGNC' (GO Consortium members), shown in the 'Assigned By' column of the GO evidence and references page.
- High-throughput GO Annotations
GO annotations from high-throughput experiments are based on a variety of large-scale experiments, including genome-wide experiments. Many of these annotations are made from GO annotations (or mappings to GO annotations) assigned by the authors rather than by SGD curators. While SGD curators read these publications and often work closely with authors to incorporate the information, each individual annotation is not necessarily reviewed by a curator. High-throughput GO annotations are assigned only when this type of data is available, so a gene may not have them in all three aspects of the Gene Ontology.
- Computational GO Annotations
Computational GO annotations are made by a variety of computational methods, such as sequence similarity methods, including protein domain motifs, and keyword mapping files. When annotations based on computational methods are NOT reviewed by a curator, they are placed in the Computational GO annotations section. Note that when annotations supported by a computational method, such as sequence analysis, are reviewed by a curator, they may be found in the Manually curated section.
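As a rough illustration of the three sets, annotations can be bucketed by evidence code. This is only an approximation under a stated assumption: SGD assigns the sets by how an annotation was made and whether a curator reviewed it, so a curator-reviewed ISS annotation belongs with the manually curated set even though ISS is a computational code. The function names and gene names below are hypothetical.

```python
from collections import defaultdict

# Assumption: approximate SGD's three sets from evidence codes alone.
HIGH_THROUGHPUT_CODES = {"HTP", "HDA", "HMP", "HGI", "HEP"}
UNREVIEWED_COMPUTATIONAL_CODES = {"IEA"}  # electronic, not curator-reviewed


def annotation_set(evidence_code):
    """Guess which of the three SGD sets an annotation falls into."""
    if evidence_code in HIGH_THROUGHPUT_CODES:
        return "high-throughput"
    if evidence_code in UNREVIEWED_COMPUTATIONAL_CODES:
        return "computational"
    # Everything else, including curator-reviewed ISS, is treated as manual.
    return "manually curated"


def group_by_set(annotations):
    """annotations: iterable of (gene, go_term, evidence_code) tuples."""
    groups = defaultdict(list)
    for gene, term, code in annotations:
        groups[annotation_set(code)].append((gene, term, code))
    return dict(groups)
```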
At SGD, curators read the research literature and associate specific GO terms with the appropriate gene products to provide information about the state of knowledge of the yeast genome. We are constantly updating our GO annotations and always welcome suggestions for improvement or corrections when the understanding about a gene has changed since the last time we reviewed the literature for a given gene.
Searching and Accessing GO Annotations in SGD
Users can search for GO terms in any of the three Gene Ontologies that match a text query, e.g. "bud", using the Search box located at the top of SGD pages. The search result is a list of matches for the query term. Clicking on the "Gene product activities (GO Molecular Function)", "Cellular roles or processes (GO Biological Process)", or "Protein complexes and locations (GO Cellular Component)" links from the results page will provide lists of GO terms containing the query string.
Users can search for GO terms whose GOIDs (minus the "GO:" prefix and leading zeroes) match a purely numerical query, e.g. "5685", using the Search box located at the top of SGD pages. The search result will usually be the GO term whose GOID matches the query. Occasionally, the search result will be a list of matches for the query term, where clicking on the Gene Ontology ID link will take you to the associated GO Term page.
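A full GOID is the prefix "GO:" followed by seven digits, so a numerical query such as "5685" corresponds to GO:0005685. The helper functions below (hypothetical names) convert between the two forms.

```python
import re


def query_to_goid(query):
    """Turn a purely numerical query such as "5685" into a full GOID."""
    digits = query.strip()
    if not re.fullmatch(r"\d{1,7}", digits):
        raise ValueError(f"not a numeric GO query: {query!r}")
    return f"GO:{int(digits):07d}"


def goid_to_query(goid):
    """Drop the "GO:" prefix and leading zeroes from a full GOID."""
    if not re.fullmatch(r"GO:\d{7}", goid):
        raise ValueError(f"not a GOID: {goid!r}")
    return str(int(goid[3:]))
```

For example, `query_to_goid("5685")` gives "GO:0005685", and `goid_to_query("GO:0005685")` gives back "5685".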
Accessing GO annotations for specific genes or GO terms
At SGD, you can find GO annotations displayed at various levels of detail in three locations as described below.
- Locus Summary page
Each Locus Summary page, like this one for RCL1, lists the GO terms, with associated evidence codes, that SGD curators have used to annotate the gene of interest. From the Locus Summary page, clicking on the GO evidence and references link takes you to the GO Annotations page for that gene, while clicking on a GO Term name will take you to the corresponding GO Term page.
- GO Annotations page
This page, for example the RCL1 GO Annotations page, lists all the GO terms that have been used to annotate a particular gene, along with the specific reference(s) used to make each annotation and the evidence code(s) describing the type of evidence or statement found in that reference. Annotations are separated by the three annotation methods: Manually curated, High-throughput, and Computational. Within each method, annotations to each of the three aspects of the Gene Ontology (Molecular Function, Biological Process, and Cellular Component) are listed separately. In addition, a network diagram lets you visualize connections between genes in the form of the GO Biological Process terms they share. The diagram displays GO Biological Process terms (green squares) that are shared between the given gene (yellow circle) and other genes (grey circles), based on the number of Process terms shared; a slider at the bottom can be used to vary this number. The diagram is interactive: gene names and GO terms can be clicked to open the corresponding Locus Summary page or GO Term page.
- GO Term page
Clicking on a term name on either of the pages described above takes you to the GO Term page for that term, for example this one for rRNA processing. The GO Term page provides specific information about the GO term: any synonyms or alternative phrases for the term name, the definition of the term, the aspect of the Gene Ontology (biological process, molecular function, or cellular component) to which it belongs, its GOID (a unique numerical identifier), and a graphical view showing the relationship between this term and others in the ontology. Annotations of SGD genes to the term are summarized in a table, along with the relevant reference and evidence code for each annotation. Clicking on the GOID link opens AmiGO for more details about the term, while the 'View GO Annotations in other species in AmiGO' link shows annotations to this term in other species.
- Downloads Site
A file containing GO annotations, gene_association.sgd, is available for download.
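The downloaded file can be processed locally, for instance to approximate the shared-Biological-Process counts behind the network diagram on the GO Annotations page. This minimal sketch assumes gene_association.sgd is an uncompressed GO Annotation File (GAF): "!" comment lines followed by tab-separated columns in the standard GAF order, with aspect "P" marking Biological Process. The column labels and function names are hypothetical.

```python
from collections import defaultdict

# Assumed GAF column order (GAF 2.x); trailing columns may be absent in older files.
GAF_COLUMNS = [
    "db", "db_object_id", "symbol", "qualifier", "go_id", "reference",
    "evidence_code", "with_from", "aspect", "name", "synonyms", "object_type",
    "taxon", "date", "assigned_by", "extension", "gene_product_form_id",
]


def read_gaf(lines):
    """Yield one dict per annotation line, skipping blank and '!' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        yield dict(zip(GAF_COLUMNS, line.split("\t")))


def process_terms_by_gene(records):
    """Map gene symbol -> Biological Process GOIDs (aspect "P"), ignoring NOT annotations."""
    terms = defaultdict(set)
    for rec in records:
        qualifiers = rec.get("qualifier", "").split("|")
        if rec.get("aspect") == "P" and "NOT" not in qualifiers:
            terms[rec["symbol"]].add(rec["go_id"])
    return dict(terms)


def genes_sharing_terms(gene, terms, min_shared=1):
    """Other genes sharing at least `min_shared` process terms with `gene`."""
    mine = terms.get(gene, set())
    return {
        other: len(mine & theirs)
        for other, theirs in terms.items()
        if other != gene and len(mine & theirs) >= min_shared
    }

# Typical use (local path and lack of compression are assumptions):
# with open("gene_association.sgd") as handle:
#     terms = process_terms_by_gene(read_gaf(handle))
# print(genes_sharing_terms("RCL1", terms, min_shared=2))
```

Raising `min_shared` plays the same role as moving the slider: fewer genes remain, each sharing more process terms with the gene of interest.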
GO Slim Mapping Tool at SGD
This tool identifies the major branches of the ontologies common to a list of genes or ORFs, based on their GO annotations. The GO terms that represent the major branches of the ontology are higher level terms, also known as the GO slim terms. This is possible with GO because there are parent-child relationships recorded between the granular terms and the high level GO slim terms. For more information on this tool, please click here.
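The mapping works because every granular term can be walked up to the slim terms above it. The sketch below shows the idea on a hypothetical mini-ontology; the term names, the slim set, and the function names are placeholders, and this is not the GO Slim Mapper's own code.

```python
from collections import deque

# Hypothetical mini-ontology: child term -> direct parents (relationship types omitted).
PARENTS = {
    "term_x": ["slim_a"],
    "term_y": ["term_x", "slim_b"],
    "term_z": ["slim_b"],
    "slim_a": ["root"],
    "slim_b": ["root"],
    "root": [],
}
SLIM_TERMS = {"slim_a", "slim_b"}


def self_and_ancestors(term):
    """The term itself plus every term above it."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_slim(gene_terms):
    """gene_terms: gene -> set of granular terms. Returns slim term -> set of genes."""
    slim_to_genes = {slim: set() for slim in SLIM_TERMS}
    for gene, terms in gene_terms.items():
        for term in terms:
            for slim in self_and_ancestors(term) & SLIM_TERMS:
                slim_to_genes[slim].add(gene)
    return slim_to_genes
```

A gene annotated to "term_y" is counted under both slim terms, because "term_y" reaches "slim_a" through "term_x" and "slim_b" directly.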
GO Term Finder Tool at SGD
This tool searches for significant shared GO terms or parents of GO terms used to describe your set of genes or ORFs. This tool helps you understand what is common among the genes/ORFs you are studying. Results from this search are displayed in a graphic and table form. The graphic view shows the parent-child relationships (DAG view) of the GO terms that are used to annotate the genes/ORFs. For more information on this tool, please click here.
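One standard way to decide whether a shared GO term is "significant" is the hypergeometric tail probability: given a background of N genes, K of which are annotated to the term, how likely is it that a random list of n genes contains k or more annotated genes? The sketch below assumes that test plus a simple Bonferroni correction; it is an illustration of the statistic, and the GO Term Finder's own background set and correction may differ. Function names are hypothetical.

```python
from math import comb


def hypergeometric_p(k, n, K, N):
    """P(X >= k): chance that k or more of n sampled genes carry the term,
    when K of the N background genes carry it."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bonferroni(p, num_tests):
    """Multiply by the number of terms tested, capped at 1."""
    return min(1.0, p * num_tests)
```

With a background of 10 genes, 5 of them annotated, a list of 2 genes that are both annotated gives `hypergeometric_p(2, 2, 5, 10)`, about 0.222, which is far from significant; enrichment becomes meaningful only with larger lists and rarer terms.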
Microarray Data and GO at SGD
SPELL (Serial Pattern of Expression Levels Locator) is an analysis tool for microarray data that facilitates the rapid identification of the most informative datasets and co-expressed genes based on patterns of expression shared with a query gene or genes. Search results also display GO term enrichment for the genes of interest and other genes that have similar expression patterns. This helps identify relationships between a large number of genes with similar expression profiles.
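To make "genes with similar expression patterns" concrete, the sketch below ranks genes by the Pearson correlation of their expression profiles with a query gene. This is a swapped-in, simplified technique, not SPELL's algorithm (SPELL also weights datasets by how informative they are for the query); the profiles and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as unrelated
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, profiles, top=5):
    """Return the `top` (gene, score) pairs whose profiles best match `query`'s."""
    target = profiles[query]
    scores = [(gene, pearson(target, values))
              for gene, values in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]
```

The top-ranked genes could then be passed to a GO enrichment analysis to see which processes the co-expressed group has in common.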