SGD Help: GO Slim Mapper
The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. A gene product's biology is represented by three ontologies: molecular function, biological process and cellular component. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. To provide the most detailed information available, gene products are annotated to the most granular GO term(s) possible. For example, if a gene product is localized to the perinuclear space, it will be annotated to that specific term only and not to its parent term nucleus. (In this example the term perinuclear space is a child of nucleus.) Parent-child relationships can be easily viewed using AmiGO Tree View.
However, for many purposes, such as reporting the results of GO annotation of a genome, analyzing the results of microarray expression data, or cDNA collection, it is very useful to have a high level view of the three ontologies. For example, if you wanted to find all the genes in an expression cluster that were localized to the nucleus, it would be useful to be able to map the granular annotations, such as perinuclear space, to general terms, like nucleus. Thus, GO slim was created. GO slim is a high level view of GO: a slice of the broad, high level terms such as DNA replication, transcription, and transport. There are several versions of GO slims created for different genomes and the GO slim terms are updated periodically. To view and/or download other GO slims, go to the GO slim ftp site. The GO slim tool at SGD uses the GO slim terms picked by the SGD curators based on annotation statistics and biological significance.
The GO Slim Mapper at SGD was created to allow you to map the granular annotations of the query set of genes to one or more high level, parent GO Slim terms. This is possible with GO because there are parent:child relationships recorded between granular terms and more general parent (i.e., GO slim) terms.
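The idea of mapping a granular term up to its GO slim parent can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENTS` dictionary is a toy fragment built from the perinuclear space example above (plus a made-up `example granular term`), `SLIM_TERMS` is a hypothetical slim set, and `map_to_slim` is not the code SGD runs.

```python
from collections import deque

# Toy child -> parents links. Only "perinuclear space" -> "nucleus" comes from
# the text above; the other entries are hypothetical placeholders.
PARENTS = {
    "perinuclear space": ["nucleus"],
    "nucleus": ["cellular_component"],
    "example granular term": ["cellular_component"],
    "cellular_component": [],
}

# Hypothetical slim set containing a single high level term.
SLIM_TERMS = {"nucleus"}


def map_to_slim(term, parents, slim_terms):
    """Return every slim term that is `term` itself or one of its ancestors."""
    seen = {term}
    queue = deque([term])
    hits = set()
    while queue:
        current = queue.popleft()
        if current in slim_terms:
            hits.add(current)
        # Walk upward through the recorded parent relationships.
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return hits


print(map_to_slim("perinuclear space", PARENTS, SLIM_TERMS))      # {'nucleus'}
print(map_to_slim("example granular term", PARENTS, SLIM_TERMS))  # set()
```

A term with no slim ancestor returns an empty set, which corresponds to the "cannot be mapped to a GO Slim term" case described under Tips to Interpret Results.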
- Using GO Slim Mapper
- Results Table
- Tips to Interpret Results
- Complete Mapping of GO Slim Terms
Using GO Slim Mapper
- Step 1: Enter the gene names
- Type the names of the genes in the input box or upload a file of gene names. Note that the program requires more time to process a long list (greater than 100 genes) than a short list. If your list contains several hundred or more genes and the tool does not appear to be processing the list, please contact SGD curators at firstname.lastname@example.org.
- Step 2: Choose the GO Slim set
- This tool is designed to search only one of the GO Slim sets at a time in order to minimize the search time. When you choose a GO Slim Set, the terms from that set will be listed in the box under Step 3. Six GO Slim Sets are available at SGD:
- Macromolecular complex terms: Component
- A set of granular protein complex terms from the cellular component ontology, useful for determining whether your protein of interest is a member of a particular complex. This set is a list of all protein complex terms and not truly a Slim set.
- Yeast GO-Slim: Component
- A set of high level GO terms that best represent the major biological components that are found in S. cerevisiae. These terms have been selected by SGD curators based on annotation statistics and biological significance.
- Yeast GO-Slim: Function
- A set of high level GO terms that best represent the major biological functions found in S. cerevisiae. These terms have been selected by SGD curators based on annotation statistics and biological significance.
- Yeast GO-Slim: Process
- A set of high level GO terms that best represent the major biological processes that are found in S. cerevisiae. These terms have been selected by SGD curators based on annotation statistics and biological significance.
- generic GO-Slim: Component
- A small set of very broad, high level GO Cellular Component terms that are not species specific; useful for binning groups of genes in general categories.
- generic GO-Slim: Process
- A small set of very broad, high level GO Biological Process terms that are not species specific; useful for binning groups of genes in general categories.
- Step 3: Choose your GO Slim terms
- Select at least one GO slim term from the list; you can also select all the terms. To access information about a particular GO Term and its definition, type the GO Term in the Search box at the top of the page. If you click the Search button after Step 3, the tool will map annotations made to your input list of genes by compiling data from both the Manually curated and High-throughput sets. You can go to Step 4 to filter by Annotation Method. GO Slim Mapper at SGD queries Manually curated and High-throughput annotations only and does not query annotations obtained using computational methods.
- Optional Step 4: Select Annotation Method(s)
- Select either the Manually curated or the High-throughput annotation set.
Results Table
The results page shows the number of genes from your input list that were mapped, the reasons why some genes may not have been mapped, and a table that displays the mapping results.
The first column in the table (see Example Mapping below) lists the GO slim terms that were chosen in Step 3. The second column lists the cluster frequency: how often each GO slim term is associated with the genes in your list, either directly or indirectly via a parental relationship with a granular term. For example, if there are 10 genes in your input list and 5 of those map to 'phosphatase activity', then the cluster frequency will be 5/10. The third column lists the total number of genes mapped to that GO term in the entire background (for example, 97 genes out of a total of 6311 in the yeast genome are annotated to phosphatase activity). You can use the YeastMine template Feature Type -> Features of a selected feature type to get a list of the background genes (i.e., the 6311). In this template, select the operator ‘ONE OF’ and then use the command button to select the following set of feature types:
ORF, rRNA, snoRNA, snRNA, ncRNA, tRNA, transposable element gene. (Note: In this case ORFs include only Verified and Uncharacterized. The go_slim_mapping.tab file on the Downloads site includes Dubious ORFs.)
Each gene name is hyperlinked to its locus page, which shows all GO annotations associated with that gene.
You can also download the results into a tab-delimited file by clicking on the Download Results link.
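The two frequencies in the results table are simple ratios, and the arithmetic can be checked with a short Python sketch. The gene names below are hypothetical placeholders and `cluster_frequency` is an illustrative helper, not part of any SGD tool; the counts (5 of 10, and 97 of 6311) are the ones quoted in the example above.

```python
def cluster_frequency(mapped_genes, input_genes):
    """Return (count, total, percent) for input genes mapped to one slim term."""
    input_set = set(input_genes)
    total = len(input_set)
    # Only genes that are actually in the input list count toward the cluster.
    count = len(set(mapped_genes) & input_set)
    percent = 100.0 * count / total if total else 0.0
    return count, total, percent


# Ten hypothetical input genes, five of which map to 'phosphatase activity'.
input_genes = [f"GENE{i}" for i in range(1, 11)]
mapped = input_genes[:5]

count, total, pct = cluster_frequency(mapped, input_genes)
print(f"cluster frequency: {count}/{total} ({pct:.1f}%)")   # 5/10 (50.0%)

# Background frequency from the example: 97 of 6311 genes.
background_pct = 100.0 * 97 / 6311
print(f"background frequency: 97/6311 ({background_pct:.1f}%)")  # 1.5%
```

Comparing the two percentages gives a rough sense of whether a term is over-represented in the input list; the sketch does not perform any statistical test.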
Tips to Interpret Results
If some or all of the genes in your input list are not mapped to a GO Slim term, consider these possible reasons.
- Genes in your input list may have been binned into a category called 'cannot be mapped to a GO Slim term'. This refers to genes annotated with a GO term that does not map to an existing GO slim term. For example, CDC53 is annotated to the Cellular Component term SCF ubiquitin ligase complex, a term that does not map up to an existing GO slim term. Hence it gets binned into this category.
- Genes in your input list were filtered out by the Annotation Method option in Step 4. For example, if your input list has genes with both Manually curated and High-throughput annotations and you select only Manually curated in Step 4, genes whose only annotations to the selected GO Slim terms are High-throughput will be removed. In this case you will see a message like the one below at the top of the results page: The following gene(s) URA7, URA3 cannot be mapped because they have no annotations for the specified GO Slim terms in the selected annotation set Manually curated.
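The effect of the Step 4 filter can be pictured with a small Python sketch. The annotation records here are invented for illustration (the term names are placeholders and do not reflect the real annotations of these genes), and `filter_by_method` is a hypothetical helper rather than the tool's actual logic.

```python
# (gene, GO term, annotation method) records; all values are made up.
ANNOTATIONS = [
    ("URA7", "term A", "High-throughput"),
    ("URA3", "term B", "High-throughput"),
    ("PHO4", "term C", "Manually curated"),
    ("PHO4", "term D", "High-throughput"),
]


def filter_by_method(annotations, genes, methods):
    """Keep annotations made by the selected methods; report genes left with none."""
    kept = [a for a in annotations if a[0] in genes and a[2] in methods]
    still_mapped = {gene for gene, _, _ in kept}
    # Preserve the input order when listing genes that dropped out.
    unmapped = [g for g in genes if g not in still_mapped]
    return kept, unmapped


kept, unmapped = filter_by_method(
    ANNOTATIONS, ["URA7", "URA3", "PHO4"], {"Manually curated"}
)
print(kept)      # [('PHO4', 'term C', 'Manually curated')]
print(unmapped)  # ['URA7', 'URA3']
```

In this toy data, PHO4 survives the filter because it has at least one Manually curated annotation, while the genes with only High-throughput annotations are the ones reported as unmapped.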
Example Mapping
Using the GO Slim Mapper to map these annotations to their GO slim function terms results in the following:
From the two tables above, the following conclusions can be drawn:
- Cdc53p, Pho2p, and Pho4p are all DNA binding proteins.
- Pho2p and Pho4p function as transcription factors, while Cdc53p does not.
- Pho3p is unrelated in function to the other 3 proteins.
Complete Mapping of GO Slim Terms
Some GO slim terms are children of other GO slim terms. For example, "mRNA binding" (GO:0003729) is a child of "RNA binding" (GO:0003723), but both are GO slim function terms. The GO Slim Mapper uses a 'complete mapping approach' and maps features to all GO slim terms to which they apply. This means, for example, that HEK2, which is annotated to "mRNA binding", will be mapped to both "mRNA binding" and "RNA binding" by the GO Slim Mapper.
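A minimal Python sketch of the complete mapping approach, using the HEK2 example above. The GO IDs and the parent link are the ones given in the text; the dictionaries and the `complete_mapping` function are illustrative assumptions, not SGD's implementation.

```python
# Child -> parents link taken from the text: mRNA binding is a child of RNA binding.
PARENTS = {
    "GO:0003729": ["GO:0003723"],
    "GO:0003723": [],
}

# Both terms are in the slim set.
SLIM = {
    "GO:0003729": "mRNA binding",
    "GO:0003723": "RNA binding",
}

# HEK2 is annotated directly to mRNA binding.
GENE_ANNOTATIONS = {"HEK2": ["GO:0003729"]}


def complete_mapping(gene, annotations, parents, slim):
    """Map a gene to every slim term at or above any of its annotated terms."""
    stack = list(annotations.get(gene, []))
    seen = set(stack)
    found = set()
    while stack:
        term = stack.pop()
        if term in slim:
            found.add(slim[term])
        # Keep climbing even after a slim term is found, so parent slim
        # terms are reported as well.
        for parent in parents.get(term, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(found)


print(complete_mapping("HEK2", GENE_ANNOTATIONS, PARENTS, SLIM))
# ['RNA binding', 'mRNA binding']
```

The key design point is that the walk does not stop at the first slim term it meets, which is why HEK2 appears under both terms.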
Go to GO Slim Mapper