BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a very fast search algorithm for searching protein or DNA sequence databases; the two types of database are searched separately. BLAST is best used for sequence similarity searching, rather than for motif searching. For searches using a query sequence of fewer than 20 residues, PatMatch is the best choice.
More information about BLAST searching can be found in the NCBI BLAST Help Manual.
BLAST searches offered by SGD allow users to compare any query sequence to S. cerevisiae sequence datasets. To search other (non-yeast) datasets, NCBI BLAST can be used. To search fungal sequences, use SGD's Fungal BLAST tool.
The query page has several options as described below.
Step 1: Enter the query sequence
Sequences can be submitted for a BLAST search in two different ways. The sequence can be uploaded from a local text file with FASTA, GCG, or RAW formatting, or the sequence can be typed or pasted into the Query Sequence window. (Note: The contents of an uploaded sequence file will not be displayed in the Query Sequence window of the search page.) To use the Upload Local File option:
Step 2: Choose the appropriate BLAST program
SGD offers five BLAST programs to accommodate different types of searches:
Step 3: Choose one or more Sequence Datasets
SGD offers a selection of sequence databases that can be searched, including sequences from a large variety of Saccharomyces cerevisiae strains.
Step 4: Run BLAST
Note that you may want to change the default options, which are discussed below (see Using the BLAST Options to Refine Your Results) and in the BLAST documentation at NCBI.
BLAST search results are shown in the user's web browser.
The results of a BLAST query are reported in roughly the same format, regardless of the program selected. The first section is a graphical overview of the results, the second is a series of one-line descriptions of matching database sequences, the third is a set of the actual alignments of the query sequence with database sequences, and the last section lists the parameters used and the statistics generated during the search.
The graphical display and one-line descriptions give information about database sequences that form a High-Scoring Segment Pair (HSP) with the query sequence. An HSP is created when two sequence fragments (one from the query sequence and the other from a database sequence) show a locally maximal alignment for which the alignment exceeds a pre-defined cutoff score. BLAST uses HSPs to identify hits.
The above is a reduced example of the BLAST graphical overview format. Significant features include color coding of P-values, a wide selection of hits, use of JavaScript to display annotations, and a date stamp for archival reference.
Each hit may contain one or more high-scoring segment pairs (HSPs). Each HSP is drawn as a line, and is aligned with the query sequence. This figure shows two short HSPs and three long ones running off the right edge. The smallest HSP begins at 185 bp and ends at 233 bp along the query sequence.
In the full-text BLAST results, the strand of each HSP is reported as either plus or minus. If the query and HSP strands are the same, the HSP is termed forward. If they differ, the HSP is termed reverse.
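The forward/reverse rule can be written as a small check. This is only an illustrative sketch: the function name hsp_orientation and the lowercase strand labels are assumptions made for the example, not part of the BLAST output or of SGD's software.

```python
def hsp_orientation(query_strand: str, hsp_strand: str) -> str:
    """Return 'forward' when both strands agree, 'reverse' when they differ.

    The 'plus'/'minus' labels follow the text above; the function itself
    is a hypothetical helper written for illustration.
    """
    valid = {"plus", "minus"}
    query = query_strand.strip().lower()
    hsp = hsp_strand.strip().lower()
    if query not in valid or hsp not in valid:
        raise ValueError("strand must be 'plus' or 'minus'")
    return "forward" if query == hsp else "reverse"
```

For instance, a plus query strand paired with a minus HSP strand would be labeled reverse.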
All HSPs for a displayed hit are drawn. They share a single background color to signify their relationship. Here are two hits, each containing multiple HSPs. For the first hit, YAL029C, the background is white. For the second, YHR023W, the background is gray.
The hits are color coded according to their P value. A set of five fixed ranges is used to determine a color for each hit. These ranges, from "worst" to "best," are:
The key shows these colors, and notes the value of the negative exponents in each range. It progresses from "worst" on the left to "best" on the right. Note that a range might not contain any hits, since the ranges are fixed while the hit P-values are not. When two ranges share a boundary value (e.g., 1e-50), that value falls in the "better" range and is colored accordingly (e.g., green).
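The boundary rule can be sketched as follows. The cutoff values used here are hypothetical placeholders chosen only to make the example runnable (apart from 1e-50, which the text mentions as a shared boundary); the function name range_index is likewise an assumption. The function returns a position from 0 ("worst") upward rather than a color.

```python
def range_index(p_value: float, boundaries: list[float]) -> int:
    """Return the index of the fixed range that a P-value falls into.

    Index 0 is the "worst" range; higher indices are "better".
    A P-value equal to a boundary counts as belonging to the better
    range, matching the rule described above.
    """
    # Each boundary that the P-value meets or beats moves it up one range.
    return sum(1 for cutoff in boundaries if p_value <= cutoff)


# Hypothetical cutoffs: four boundaries produce five fixed ranges.
EXAMPLE_BOUNDARIES = [1e-10, 1e-50, 1e-100, 1e-200]
```

With these placeholder cutoffs, a hit with P = 1e-50 lands in the same range as a hit with P = 1e-60, not in the range below it.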
Often, there will be more data available than can be displayed in the graphic. The current system takes a particular approach to selecting data to include, biased in favor of giving a complete overview of the data rather than showing only the top hits. The rationale is that it can be important to show results further away from identity.
First, the hits are sorted into the color-coded ranges. Next, the top hit from each range is picked, starting with the "best" range. The system keeps track of how much space each hit will take up when drawn; if there is still room left over after including those hits, it makes another pass, picking the next top hit from each range. This process continues until either no hits remain or no room is left in the display.
Note that the final drawing of the hits will be in proper order, even though hits have been selected in an interleaved fashion: all of the best hits are drawn at the top of the image.
If not all hits are shown, range counts will appear at the right side of the graph. In our example, all hits from the top range are shown and thus the annotation says "All." However, not all hits in the next range could be displayed, so "1/3" indicates that one of three hits is shown and two are omitted.
Note that if a range contains no hits, no count is shown (thus, there are no green or cyan notations in our example). If all of the BLAST results fit into the graph, no range counts are displayed at all.
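The interleaved selection and the range counts described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SGD's actual implementation: the function name select_hits, the (name, height) representation of a hit, and the choice to stop as soon as the next hit does not fit are all assumptions.

```python
from collections import deque


def select_hits(ranges, capacity):
    """Pick hits round-robin across ranges until the display is full.

    ranges   -- list of ranges, "best" range first; each range is a list
                of (name, height) tuples sorted best hit first
    capacity -- total drawing space available (same units as height)

    Returns (drawn, counts): hit names in drawing order (best range
    first) and a (shown, total) pair for every range.
    """
    queues = [deque(hits) for hits in ranges]
    chosen = [[] for _ in ranges]
    used = 0
    room_left = True
    while room_left and any(queues):
        # One pass: take the next top hit from each range, best first.
        for index, queue in enumerate(queues):
            if not queue:
                continue
            name, height = queue[0]
            if used + height > capacity:
                room_left = False  # assumed stopping rule
                break
            queue.popleft()
            chosen[index].append(name)
            used += height
    # Hits are drawn grouped by range, even though they were picked
    # in an interleaved order.
    drawn = [name for group in chosen for name in group]
    counts = [(len(group), len(hits)) for group, hits in zip(chosen, ranges)]
    return drawn, counts
```

A (shown, total) pair of (1, 3) corresponds to the "1/3" annotation, and a pair in which both numbers are equal corresponds to "All."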
Hit names and P-values are displayed at the left side of the graph.
If you enable JavaScript in your web browser, annotations for each hit will be displayed in a text field just above the graph as you move the mouse; the score is included along with the P-value. For example:
p=0.0e0 s=7741 YOR326W|MYO2, Chr XV from 925712-930436
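An annotation line of this shape could be split into its parts with a regular expression. The pattern below is inferred from the single example above and is only an assumption about the general format; the function name parse_annotation and the dictionary keys are hypothetical.

```python
import re

# Pattern inferred from one example line; other lines may differ.
ANNOTATION = re.compile(
    r"p=(?P<p>\S+)\s+s=(?P<score>\d+)\s+(?P<name>[^,]+),\s*(?P<location>.+)"
)


def parse_annotation(line: str) -> dict:
    """Split an annotation line into P-value, score, hit name and location."""
    match = ANNOTATION.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized annotation line: {line!r}")
    return {
        "p_value": float(match["p"]),
        "score": int(match["score"]),
        "name": match["name"],
        "location": match["location"],
    }
```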
The one-line descriptions summarize information about the database sequences that form HSPs with the query sequence. At the left end of each one-line description is the name of the database sequence that forms an HSP with the query sequence. Each description also includes the score and P-value for the hit.
The sequence alignments show the query sequence at the top, with the aligned database sequence (Sbjct, or subject) at the bottom. The starting and ending coordinates of the areas of similarity are shown at the left and right of the aligned sequences. When nucleotide sequences are being aligned, vertical lines between the bases signify identities. Amino acid identities are shown by the repetition of the one-letter code for that amino acid between the residues. Conservative amino acid changes are shown by a "+" sign between the aligned residues. Places where gaps had to be introduced to achieve the alignment are signified by a "-" in the query or subject sequences.
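For nucleotide alignments, the line of vertical bars between the two sequences can be reproduced with a short function. This sketch covers only the nucleotide case; the "+" marks for conservative amino acid changes depend on a scoring matrix and are not modeled here. The function name nucleotide_midline is an assumption.

```python
def nucleotide_midline(query: str, subject: str) -> str:
    """Build the middle line of a nucleotide alignment.

    Both arguments are aligned strings of equal length in which "-"
    marks a gap. Identical bases get "|"; mismatches and gap positions
    get a space.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " "
        for q, s in zip(query.upper(), subject.upper())
    )
```

Printing the query, the midline, and the subject on three consecutive lines gives the layout described above, with the query at the top and the subject at the bottom.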
For amino acid sequences, the default filter setting is "seg." This filter masks low-complexity (repetitive) regions of the query sequence, and the masked residues are shown as Xs. For nucleic acid sequences, the default filter setting is "dust," and the masked residues are shown as Ns. To turn off filtering, return to the BLAST search page and select "Off" as the filter option.
If the BLAST search results don't look optimal, you can experiment with several of the parameters, as follows: