Plant Genetics Visualization

The purpose of this page is to provide a sample of visualization techniques and tools available for communicating, exploring, and analyzing plant data. This work was compiled for the iPlant Project.

Visual Metaphors and Examples

Scatterplots

2-D scatterplots are bivariate: they plot the values of two variables against each other. Scatterplots support several useful constructions, such as color-coded subsets and fitted lines. In the plot below, the genes in green have been identified as similar by a clustering algorithm. The best-fitting lines through the "green" and "not green" populations let the analyst make inferences about how the relationship between the two expression variables differs between the groups, which could help in developing predictors for identifying other genes that may belong in this category.
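The best-fitting lines mentioned above can be computed with ordinary least squares, one fit per group. The sketch below is a minimal illustration, assuming each gene is a hypothetical (x, y) pair of expression values plus a boolean flag from the clustering step; the names `fit_line` and `fit_by_group` are illustrative and not taken from any particular tool.

```python
from typing import Dict, List, Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_by_group(points: List[Tuple[float, float]],
                 in_cluster: List[bool]) -> Dict[str, Tuple[float, float]]:
    """Fit one line to the in-cluster ("green") genes and one to the rest."""
    groups: Dict[str, Tuple[List[float], List[float]]] = {
        "green": ([], []), "not green": ([], [])}
    for (x, y), flag in zip(points, in_cluster):
        xs, ys = groups["green" if flag else "not green"]
        xs.append(x)
        ys.append(y)
    # Skip a group that has too few genes to define a line.
    return {name: fit_line(xs, ys)
            for name, (xs, ys) in groups.items() if len(xs) >= 2}
```

For example, `fit_by_group([(1, 2), (2, 4), (3, 6), (1, 1), (2, 1.5), (3, 2)], [True, True, True, False, False, False])` gives a slope of 2.0 for the green genes and 0.5 for the rest; the numbers are made up for illustration.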

This 3-D scatterplot plots three variables against each other and shows the same 6 genes as above. Two additional genes, marked in red, show very similar expression levels across these three variables but were not picked up by the clustering algorithm; they are flagged for further study.
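The text does not say how the two red genes were found. One simple way to flag such candidates automatically is to measure each unclustered gene's distance to the cluster centroid. The sketch below assumes Euclidean distance and a hand-chosen threshold `max_dist`; both are assumptions, not part of the original analysis, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set

def centroid(profiles: List[Sequence[float]]) -> List[float]:
    """Mean expression value for each variable across the given genes."""
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]

def near_cluster_candidates(expression: Dict[str, Sequence[float]],
                            cluster: Set[str], max_dist: float) -> List[str]:
    """Genes outside the cluster whose profile lies within max_dist of its centroid."""
    center = centroid([expression[g] for g in cluster])
    hits = [gene for gene, profile in expression.items()
            if gene not in cluster and math.dist(profile, center) <= max_dist]
    return sorted(hits)
```

Lowering `max_dist` makes the flagging stricter; in practice the threshold would need to be tuned against the spread of the cluster itself.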

Parallel Coordinates

Parallel coordinates let the analyst see the values of many entities across many variables: each variable is a vertical axis, and each entity is a line connecting its values on those axes. Color is critical for highlighting subsets of interest. This plot shows gene expression levels; an algorithm has identified the genes in green as belonging to the same cluster, based on their similar expression across experimental conditions.
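Because each axis in a parallel coordinates plot has its own units and range, values are usually rescaled per axis before drawing. This minimal sketch assumes min-max scaling to [0, 1] and a hypothetical plotting area 600 units wide and 300 units tall; `normalize_axes` and `polyline` are illustrative names.

```python
from typing import List, Sequence, Tuple

def normalize_axes(rows: List[Sequence[float]]) -> List[List[float]]:
    """Rescale each column (axis) independently to [0, 1]."""
    n_axes = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_axes)]
    highs = [max(r[j] for r in rows) for j in range(n_axes)]
    return [
        # A constant axis carries no information; place it mid-height.
        [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.5
         for j in range(n_axes)]
        for r in rows
    ]

def polyline(normalized_row: Sequence[float], width: float = 600.0,
             height: float = 300.0) -> List[Tuple[float, float]]:
    """Screen points for one gene: axis j sits at x = j * spacing; y is flipped
    so that high values are drawn near the top."""
    n = len(normalized_row)
    spacing = width / (n - 1) if n > 1 else 0.0
    return [(j * spacing, height * (1.0 - v)) for j, v in enumerate(normalized_row)]
```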

Heat Maps

Heat maps show the value of one variable at each location of a 2-D grid or "map". They are commonly used for gene expression data, for example with genes as rows and experimental conditions as columns. The value encoded in color can also be a computed parameter, such as the uncertainty (e.g., standard deviation) of each measurement. Whether a measurement or a calculation, the value is depicted by a color, and the range of colors used is called a color map. It is important to choose color maps that communicate the intended meaning.
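A color map can be implemented as a list of color stops with linear interpolation between them. The sketch below assumes a green/black/red diverging scale, a common convention for expression ratios, and clamps values outside `[vmin, vmax]` to the ends of the scale; the stop positions and names are illustrative choices.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Illustrative diverging map: green (low) -> black (middle) -> red (high).
GREEN_BLACK_RED: List[Tuple[float, RGB]] = [
    (0.0, (0, 255, 0)),
    (0.5, (0, 0, 0)),
    (1.0, (255, 0, 0)),
]

def map_color(value: float, vmin: float, vmax: float,
              stops: List[Tuple[float, RGB]] = GREEN_BLACK_RED) -> RGB:
    """Linearly interpolate an RGB color for value within [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values to the ends of the scale
    for (t0, c0), (t1, c1) in zip(stops, stops[1:]):
        if t <= t1:
            f = (t - t0) / (t1 - t0)  # position within this segment
            return tuple(round(a + f * (b - a)) for a, b in zip(c0, c1))
    return stops[-1][1]
```

Adding more stops, such as intermediate shades, gives the finer differentiation of levels that a richer color map provides.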

Dendrograms

Dendrograms graphically depict the similarity between entities as a hierarchical tree. Closely related entities are placed near each other, and each successive level of the tree shows how pairs and groups merge into larger clusters. This circular arrangement uses space efficiently but conveys no additional information over a linear layout.
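The tree that a dendrogram draws is typically produced by agglomerative hierarchical clustering: start with every entity in its own cluster and repeatedly merge the two closest clusters. This naive sketch assumes Euclidean distance and average linkage (other linkage rules are common) and is only suitable for small examples. It returns a nested tuple `(left, right, merge_distance)` whose leaf order could be used to lay out the dendrogram or reorder heat map rows.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A tree is a leaf name or (left subtree, right subtree, merge distance).
Tree = Union[str, Tuple["Tree", "Tree", float]]

def leaves(tree: Tree) -> List[str]:
    """Leaf names in left-to-right drawing order."""
    if isinstance(tree, str):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)

def average_linkage(profiles: Dict[str, Sequence[float]]) -> Tree:
    """Merge the closest pair of clusters until one tree remains."""
    clusters: List[Tree] = list(profiles)

    def dist(a: Tree, b: Tree) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        la, lb = leaves(a), leaves(b)
        total = sum(math.dist(profiles[x], profiles[y]) for x in la for y in lb)
        return total / (len(la) * len(lb))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged: Tree = (clusters[i], clusters[j], d)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]
```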

Heat Map with Dendrogram

This map from GeneSpring shows the expression of many genes across 6 experimental conditions. The genes have been ordered vertically by the similarity of their behavior, as shown in the dendrogram (in green). Notice that this color map provides more levels of differentiation in expression than the standard green/red color map. The included color scale also helps the analyst interpret the levels more quantitatively.

Images

This scatterplot of images shows how two techniques can be combined. Each image is placed in the 2-D plot according to its values along two dimensions. This lets the analyst see how different images relate and provides a workbench for exploring the parameters that best describe their relationship. A scatterplot could also be used to display other entities, such as documents, parametrically.

Pseudo-color Images

This is a two-dimensional microarray showing the amount of hybridization for different cDNA clones probed with cDNA made from human bone marrow mRNA. A fluorescent label indicates the degree of hybridization, and this intensity is mapped onto a pseudocolor map. (Source: Genome 3, reprinted from Nature, courtesy of Tom Strachan.)

Networks

Network visualizations show connections, or edges, between nodes. In this example, gray and white nodes are 5S sRNA sequences from plants growing in two different geographical regions. The distance between nodes reflects alignment similarity. The network has been pruned to show only the most significant connections. This is called a "directed" graph. From Genome 3.
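Pruning to the most significant connections can be done by keeping only edges whose similarity score meets a threshold. The sketch below assumes each edge is a hypothetical `(node_a, node_b, similarity)` triple with scores in [0, 1], and a `region` label per node standing in for the gray/white coloring. It also counts how many retained edges link sequences within versus across regions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge format: (sequence_a, sequence_b, alignment similarity in [0, 1]).
Edge = Tuple[str, str, float]

def prune(edges: List[Edge], min_similarity: float) -> Tuple[List[Edge], List[str]]:
    """Keep edges at or above the threshold; also return the nodes they still touch."""
    kept = [e for e in edges if e[2] >= min_similarity]
    nodes: Set[str] = {n for a, b, _ in kept for n in (a, b)}
    return kept, sorted(nodes)

def edge_mix(edges: List[Edge], region: Dict[str, str]) -> Tuple[int, int]:
    """Count retained edges linking nodes from the same region vs. different regions."""
    within = sum(1 for a, b, _ in edges if region[a] == region[b])
    return within, len(edges) - within
```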

The network below has been hand-drawn to show organizational features of the major mitochondrial haplogroups in the human population today. Color has been added as a semantic marker to show in which geographic regions these haplogroups are most common. From Genome 3.

Trees

A tree is a hierarchical network, such as this phylogenetic tree for plants.

From the Berkeley Tree of Life Project

http://www.rebeccashapley.com/cipres/gallery.htm
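Phylogenetic trees are commonly exchanged in Newick format, which writes nested groups as parenthesized, comma-separated lists ending in a semicolon. The minimal sketch below assumes a tree given as nested Python tuples with taxon names at the leaves. It omits branch lengths (written as `name:length` in Newick) and assumes names contain no reserved characters such as parentheses, commas, colons, or semicolons. The taxa in the example are illustrative.

```python
from typing import Tuple, Union

# A tree is either a taxon name (leaf) or a tuple of child subtrees.
Node = Union[str, Tuple["Node", ...]]

def to_newick(tree: Node) -> str:
    """Serialize a nested-tuple tree to a Newick string (topology only)."""
    def render(node: Node) -> str:
        if isinstance(node, str):
            return node
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"

def count_leaves(tree: Node) -> int:
    """Number of taxa at the tips of the tree."""
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child in tree)
```

For example, `to_newick((("Oryza", "Zea"), "Arabidopsis"))` produces `((Oryza,Zea),Arabidopsis);`, grouping the two grasses before joining Arabidopsis.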

Sequences

Comparing two gene sequences from Arabidopsis using the CoGe (Comparative Genomics) software from Berkeley.


Comparing different parts of human chromosome 22 to identify regions with nucleotide sequence similarity (e.g., "duplications").

From Genome 3 by T.A. Brown, Chapter 18.
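Both comparisons above are often shown as dot plots: one sequence along each axis, with a mark wherever a short word (k-mer) occurs in both. This sketch is a generic dot plot, not the algorithm used by CoGe or in the Genome 3 figure, and the word length `k` is a tunable assumption. Comparing a sequence with itself yields the main diagonal, and duplicated regions appear as off-diagonal runs of marks.

```python
from typing import Dict, List, Set, Tuple

def dot_plot(seq_a: str, seq_b: str, k: int = 3) -> Set[Tuple[int, int]]:
    """Return (i, j) pairs where the k-mer starting at seq_a[i] equals the one at seq_b[j]."""
    # Index every k-mer position in seq_b so each lookup is a dictionary hit.
    index: Dict[str, List[int]] = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    hits: Set[Tuple[int, int]] = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.add((i, j))
    return hits

def render(seq_a: str, seq_b: str, hits: Set[Tuple[int, int]]) -> str:
    """Text dot plot: rows follow seq_a, columns follow seq_b, '*' marks a hit."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )
```

For a self-comparison such as `dot_plot("ACGTACGT", "ACGTACGT", 4)`, the off-diagonal hits `(0, 4)` and `(4, 0)` reveal the repeated `ACGT` word.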

3D Visualization

Spatial Mappings

Color coding

Interactive Workbenches

A workbench combines visualization tools into a single viewing and analysis platform. It is typically built on libraries whose functions can be combined in different ways to support different needs. This reduces redundancy and enables integration between operations. It also provides an opportunity to build in visualization guidance that can benefit a wide range of applications. A toolkit approach also supports extensibility and sharing.
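One way to get the extensibility described above is a registry: each view or analysis is an ordinary function registered under a name, and the workbench runs any combination of them on shared data. The `Workbench` class and the two example views below are hypothetical; they illustrate the pattern and make no claim about how the workbenches listed here are built.

```python
from typing import Callable, Dict, List

View = Callable[[Dict[str, float]], str]

class Workbench:
    """Hypothetical minimal workbench: named views share one dataset."""

    def __init__(self) -> None:
        self._views: Dict[str, View] = {}

    def register(self, name: str) -> Callable[[View], View]:
        """Decorator that adds a view function under `name`."""
        def decorator(fn: View) -> View:
            self._views[name] = fn
            return fn
        return decorator

    def run(self, data: Dict[str, float], names: List[str]) -> Dict[str, str]:
        """Apply the selected views, in order, to the same data."""
        return {name: self._views[name](data) for name in names}

bench = Workbench()

@bench.register("summary")
def summary(data: Dict[str, float]) -> str:
    return f"{len(data)} genes"

@bench.register("top_gene")
def top_gene(data: Dict[str, float]) -> str:
    # Gene with the highest expression value.
    return max(data, key=data.get)
```

New views can be added by third parties without modifying `Workbench`, which is the sharing benefit the toolkit approach aims for.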

Sky View Workbench for interactive analysis of images.

Weave Workbench for interactively exploring statistical parameters of a model and the values of these parameters in a 3-D representation of a heart. (Gresh, Rogowitz, et al.)

Analysis Tasks (gleaned from visualization tool descriptions)

  • View data, one chromosome or one genotype at a time

  • View all chromosomes for all genotypes in a single image

  • Sort genotypes

  • Align gene sequences to find patterns

  • Study the time course of gene behavior (e.g., senescence) or gene expression behavior across experimental conditions

  • Map gene expression levels onto a map of functional pathways; study changes in gene expression; compare pathway signatures

  • Visualize alternative splicing and expression data

  • Visualize high-density complex data (e.g., SNPs, genome sequencing)

  • Filter and cluster data, including hierarchical clustering

  • Compare interaction networks (directed graphs) with annotated pathways (e.g., GenMAPP)

Tools

eFP Browser

MapMan

CoGe

WebLogo

http://weblogo.berkeley.edu/logo.cgi

http://weblogo.berkeley.edu/examples.html

WebLogo offers a graphical way to see how sequence logos are formed from multiple sequence alignments. The site lets you upload sequences and then set the parameters yourself.
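Sequence logos are usually built from per-column information content: for DNA, each column's height is log2(4) = 2 bits minus the Shannon entropy of that column, and each letter's share of the height is proportional to its frequency. The sketch below follows that standard formulation but omits the small-sample correction that logo tools commonly apply and treats any gap characters as ordinary symbols. The function names are illustrative, not WebLogo's API.

```python
import math
from collections import Counter
from typing import Dict, List

def logo_column(column: str, alphabet_size: int = 4) -> Dict[str, float]:
    """Letter heights (bits) for one alignment column."""
    counts = Counter(column)
    n = len(column)
    freqs = {s: c / n for s, c in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(alphabet_size) - entropy  # no small-sample correction
    return {s: p * info for s, p in freqs.items()}

def logo(alignment: List[str], alphabet_size: int = 4) -> List[Dict[str, float]]:
    """Letter heights for every column of an ungapped-length-checked alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [logo_column("".join(seq[i] for seq in alignment), alphabet_size)
            for i in range(length)]
```

A fully conserved DNA column scores the maximum 2 bits, drawn as a single tall letter; a mixed column is shorter and split among its letters.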

GeneSpring

http://www.chem.agilent.com/en-US/products/software/lifesciencesinformatics/genespringgx/Pages/default.aspx

System for the analysis and visualization of gene expression data.

A time-course analysis of gene expression during leaf senescence in Arabidopsis thaliana.

GGT 2.0

http://jhered.oxfordjournals.org/cgi/content/full/esm109v1

Ralph van Berloo, "GGT 2.0: Versatile Software for Visualization and Analysis of Genetic Data." Laboratory of Plant Breeding, Wageningen University, The Netherlands. http://jhered.oxfordjournals.org/cgi/content/full/esm109v

GGT was originally created for visualizing molecular marker data. It supports filtering and sorting genotypes, diversity analysis using several similarity measures, linkage disequilibrium (LD) measures with a heat map of LD values, statistical analysis for bi-allelic and multi-allelic markers, and population subset selection.
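As an illustration of what an LD heat map colors, the r^2 statistic between two bi-allelic markers can be computed from haplotype frequencies: D = p(AB) - p(A) * p(B) and r^2 = D^2 / (p(A)(1 - p(A)) p(B)(1 - p(B))). The sketch below assumes phased haplotypes and markers that are polymorphic in the sample; it is a textbook calculation, not necessarily how GGT computes its LD values, and the function names are illustrative.

```python
from typing import List, Tuple

def ld_r2(haplotypes: List[Tuple[str, str]]) -> float:
    """r^2 between two bi-allelic markers from phased (allele1, allele2) haplotypes."""
    n = len(haplotypes)
    alleles1 = sorted({h[0] for h in haplotypes})
    alleles2 = sorted({h[1] for h in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both markers must be bi-allelic and polymorphic in the sample")
    a, b = alleles1[0], alleles2[0]  # reference allele at each marker
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def ld_matrix(marker_alleles: List[List[str]]) -> List[List[float]]:
    """Pairwise r^2; marker_alleles[m][i] is the allele of haplotype i at marker m."""
    m = len(marker_alleles)
    out = [[1.0] * m for _ in range(m)]  # a marker is in perfect LD with itself
    for i in range(m):
        for j in range(i + 1, m):
            r2 = ld_r2(list(zip(marker_alleles[i], marker_alleles[j])))
            out[i][j] = out[j][i] = r2
    return out
```

Each cell of the resulting symmetric matrix could then be passed through a color map to draw the LD heat map.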

GenMAPP

http://www.genmapp.org/introduction.html

A free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes. The application has pathways for human, mouse, rat, and other animals, but could be extended to cover plants. The site also includes a PowerPoint presentation that can be downloaded.

BLAST

  • Maize Genetics Conference Abstracts

  • March 1, 2009

  • Near-isogenic lines are powerful resources for analyzing phenotypic variation and are important for map-based cloning of the genes underlying mutants and traits. With many thousands of distinct genotypes, querying introgression libraries for lines of interest is a challenge. To make it more tractable, we created a tool to graphically display and query such data. This tool incorporates a web interface for displaying the location and extent of introgressions. For comparative purposes, each marker is associated with its genetic position on a reference map. Users can search for introgressions by marker name or by chromosome number and map position. The search returns a display listing the names of the lines with an introgression at the given position. Upon selecting one of the lines, color-coded introgressions on all 10 chromosomes of the line are displayed graphically. Then, upon selecting a chromosome, the user is taken to a web page that shows all of the markers on the chromosome along with the introgressions.
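The core query described in the abstract, finding lines with an introgression at a given chromosome and map position, is an interval-overlap search. The sketch below uses a hypothetical data model (`Introgression`, with positions assumed to be in centimorgans) and a linear scan, which should be adequate for thousands of lines; it is not the tool described in the abstract. The second function groups one line's introgressions by chromosome, as a per-line display would need.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Introgression:
    line: str          # hypothetical line name
    chromosome: int
    start_cm: float    # assumed map units: centimorgans
    end_cm: float

def lines_at(introgressions: List[Introgression], chromosome: int,
             position_cm: float) -> List[str]:
    """Names of lines with an introgression covering the given position."""
    return sorted({i.line for i in introgressions
                   if i.chromosome == chromosome
                   and i.start_cm <= position_cm <= i.end_cm})

def by_chromosome(introgressions: List[Introgression],
                  line: str) -> Dict[int, List[Tuple[float, float]]]:
    """One line's introgression intervals, grouped and sorted by chromosome."""
    result: Dict[int, List[Tuple[float, float]]] = {}
    for i in introgressions:
        if i.line == line:
            result.setdefault(i.chromosome, []).append((i.start_cm, i.end_cm))
    return {c: sorted(v) for c, v in sorted(result.items())}
```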