Angiosperms


Genome Doubling and Angiosperm Diversification

Project Contact: Doug Soltis, University of Florida

Fig. 1. Simplified summary tree for angiosperms (following the general topology of D. Soltis et al., 2000, with modifications reflecting more recent analyses, including Jansen et al., 2007), depicting putative locations of genome duplication events now inferred for flowering plants relative to major lineages or species with sequenced nuclear genomes or substantial expressed sequence tag data (taken from Soltis et al., 2009).

Polyploidy has long been recognized as a major force in angiosperm evolution. Recent genomic investigations employing sequenced genomes and large EST collections not only indicate that polyploidy is ubiquitous among angiosperms, but also suggest several ancient genome-doubling events. These include ancient whole genome duplication (WGD) events in basal angiosperm lineages, as well as a proposed paleohexaploid event that may have occurred close to the eudicot divergence (Fig. 1). The question is no longer “what proportion of angiosperms are polyploid?” but “how many episodes of polyploidy characterize any given lineage?” We will address two basic questions in this proposal.

I. How widespread is ancient polyploidy in the angiosperms and how many episodes of polyploidy characterize any given lineage? By using large collections of ESTs, investigators have been able to propose ancient duplication events for various angiosperm lineages, but the data remain taxonomically sparse. To increase phylogenetic breadth, we propose that over 550 phylogenetically pivotal angiosperm species and a suite of gymnosperm outgroups be included as part of this initiative.

The most common approach for identifying ancient polyploidy events is based on interpreting Ks plots (e.g., Blanc and Wolfe, 2004; Cui et al., 2006), which display the distribution of pairwise synonymous distances (Ks) between paralogous genes within a single taxon. However, Ks plots can be difficult to interpret and are not always reliable (see Blanc and Wolfe, 2004; Paterson et al., 2004). The data generated from this initiative will allow us to complement Ks analyses with a phylogenetic gene tree / species tree reconciliation approach that seeks to place large-scale duplication events directly by mapping gene duplications onto a species tree (e.g., Guigó et al., 1996; Bowers et al., 2003; Burleigh et al., 2008). This work will take advantage of recent algorithmic advances as well as research currently being conducted in part at the University of Florida.
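To make the Ks-plot idea concrete, the following is a minimal sketch, not the pipeline of any cited study. It bins pairwise Ks values for paralog pairs from one taxon and reports bins that stand out as local maxima away from the youngest bin, which is the kind of secondary peak usually read as a candidate duplication event. The function names, the bin width, the saturation cutoff, the `min_count` threshold, and the toy Ks values are all assumptions chosen for illustration.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count paralog pairs per Ks bin; pairs with Ks >= max_ks are dropped
    because synonymous sites are assumed to be saturated beyond that point."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0.0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts


def secondary_peaks(counts, bin_width=0.1, min_count=3):
    """Return midpoints of bins that are strict local maxima with at least
    min_count pairs. The first bin (recent background duplicates) is skipped."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if (counts[i] >= min_count
                and counts[i] > counts[i - 1]
                and counts[i] > counts[i + 1]):
            peaks.append(round((i + 0.5) * bin_width, 3))
    return peaks


# Hypothetical Ks values for paralog pairs in a single taxon (toy data only).
toy_ks = [0.02, 0.03, 0.05, 0.06, 0.08, 0.12, 0.15, 0.18, 0.25, 0.33,
          0.45, 0.55, 0.62, 0.65, 0.71, 0.74, 0.75, 0.78, 0.79, 0.83,
          0.86, 0.95, 1.15, 1.42]

counts = ks_histogram(toy_ks)
print(secondary_peaks(counts))
```

A simple local-maximum rule like this illustrates why Ks plots can be hard to interpret: the result depends on bin width and on the count threshold, and real analyses typically fit mixture models rather than rely on raw bin counts.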

Given a rooted gene tree and species tree, there are several ways to define the possible location(s), or mapping, of a gene duplication on a species tree (e.g., Guigó et al., 1996; Fellows et al., 1998). There is often a range of possible mappings for each duplication, and therefore the number of possible mappings for the set of all duplications can be exponentially large in the size of the gene trees. We will use algorithmic approaches that seek to find large-scale duplication events by identifying a mapping that minimizes the overall number of gene duplication events. For example, Burleigh et al. (2008) described an efficient algorithm to identify a mapping that minimizes the overall number of nodes in the species tree with gene duplication events. With only 75 gene trees representing 136 plant taxa, this algorithm identified a mapping in which the placement of the largest duplication events (in terms of number of gene trees with duplications) corresponded largely to results from Ks plots. Bansal and Eulenstein (2008) described an algorithm that finds the mapping that minimizes the total number of gene duplication episodes that can explain all duplications, and Burleigh et al. (in review) described how to determine the size (in gene duplications) of the largest possible episode at each node in a species tree from this mapping. Preliminary analyses using this approach to identify ancient plant polyploidy events appear very promising (Burleigh, unpublished).
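As an illustration of what mapping a duplication onto a species tree means, here is a minimal sketch of the standard least-common-ancestor (LCA) mapping. It is not the algorithm of Burleigh et al. (2008) or Bansal and Eulenstein (2008); it only computes the lowest placement of each duplication, which is one end of the range of possible mappings discussed above. The sketch assumes rooted binary trees written as nested tuples, and gene names of the hypothetical form `Species_copy` (for example `A_1`), so that the species is the text before the underscore.

```python
def species_clades(tree):
    """Collect the leaf set of every node in a nested-tuple species tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def lca(clade_list, species):
    """Smallest species-tree clade that contains all of the given species."""
    return min((c for c in clade_list if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """LCA-map every gene-tree node and tally inferred duplications per
    species-tree node. A gene node is called a duplication when it maps to
    the same species-tree node as at least one of its children."""
    clade_list = species_clades(species_tree)
    tally = {}

    def walk(node):
        if isinstance(node, str):
            # Hypothetical naming convention: species name precedes "_".
            return lca(clade_list, frozenset([node.split("_")[0]]))
        child_maps = [walk(child) for child in node]
        here = lca(clade_list, frozenset().union(*child_maps))
        if here in child_maps:
            tally[here] = tally.get(here, 0) + 1
        return here

    walk(gene_tree)
    return tally


species_tree = (("A", "B"), "C")
# Two gene copies in both A and B, one copy in C (toy example).
gene_tree = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
print(count_duplications(gene_tree, species_tree))
```

In this toy case the single duplication maps to the ancestor of A and B. Summing such tallies over many gene trees shows which species-tree nodes accumulate duplications; the cited algorithms go further by choosing, among all allowed placements, a mapping that concentrates duplications into as few nodes or episodes as possible.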

There are still several issues with the gene tree / species tree mapping approach. For example, error in the gene trees tends to result in overestimating the number of duplications near the root of the species tree (Hahn, 2007). However, by using this approach in conjunction with Ks analyses, and with adequate taxon sampling for genomic data, we expect that the history of genome doubling throughout angiosperm evolution can be largely resolved.

II. Does genome doubling lead to species richness? Comparisons of diversification rates suggest that genome doubling is associated with a dramatic increase in species richness in several angiosperm lineages, including Poaceae, Solanaceae, Fabaceae, and Brassicaceae (Soltis et al., 2008). However, genomic studies incorporating further taxon sampling are needed to pinpoint the exact phylogenetic placement of the ancient polyploidy events within these lineages. With the data from this project, we will be able to conclusively test hypotheses regarding the association between ancient polyploidy events and diversification rates. As a first approach, we will follow the general method used in Soltis et al. (2008): we will compare species richness in clades that are ancient polyploids with that of sister clades that are not. The overall diversification rate (r) for angiosperms will be estimated using the methods described in Magallón and Sanderson (2001). Because the estimate of this parameter depends on the rate of extinction, which is unknown, we will estimate r across a range of extinction rates (Alfaro et al., 2007). We will also calculate values of r over a range of plausible age estimates for crown group angiosperms (see Soltis et al., 2008). Next, we will calculate the probability of observing the extant number of species in several putative clades of polyploid origin given these estimated global rates for angiosperms, conditioned on the assumed age of the crown group. All calculations will be done using GEIGER 1.0-91 (Harmon et al., 2008).
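The rate estimation step can be sketched as follows. This is a stand-alone Python illustration, not the GEIGER code the project will use. It implements the crown-group method-of-moments estimator as we understand it from Magallón and Sanderson (2001), where epsilon is the relative extinction rate (extinction divided by speciation); with epsilon = 0 it reduces to r = (ln n - ln 2) / t. The function name and the species count and crown age in the example are hypothetical placeholders, not estimates for any real clade.

```python
import math


def crown_rate(n_species, crown_age, epsilon=0.0):
    """Net diversification rate r for a crown group with n_species extant
    species and the given crown age, under relative extinction epsilon.
    Formula as recalled from Magallon and Sanderson (2001); check against
    the original paper before relying on it."""
    if n_species < 2 or crown_age <= 0 or not 0.0 <= epsilon < 1.0:
        raise ValueError("need n_species >= 2, crown_age > 0, 0 <= epsilon < 1")
    n, e = n_species, epsilon
    inner = (0.5 * n * (1.0 - e ** 2)
             + 2.0 * e
             + 0.5 * (1.0 - e) * math.sqrt(n * (n * e ** 2 - 8.0 * e + 2.0 * n * e + n)))
    return (math.log(inner) - math.log(2.0)) / crown_age


# Hypothetical clade: 1000 extant species, crown age of 100 time units.
for eps in (0.0, 0.5, 0.9):
    print(eps, round(crown_rate(1000, 100.0, eps), 4))
```

Evaluating the estimator over a grid of epsilon values and crown ages, as in the loop above, mirrors the sensitivity analysis described in the text: the estimated r falls as the assumed relative extinction rate rises, so conclusions about unusually species-rich clades should hold across the whole range.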

Importantly, these analyses of diversification are also the first step in determining when and which novel genes resulting from polyploidy have enabled adaptive radiations.

Literature Cited

1. Alfaro, M. E., F. Santini, and C. D. Brock. 2007. Do reefs drive diversification in marine teleosts? Evidence from the pufferfish and their allies (Order Tetraodontiformes). Evolution 61: 2104-2126.

2. Bansal, M. S., and O. Eulenstein. 2008. The multiple gene duplication problem revisited. Bioinformatics 24: i132-i138.

3. Blanc, G., and K. H. Wolfe. 2004. Widespread paleopolyploidy in model plant species

4. Bowers, J. E., B. A. Chapman, J. Rong, and A. H. Paterson. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433-438.

5. Burleigh, J. G., M. S. Bansal, A. Wehe, and O. Eulenstein. 2008. Locating multiple gene duplications through reconciled trees. RECOMB 2008, LNCS 4955: 273-284.

6. Burleigh, J. G., M. S. Bansal, O. Eulenstein, and T. J. Vision. Inferring species trees using gene duplication episodes. In review.

7. Cui, L., P. K. Wall, J. Leebens-Mack, B. G. Lindsay, D. E. Soltis, J. J. Doyle, P. S. Soltis, J. Carlson, A. Arumuganathan, A. Barakat, V. Albert, H. Ma, and C. W. DePamphilis. 2006. Widespread genome duplications throughout the history of flowering plants. Genome Research 16: 738-749.

8. Fellows, M., M. Hallett, and U. Stege. 1998. On the multiple gene duplication problem. ISAAC'98, LNCS 1533: 347-356.

9. Guigó, R., I. Muchnik, and T. F. Smith. 1996. Reconstruction of ancient molecular phylogeny. Molecular Phylogenetics and Evolution 6: 189-213.

10. Hahn, M. 2007. Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biology 8: R141.

11. Harmon, L. J., J. T. Weir, C. D. Brock, R. E. Glor, and W. Challenger. 2008. GEIGER: investigating evolutionary radiations. Bioinformatics 24: 129-131.

12. Jansen, R. K., Z. Cai, L. A. Raubeson, H. Daniell, C. W. dePamphilis, J. Leebens-Mack, K. F. Müller, S.-B. Lee, R. Peery, J. McNeal, J. V. Kuehl, and J. L. Boore. 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proceedings of the National Academy of Sciences, USA 104: 19369-19374.

13. Magallón, S., and M. J. Sanderson. 2001. Absolute diversification rates in angiosperm clades. Evolution 55: 1762-1780.

14. Paterson, A. H., J. E. Bowers, and B. A. Chapman. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences, USA 101: 9903-9908.

15. Soltis, D. E., V. A. Albert, J. Leebens-Mack, C. D. Bell, A. Paterson, C. Zheng, D. Sankoff, P. Kerr Wall, and P. S. Soltis. 2009. Polyploidy and angiosperm diversification. American Journal of Botany 96: 336-348.

16. Soltis, D. E., P. S. Soltis, M. W. Chase et al. 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Botanical Journal of the Linnean Society 133: 381-461.