Assignment methods
[Current editors: Miquel De Cáceres]
Introduction
Once vegetation types have been defined, they can be used to describe or summarize vegetation patterns. When surveying or monitoring the vegetation of a given territory, new vegetation observations (i.e. relevés or plot records) will be made. Classifying these observations into a predefined vegetation classification scheme can be useful for many applied purposes, such as vegetation mapping or assessing the conservation status of plant communities. Here we use the word assignment to refer to the act of determining the membership of a given vegetation observation in a classification scheme (i.e. determining its closest vegetation type, if any). This section is devoted to the methods used to perform such assignments.
Membership rules
In order to assign vegetation observations to pre-existing vegetation types, one needs a procedure, which we call here a membership rule. Simple examples of membership rules are: “a plot record belongs to vegetation type Y if its altitude lies within a given altitudinal range” or “a plot record belongs to vegetation type Y if species A occurs among the list of species recorded”. Examples of more complex rules are a hierarchical vegetation key (e.g. Pfister & Arno 1980), a set of species groups plus a formal logic statement combining them (Bruelheide 1997, 2000), a fuzzy membership function based on the distances of the target observation to a set of cluster prototypes (e.g. De Cáceres et al. 2009, 2010) or a trained neural network (e.g. Černá & Chytrý 2005). Moreover, two or more membership rules may be combined to create a compound rule (e.g. Kočí et al. 2003). Membership rules are sometimes implemented in computer programs that automate the assignment process (i.e. expert systems).
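The simple rules quoted above, and a compound rule built from them, can be sketched as boolean predicates over a plot record. This is a minimal illustration under stated assumptions, not an implementation of any cited system: the `Releve` type, the rule constructors and the species names are hypothetical, and the species-group threshold (`min_present`) only mimics the idea of combining species groups with formal logic; it is not Bruelheide's exact definition.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Releve:
    """Hypothetical plot record: altitude (m) and the set of recorded species."""
    altitude: float
    species: FrozenSet[str] = field(default_factory=frozenset)


# A membership rule answers: does this relevé belong to vegetation type Y?
Rule = Callable[[Releve], bool]


def altitude_rule(low: float, high: float) -> Rule:
    """Belongs to Y if the altitude lies within [low, high]."""
    return lambda r: low <= r.altitude <= high


def occurrence_rule(species: str) -> Rule:
    """Belongs to Y if the given species occurs among the recorded species."""
    return lambda r: species in r.species


def group_rule(group: FrozenSet[str], min_present: int) -> Rule:
    """Species group counts as present if at least `min_present` members occur
    (the threshold is an illustrative assumption)."""
    return lambda r: len(group & r.species) >= min_present


def all_of(*rules: Rule) -> Rule:
    """Logical AND of several rules (a compound rule)."""
    return lambda r: all(rule(r) for rule in rules)


def negate(rule: Rule) -> Rule:
    """Logical NOT of a rule."""
    return lambda r: not rule(r)


# Compound rule for a hypothetical type Y: altitude 1200-2000 m AND
# species group {A, B, C} present (>= 2 members) AND species D absent.
type_y = all_of(
    altitude_rule(1200, 2000),
    group_rule(frozenset({"Species A", "Species B", "Species C"}), 2),
    negate(occurrence_rule("Species D")),
)

plot1 = Releve(1500, frozenset({"Species A", "Species B", "Species E"}))
plot2 = Releve(1500, frozenset({"Species A", "Species B", "Species D"}))
plot3 = Releve(900, frozenset({"Species A", "Species B", "Species C"}))

print(type_y(plot1), type_y(plot2), type_y(plot3))  # True False False
```

Because each rule is just a function from a relevé to a yes/no answer, rules of very different nature can be combined into compound rules in the same way.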
There are three essential ways to obtain a membership rule:
- Expert-based rule definition: Vegetation scientists sometimes define membership rules using their expertise, without any explicit use of, or reference to, vegetation observations. This is often the case for global-level vegetation classifications based on climate, physiognomy and/or structure.
- Unsupervised classification: Unsupervised classification (or clustering) methods group vegetation observations and hence produce membership statements. Clustering methods differ in the criterion they use to group observations. Although it is not their primary aim, some clustering methods also provide membership rules. When they do, assignments made with these rules have the advantage of being consistent with the membership statements that the method produces for the input data set (De Cáceres & Wiser in press).
- Supervised classification: Supervised classification methods take the membership statements and the attributes of a set of ‘training’ vegetation observations and derive a membership rule from them. In this case, a classifier is developed from a training data set of plot observations whose classification is already known and assumed to be valid (e.g. Černá & Chytrý 2005; van Tongeren et al. 2008). This assumption can be a source of problems in expert domains where it does not hold, or when there is no consensus on the classification of the training set. Traditional expert-based vegetation classifications usually suffer from inconsistencies (e.g. different researchers used different, and sometimes not explicitly stated, classification criteria) and/or contain loosely defined units (e.g. plant communities defined by the occurrence, dominance or absence of particular species). In this scenario, supervised classification methods may propagate potentially incorrect information unless the traditional expert-defined classifications are first validated using a common classification criterion.
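For methods that define groups through prototypes, the membership rule can be written explicitly. The sketch below assumes that relevés are rows of a species cover matrix and that distances are Euclidean. It computes group centroids from labelled relevés (labels from a clustering run, or from an expert classification of a training set) and derives fuzzy memberships of a new relevé with the standard fuzzy c-means formula. It illustrates the idea only and is not the exact algorithm of the studies cited above, which may use other distances, weighted prototypes or an additional noise class; function names and toy data are hypothetical.

```python
import numpy as np


def group_centroids(X: np.ndarray, labels: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the group labels and their centroids (prototypes).

    X: relevés x species matrix (e.g. cover values).
    labels: one group label per relevé (from clustering or from an expert).
    """
    groups = np.unique(labels)
    protos = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    return groups, protos


def fuzzy_memberships(x: np.ndarray, protos: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Fuzzy c-means memberships of relevé x to each prototype.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), d = Euclidean distances.
    Memberships are non-negative and sum to 1.
    """
    d = np.linalg.norm(protos - x, axis=1)
    if np.any(d == 0):
        # x coincides with one or more prototypes: share full membership among them
        u = (d == 0).astype(float)
        return u / u.sum()
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


# Toy training data: 4 relevés x 3 species, two hypothetical groups
X = np.array([[5, 0, 1],
              [4, 1, 0],
              [0, 5, 4],
              [1, 4, 5]], dtype=float)
labels = np.array(["Y1", "Y1", "Y2", "Y2"])

groups, protos = group_centroids(X, labels)
new_releve = np.array([4.0, 0.0, 1.0])
u = fuzzy_memberships(new_releve, protos)
print(dict(zip(groups, u.round(3))))
```

Taking the group with the largest membership is equivalent to assigning the relevé to the nearest centroid. If the labels come from k-means, this rule reproduces, at convergence, the partition k-means produced for the input data, which illustrates the consistency advantage of clustering-derived rules; if the labels come from an expert classification, the same code acts as a simple supervised (nearest-centroid) classifier.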
Assignment using diagnostic species
What is diagnostic species analysis?
In many ecological studies, researchers analyze the relationship between single species and one or more groups of sites. The classification of sites into groups (hereafter called “site groups”) may have been derived from similarities among sites in environmental conditions (e.g., habitat types, disturbance states) or in species composition (i.e., vegetation types); it may also have been given by the study design (e.g., comparison of geographic regions or sampling years) or obtained using other criteria. From the point of view of the species, its strength of association with site groups characterizes its ecological preferences. From the point of view of the site group, the list of species strongly associated with it is valuable for predictive purposes.
In vegetation studies, plant species that occur preferentially in a single vegetation type or in a few are generally called diagnostic species, and are useful for the identification of vegetation types in field surveys. A species restricted to one or a few habitat types potentially represents a better ecological indicator of environmental change than a habitat generalist, owing to the greater susceptibility of the specialist to local or regional extinction. Species of this kind are called indicator species by ecologists, and are used to monitor environmental changes and assess the impacts of disturbances on an ecosystem. McGeoch (1998) suggested that indicator species could be used in three distinct ways: (1) to reflect the biotic or abiotic state of the environment; (2) to reveal evidence for the impact of environmental changes; and (3) to indicate the diversity of other species, taxa, or communities within an area. Diagnostic species and indicator species essentially refer to the same concept: using the preferences of species for predictive purposes.
Diagnostic/fidelity measurement
In European phytosociology, the study of species–site group associations has a long tradition (Barkman 1989). The strength of the association was called fidelity, which was defined as a measure of species concentration in vegetation units.
TWINSPAN (Two-Way Indicator Species Analysis) can be used to obtain indicator species. This procedure takes a site-by-species data table, performs a hierarchical classification of the sites and, at the same time, determines the indicator species for the two sides of each split in the hierarchy. The main problem with TWINSPAN for indicator species analysis is that its multivariate nature makes the indicator value of a species dependent on the abundances of the remaining species in the data table (McGeoch and Chown 1998).
The determination of diagnostic species is still an active research topic in vegetation science (e.g., Bruelheide 2000; Chytrý et al. 2002a, 2002b; Tichý and Chytrý 2006; De Cáceres et al. 2008; Willner et al. 2009), where the most widely used index is the phi coefficient of association or modified forms of it (Chytrý et al. 2002b; Tichý and Chytrý 2006; Willner et al. 2009). In contrast, many ecologists prefer to determine indicator species using the IndVal index (Dufrêne and Legendre 1997), for which some extensions have been published (De Cáceres et al. 2010; Podani & Csányi 2010). The two approaches have been unified in a broader algebraic framework by De Cáceres & Legendre (2009).
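Both indices can be computed from simple counts. The sketch below assumes presence/absence data for one species and a crisp partition of relevés into site groups. Phi is written in its usual 2×2 contingency-table form, with N the number of relevés, N_p the number in the target group, n the number of relevés containing the species and n_p the number of those within the target group. For IndVal, with presence/absence data the mean abundance in a group reduces to the species' relative frequency, so specificity A is the relative frequency in the target group divided by the sum of relative frequencies over all groups, fidelity B is the relative frequency in the target group, and IndVal = A·B (often reported multiplied by 100). Function names and toy data are hypothetical.

```python
import math
from collections import Counter


def counts(presence, groups, target):
    """Return (N, N_p, n, n_p) for one species and one target group."""
    N = len(presence)
    N_p = sum(1 for g in groups if g == target)
    n = sum(1 for p in presence if p)
    n_p = sum(1 for p, g in zip(presence, groups) if p and g == target)
    return N, N_p, n, n_p


def phi_coefficient(N, N_p, n, n_p):
    """Phi coefficient of association (undefined if n or N_p is 0 or N)."""
    return (N * n_p - n * N_p) / math.sqrt(n * N_p * (N - n) * (N - N_p))


def indval(presence, groups, target):
    """IndVal = A * B for presence/absence data (group-equalized specificity)."""
    size = Counter(groups)
    occ = Counter(g for p, g in zip(presence, groups) if p)
    freq = {g: occ[g] / size[g] for g in size}  # relative frequency per group
    A = freq[target] / sum(freq.values())       # specificity
    B = freq[target]                            # fidelity
    return A * B


# Toy data: 10 relevés, species present in 3 of 4 "Y" relevés and 1 of 6 others
groups = ["Y"] * 4 + ["other"] * 6
presence = [1, 1, 1, 0] + [1, 0, 0, 0, 0, 0]

phi = phi_coefficient(*counts(presence, groups, "Y"))
iv = indval(presence, groups, "Y")
print(round(phi, 3), round(iv, 3))  # 0.583 0.614
```

The two indices encode different choices: phi contrasts the target group with all other relevés pooled, so its values are affected by the relative size of the target group (the issue addressed by Tichý & Chytrý 2006), whereas the group-equalized A of IndVal compares relative frequencies across groups.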
The context in diagnostic species determination
If a taxon has an optimum in one syntaxon only, it is called a character-taxon or faithful taxon. If it has several phytosociological optima, we speak of differential taxa. However, faithful taxa are just a special, extreme case of differential taxa. Differential taxa are normally used to distinguish subassociations and variants, but there is no fundamental objection to using them to distinguish associations, alliances and even higher categories; in modern Braun-Blanquet phytosociology this is actually done. We must therefore distinguish (1) the taxonomic rank (level) of the differential taxon itself, (2) the syntaxonomic level (rank) for which it is differential and (3) the level within which it is differential.
A further issue in diagnostic species determination is how best to establish the context of fidelity determination; in other words, how to decide which relevés should make up the comparative set. Early studies of fidelity did not explicitly establish the context of comparison. Not surprisingly, lists of diagnostic species for the same community differ when their contexts of determination differ (Barkman 1989; Botta-Dukát & Borhidi 1999). Crucial elements to bear in mind when making this decision are the geographical and ecological ranges (i.e. the range of communities, habitats and even formation types) of the comparative set of relevés. Chytrý et al. (2002a) recognise that phytosociologists have traditionally adopted two extreme approaches: either they narrow the ecological range and extend the geographical range of the comparative data, or they narrow the geographical range and extend the ecological range.
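The effect of the comparative set can be illustrated with the phi coefficient. In the invented counts below, a species occurs in 8 of 10 relevés of a hypothetical target community and in 8 of 20 relevés of other forest types. Adding 40 grassland relevés in which the species is absent (i.e. extending the ecological range of the comparative set) raises its phi value, even though nothing has changed within the target community itself. The numbers are purely illustrative.

```python
import math


def phi_coefficient(N, N_p, n, n_p):
    """Phi coefficient: N relevés, N_p in target group, n with the species,
    n_p with the species inside the target group."""
    return (N * n_p - n * N_p) / math.sqrt(n * N_p * (N - n) * (N - N_p))


# Hypothetical target community: 10 relevés, species present in 8 of them
target_size, target_occ = 10, 8

# Context 1 (narrow ecological range): 20 relevés of other forest types,
# species present in 8 of them
phi_narrow = phi_coefficient(N=target_size + 20, N_p=target_size,
                             n=target_occ + 8, n_p=target_occ)

# Context 2 (broader ecological range): same forests plus 40 grassland
# relevés in which the species is absent
phi_broad = phi_coefficient(N=target_size + 20 + 40, N_p=target_size,
                            n=target_occ + 8, n_p=target_occ)

print(round(phi_narrow, 3), round(phi_broad, 3))  # 0.378 0.556
```

This is one reason why lists of diagnostic species for the same community can differ between studies that use different comparative sets.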
Membership rules based on diagnostic species
Whereas much scientific literature has been devoted to the (statistical) determination of diagnostic species, much less effort has been put into how to build membership rules from such information. The simplest membership rule based on diagnostic species concerns the occurrence of a diagnostic species in a plot: “a plot record belongs to vegetation type Y if any character species occurs among the list of species recorded”. Bruelheide (1997) suggested more complex rules, based on formal logic statements, that combine the information carried by several species in order to make assignments. Since fidelity and indicator value indices are now very easy to calculate, such values can be used to derive more complex mathematical rules. For example, Gégout & Coudun (2011) calculated, for each relevé and each association, the mean Φ value of all the species present in the relevé (a plot index). The relevé was then assigned to the association with the highest plot index. Gégout & Coudun (2011) also provided a function to assess the probability that such an assignment matches the one an expert would make.
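The plot-index rule described for Gégout & Coudun (2011) can be sketched as follows, assuming a precomputed table of Φ values per association and species. How species lacking a Φ value for an association are treated is an assumption here (they contribute 0); the table, names and values are hypothetical, and the expert-agreement probability function mentioned above is not reproduced.

```python
def plot_index(species_present, phi_for_unit):
    """Mean phi, for one vegetation unit, of the species present in a relevé.
    Species missing from the unit's table contribute 0 (an assumption)."""
    if not species_present:
        raise ValueError("relevé has no species")
    values = [phi_for_unit.get(s, 0.0) for s in species_present]
    return sum(values) / len(values)


def assign(species_present, phi_table):
    """Assign a relevé to the unit with the highest plot index."""
    scores = {unit: plot_index(species_present, phis)
              for unit, phis in phi_table.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical phi values (association -> species -> phi)
phi_table = {
    "Association 1": {"sp_a": 0.6, "sp_b": 0.4, "sp_c": -0.1},
    "Association 2": {"sp_a": -0.2, "sp_b": 0.1, "sp_c": 0.5, "sp_d": 0.3},
}

best, scores = assign(["sp_a", "sp_b", "sp_d"], phi_table)
print(best)  # Association 1
```

In this sketch, negative Φ values let species typical of other units lower a unit's plot index, which distinguishes the rule from simply counting the diagnostic species present.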
Discussion (draft)
Since contemporary vegetation scientists are increasingly using numerical clustering (i.e. unsupervised) methods to derive new vegetation units, these methods should also be used to review traditional classifications. Note, however, that current conservation policies, such as those of the Natura 2000 network, are based on habitat definitions (e.g. the CORINE biotopes manual), which in turn rely on traditional phytosociological units. Therefore, drastic changes in regional or national vegetation classifications can be problematic and should be avoided.
Even if traditional vegetation units are considered valid, we believe the classification criterion of supervised classifications should be congruent with that used in the original classification of the training data set. Otherwise, the efficiency of the classifier, the interpretation of its results, or both may be affected. This explains why supervised approaches emulating traditional phytosociological concepts perform better when the expert classification of the training set is used instead of one resulting from numerical clustering analyses (e.g. van Tongeren et al. 2008).
In our opinion, vegetation scientists should decide whether they prefer: (1) a vegetation classifier designed as an interface to communicate expert vegetation knowledge to non-experts; or (2) a computer program similar to the former, but which can also promote revision of the expert knowledge itself. In the first case, the program would simply run supervised classification methods on a knowledge base assumed to be true. In the second case, in contrast, the system would allow the expert knowledge to be questioned, and even the viewpoint to be changed.
Bibliography
- Barkman, J. J. (1989) Fidelity and character-species, a critical evaluation. Vegetatio, 85, 105-116.
- Botta-Dukát, Z. & Borhidi, A. (1999) New objective method for calculating fidelity. Example: The illyrian beechwoods. Annali di Botanica, 57, 73-90.
- Bruelheide, H. (1997) Using formal logic to classify vegetation. Folia Geobotanica, 32, 41-46.
- Bruelheide, H. (2000) A new measure of fidelity and its application to defining species groups. Journal of Vegetation Science, 11, 167-178.
- Černá, L. & Chytrý, M. (2005) Supervised classification of plant communities with artificial neural networks. Journal of Vegetation Science, 16, 407-414.
- Chytrý, M., Exner, A., Hrivnák, R., Ujházy, K., Valachovič, M. & Willner, W. (2002a) Context-dependence of diagnostic species: A case study of the Central European spruce forests. Folia Geobotanica, 37, 403-417.
- Chytrý, M., Tichý, L., Holt, J. & Botta-Dukát, Z. (2002b) Determination of diagnostic species with statistical fidelity measures. Journal of Vegetation Science, 13, 79-90.
- De Cáceres, M., Font, X. & Oliva, F. (2008) Assessing species diagnostic value in large data sets: a comparison between phi coefficient and Ochiai index. Journal of Vegetation Science, 19, 779-788.
- De Cáceres, M., Font, X., Vicente, P. & Oliva, F. (2009) Numerical reproduction of traditional classifications and automated vegetation identification. Journal of Vegetation Science, 20, 620-628.
- De Cáceres, M., Font, X. & Oliva, F. (2010) The management of vegetation classifications with fuzzy clustering. Journal of Vegetation Science, 21, 1138-1151.
- De Cáceres, M. & Legendre, P. (2009) Associations between species and groups of sites: indices and statistical inference. Ecology, 90, 3566-3574.
- De Cáceres, M., Legendre, P. & Moretti, M. (2010) Improving indicator species analysis by combining groups of sites. Oikos, 119, 1674-1684.
- Gégout, J.-C. & Coudun, C. (2011) The right relevé in the right vegetation unit: a new typicality index to reproduce expert judgement with an automatic classification programme. Journal of Vegetation Science (on-line).
- Kočí, M., Chytrý, M. & Tichý, L. (2003) Formalized reproduction of an expert-based phytosociological classification: A case study of subalpine tall-forb vegetation. Journal of Vegetation Science, 14, 601-610.
- McGeoch, M. A. (1998) The selection, testing and application of terrestrial insects as bioindicators. Biological Reviews, 73, 181-201.
- McGeoch, M. A. & Chown, S. L. (1998) Scaling up the value of bioindicators. Trends in Ecology and Evolution, 13, 46-47.
- Pfister, R. D. & Arno, S. F. (1980) Classifying forest habitat types based on potential climax vegetation. Forest Science, 26, 52-70.
- Podani, J. & Csányi, B. (2010) Detecting indicator species: Some extensions of the IndVal measure. Ecological Indicators, 10, 1119-1124.
- Tichý, L. & Chytrý, M. (2006) Statistical determination of diagnostic species for site groups of unequal size. Journal of Vegetation Science, 17, 809-818.
- Tsiripidis, I., Bergmeier, E., Fotiadis, G. & Dimopoulos, P. (2009) A new algorithm for the determination of differential taxa. Journal of Vegetation Science, 20, 233-240.
- Willner, W., Tichý, L. & Chytrý, M. (2009) Effects of different fidelity measures and contexts on the determination of diagnostic species. Journal of Vegetation Science, 20, 130-137.