[Current editors: David K. Roberts, Miquel De Cáceres]
Within the context of vegetation classification with numerical procedures, there is often an implicit or explicit choice of how resemblance between plot records is assessed. It is therefore important to know which options exist in resemblance calculation and how each of them influences the final resemblance values. This section reviews three aspects of community resemblance: data transformations, species weighting/selection and resemblance measures.
In general, when comparing sampling units in a multivariate space (i.e. where several attributes have been measured) the following questions are important (Wildi 2010):
In the case of community composition, the attributes (i.e. species) are all of the same type and are usually measured using the same scale (e.g. the Braun-Blanquet cover-abundance scale). However, there are still some decisions to be made:
The primary principle about resemblance is that it is often defined, not measured. In general, the concept of "similar" (or "dissimilar") expresses the degree to which two objects have attributes in common, compared with the extent to which they differ (or the reverse, for dissimilarity). First, similarity depends on the set of attributes chosen to characterize these objects. For community ecology generally, the objects are samples of the composition of ecological communities, and the attributes are the presence or abundances of organisms. Here we have the advantage that all the attributes are tangible and that the list of attributes appears reasonably objective. However, the decision to include or exclude specific taxa, for example bryophytes in a sample of vegetation, often depends on the expertise of the field taxonomist as much as on the objectives of the work. Second, similarity depends on the scale of measurement chosen for each attribute, which in principle may be different for each attribute.
One of the decisions to make when studying the species present in a plant community is how to assess species abundance. At first, one might think that the number of individuals, biomass and cover are abundance estimates that do not differ greatly. However, some species have a large number of individuals but low cover or biomass and, conversely, a few individuals sometimes have a large cover value. The reader is referred to Kent (2011) for a broader discussion of the methods most frequently used to assess abundance in vegetation science. To save effort, and because classification of vegetation requires sampling a large number of plant communities, plant abundance is often "measured" using ordinal scales of abundance and/or cover (see, for example, Westhoff & van der Maarel 1973). These scales are less precise and more subjective than other methods, but allow much faster assessment of abundance. As these scales are ordinal, recorded values need to be transformed into numerical values for numerical analyses. Note, however, that Podani (2005) argues against applying methods defined for interval or ratio variables to data measured on ordinal scales.
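Converting ordinal codes into numbers can be sketched as a simple lookup table. The Python below is a minimal illustration, not a standard: it assumes the Braun-Blanquet codes r, +, 1 to 5 and replaces each code with the midpoint of its cover range. The values used for r and + are our own assumptions (published conversion tables differ for these classes), and the names `BB_MIDPOINTS` and `bb_to_cover` are hypothetical.

```python
# Hypothetical conversion table: Braun-Blanquet cover-abundance codes mapped to
# the midpoint (in % cover) of the cover range each code stands for.
# The values for "r" and "+" are assumptions; published tables differ for them.
BB_MIDPOINTS = {
    "r": 0.1,   # assumed value: rare, negligible cover
    "+": 0.5,   # assumed value: few individuals, small cover
    "1": 2.5,   # cover below 5 %
    "2": 15.0,  # 5-25 % cover
    "3": 37.5,  # 25-50 % cover
    "4": 62.5,  # 50-75 % cover
    "5": 87.5,  # 75-100 % cover
}


def bb_to_cover(record):
    """Convert one plot record {species: code} into {species: % cover}."""
    return {species: BB_MIDPOINTS[code] for species, code in record.items()}


plot = {"sp1": "4", "sp2": "2", "sp3": "+"}
print(bb_to_cover(plot))  # {'sp1': 62.5, 'sp2': 15.0, 'sp3': 0.5}
```

Whatever table is chosen, the converted values are afterwards treated as if they were ratio-scale measurements, which is the practice Podani (2005) cautions against.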
Transformation involves mapping the values for a set of abundances from one scale to another, often according to a mathematical function, but sometimes according to a set of rules. The choice of data transformation is closely related to the choice of similarity/dissimilarity index, because some transformations make two different indices equal. Put differently, some similarity/dissimilarity indices carry built-in transformations (Faith et al. 1987).
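As an illustration of a transformation that makes two indices equal, the Python sketch below (our own example, not taken from Faith et al. 1987) applies the Bray-Curtis dissimilarity to raw abundances and to presence/absence data. After the presence/absence transformation, Bray-Curtis gives the same value as the Sørensen dissimilarity, 1 - 2a/(2a + b + c), where a is the number of shared species and b and c are the numbers of species found in only one of the two plots. The plot vectors are invented toy data.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def sorensen_dissimilarity(x, y):
    """Sorensen dissimilarity 1 - 2a / (2a + b + c) from presence/absence."""
    a = sum(1 for p, q in zip(x, y) if p > 0 and q > 0)    # shared species
    b = sum(1 for p, q in zip(x, y) if p > 0 and q == 0)   # only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q > 0)   # only in y
    return 1 - 2 * a / (2 * a + b + c)


def presence_absence(x):
    """Binary transformation: any positive abundance becomes 1."""
    return [1 if v > 0 else 0 for v in x]


plot1 = [5, 0, 12, 3]
plot2 = [1, 4, 0, 3]
bc_raw = bray_curtis(plot1, plot2)
bc_binary = bray_curtis(presence_absence(plot1), presence_absence(plot2))
sor = sorensen_dissimilarity(plot1, plot2)
print(round(bc_raw, 3), round(bc_binary, 3), round(sor, 3))  # 0.714 0.333 0.333
```

The equality holds in general for binary data: the numerator of Bray-Curtis then counts b + c and the denominator counts 2a + b + c.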
We call scalar transformations those that are applied to each element of the species-by-community matrix independently. Useful scalar transformations in vegetation classification (and community ecology in general) are generally non-linear and serve to emphasize differences among low values and de-emphasize small differences among large values. Common mathematical functions employed in abundance transformation include the square root and the logarithm. The logarithmic transformation is stronger than the square root, but suffers from the problem that $\log(0)$ is not defined. Generally, this problem can be solved by adding a small constant to all values. If the abundances are counts or percentages whose smallest non-zero value is one, it makes sense to add the constant one, i.e. to use $\log(x_{ik} + 1)$ for all values. In this case, absences are transformed to zero, since $\log(0 + 1) = 0$, so zero still represents absence on the transformed scale.
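A minimal Python sketch of the two scalar transformations follows. It assumes a plot-by-species matrix stored as nested lists and uses the natural logarithm through `math.log1p`, which computes log(1 + x); the choice of base only rescales the values. The function names are hypothetical and the cover values are invented.

```python
import math


def sqrt_transform(matrix):
    """Square root applied to every element independently."""
    return [[math.sqrt(v) for v in row] for row in matrix]


def log1p_transform(matrix):
    """log(x + 1) applied to every element; absences (0) stay 0."""
    return [[math.log1p(v) for v in row] for row in matrix]


# Toy plot-by-species matrix of percent cover (invented values).
cover = [[0, 1, 2, 50, 100]]

for name, f in [("sqrt", sqrt_transform), ("log(x+1)", log1p_transform)]:
    print(name, [round(v, 2) for v in f(cover)[0]])
```

In this example the gap between 50 and 100 shrinks to about 7.07 versus 10 with the square root and about 3.93 versus 4.62 with the logarithm, while absences remain zero in both cases.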
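We call vector transformations those that transform each element of the species-by-community matrix using quantities calculated for the corresponding row or column. For example, dividing each abundance by the total abundance of its community converts raw abundances into relative abundances.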
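The Python sketch below shows three common vector transformations, chosen by us as examples: division by the plot total (relative abundance), division by the species maximum, and division by the Euclidean norm of each plot (chord normalization). It assumes that rows are plots and columns are species, i.e. the transpose of the species-by-community layout used in the text; the function names are hypothetical.

```python
import math


def relative_abundance(matrix):
    """Divide each value by its row (plot) total."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def species_max_standardize(matrix):
    """Divide each value by the maximum of its column (species)."""
    col_max = [max(col) for col in zip(*matrix)]
    return [[v / m if m > 0 else 0.0 for v, m in zip(row, col_max)]
            for row in matrix]


def chord_normalize(matrix):
    """Divide each row by its Euclidean norm (unit-length plot vectors)."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm if norm > 0 else 0.0 for v in row])
    return out


plots = [[2, 0, 6], [10, 5, 0]]  # invented plot-by-species abundances
print(species_max_standardize(plots))  # [[0.2, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Computing the Euclidean distance between rows after chord normalization yields the chord distance, another case of an index with a built-in transformation.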
It is common practice among many ecologists to delete rare species from data sets before calculating dissimilarity or distance. The argument is that species with few occurrences have low information content and statistical power, and that including them adds noise to the dissimilarities/distances. However, the meaning of rarity is data-set dependent, and simple rules of thumb (such as deleting every species found in fewer than a fixed number of plots) are often problematic.
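A frequency-based deletion rule can be sketched in a few lines of Python. This is a minimal illustration assuming a plot-by-species matrix; the function `drop_rare_species` and its threshold parameter `min_plots` are hypothetical, and the value 2 used below is arbitrary.

```python
def drop_rare_species(matrix, species, min_plots):
    """Remove species (columns) occurring in fewer than `min_plots` plots.

    `matrix` is plot-by-species; `species` holds the column names.
    Returns the reduced matrix and the names of the retained species.
    """
    # Frequency = number of plots in which each species is present (> 0).
    freq = [sum(1 for v in col if v > 0) for col in zip(*matrix)]
    keep = [j for j, f in enumerate(freq) if f >= min_plots]
    reduced = [[row[j] for j in keep] for row in matrix]
    return reduced, [species[j] for j in keep]


names = ["sp1", "sp2", "sp3", "sp4"]
plots = [
    [3, 0, 1, 0],
    [2, 5, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 0, 2],  # this plot contains only sp4
]
reduced, kept = drop_rare_species(plots, names, min_plots=2)
print(kept)     # ['sp1', 'sp2']
print(reduced)  # [[3, 0], [2, 5], [4, 1], [0, 0]]
```

The toy data show one way such a rule can backfire: the last plot is left with no species at all, so it can no longer be meaningfully compared with the others.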
We cannot review here all the possible indices or metrics that have been proposed for assessing the resemblance between pairs of communities. For those seeking a catalog of possible choices, Legendre & Legendre (2012) present a long list of indices that could be employed in community ecology.
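To see why the choice among indices matters, the Python sketch below compares Euclidean distance and Bray-Curtis dissimilarity on three invented plots. Plot A shares no species with plot B, but has the same two species as plot C, at higher abundances. The example is ours and is only meant to illustrate that different indices can rank the same pairs of plots differently.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


plot_a = [0, 1, 1]
plot_b = [1, 0, 0]  # shares no species with plot_a
plot_c = [0, 4, 8]  # same species as plot_a, higher abundances

for name, d in [("Euclidean", euclidean), ("Bray-Curtis", bray_curtis)]:
    print(name, round(d(plot_a, plot_b), 3), round(d(plot_a, plot_c), 3))
# Euclidean 1.732 7.616
# Bray-Curtis 1.0 0.714
```

Euclidean distance places A closer to B, a plot with no species in common, whereas Bray-Curtis places A closer to C.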
[YET TO BE DONE]
[YET TO BE DONE]
[YET TO BE DONE]