Community resemblance

[Current editors: David K. Roberts, Miquel De Cáceres]

Introduction

Numerical vegetation classification usually involves an implicit or explicit choice of how resemblance between plot records is assessed. It is therefore crucial to know which choices are available in resemblance calculation and how each of them influences the final resemblance values. This section reviews three aspects of community resemblance: data transformations, species weighting/selection and resemblance measures.

In general, when comparing sampling units in a multivariate space (i.e. where several attributes have been measured) the following questions are important (Wildi 2010):

    • Are the attributes of the same type?
    • Do the attributes have the same importance? Is that ok for the purpose in mind?
    • Are the attributes measured on the same scale?
    • Are some attributes correlated and therefore partly carrying the same information?

In the case of community composition, the attributes (i.e. species) are all of the same type and are usually measured using the same scale (e.g. the Braun-Blanquet cover-abundance scale). However, there are still some decisions to be made:

    • The scale used for measuring species performance may not necessarily be the best option for vegetation classification purposes. An important question related to scale is: how do differences in abundance compare to the difference between presence and absence?
    • Even if the whole community is taken into account, are all the species equally important, or should some subset of species be given more weight?

Resemblance

The primary principle about resemblance is that it is often defined, not measured. In general, the concept of "similar" (or "dissimilar") expresses the degree to which two objects have attributes in common, compared to the extent to which they are distinct (or conversely). First, similarity depends on the set of attributes chosen to characterize these objects. In community ecology the objects are generally samples of the composition of ecological communities, and the attributes are the presences or abundances of organisms. Here we have the advantage that all the attributes are tangible and that the list of attributes appears reasonably objective. However, the decision to include or exclude specific taxa, for example bryophytes in a sample of vegetation, often depends on the expertise of the field taxonomist as much as on the objectives of the work. Second, similarity depends on the scale of measurement chosen for each attribute, which in principle may be different for each attribute.

Cover-abundance scales

One of the decisions to make when studying the species present in a plant community is how to assess species abundance. One could, at first, think that the number of individuals, biomass and cover are abundance estimates that do not differ greatly. However, there are species with a large number of individuals and low cover or biomass and, conversely, sometimes a few individuals have a large cover value. The reader is referred to Kent (2012) for a broader discussion of the methods most frequently used to assess abundance in vegetation science. To save effort, and because vegetation classification requires sampling a large number of plant communities, plant abundance is often "measured" using ordinal scales of abundance and/or cover (see, for example, Westhoff & van der Maarel 1973). These scales are less precise and more subjective than other methods, but allow much faster assessment of abundance. Because these scales are ordinal, recorded values need to be transformed into numerical values before numerical analysis. Note, however, that Podani (2005) argues against applying methods defined for interval or ratio variables to data measured on ordinal scales.
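As a rough illustration of turning ordinal codes into numbers, the sketch below maps Braun-Blanquet codes to approximate percent cover. The lookup table is an assumption made for this example: the values for classes 2 to 5 are midpoints of the usual cover ranges, while the values for "r", "+" and "1" are placeholders that different authors set differently. The names BB_TO_COVER and bb_to_cover are hypothetical.

```python
# Assumed lookup table: Braun-Blanquet code -> approximate percent cover.
# Classes 2-5 use midpoints of the usual cover ranges; the values for
# "r", "+" and "1" are illustrative placeholders, not a standard.
BB_TO_COVER = {
    "r": 0.1, "+": 0.5, "1": 2.5,
    "2": 15.0, "3": 37.5, "4": 62.5, "5": 87.5,
}

def bb_to_cover(codes):
    """Convert a plot record of Braun-Blanquet codes (None = absent) to percent cover."""
    return [0.0 if code is None else BB_TO_COVER[code] for code in codes]

# Hypothetical plot record with five species, one of them absent.
plot_record = ["+", None, "3", "1", "5"]
cover = bb_to_cover(plot_record)
print(cover)  # [0.5, 0.0, 37.5, 2.5, 87.5]
```

Any such table fixes how differences between abundance classes compare with the difference between presence and absence, so the choice of values is itself a transformation decision.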

Data transformations

Transformation involves mapping the values for a set of abundances from one scale to another, often according to a mathematical function, but sometimes according to a set of rules. The choice of data transformations is strongly related to the choice of similarity/dissimilarity indices, because some transformations make two different indices equal. Put another way, some similarity/dissimilarity indices carry in-built transformations (Faith et al. 1987).

Scalar transformations

We call scalar transformations those that are applied to each element of the species-by-community matrix independently. Useful scalar transformations in vegetation classification (and community ecology in general) are generally non-linear and serve to emphasize differences in low values and de-emphasize small differences in large values. Common mathematical functions employed in abundance transformation include square root and logarithmic transformations. Logarithmic transformations compress large values more strongly than square roots, but suffer from the problem that log(0) is not defined. Generally, this problem can be solved by adding a small constant to all values. If the abundances are counts or percentages, where the smallest non-zero value is one, it makes sense to add the constant one, i.e. to use log(x_ik + 1) for all values. In this case absences are transformed to zero, since log(0 + 1) = 0, and we get a natural scale.
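The effect of these two scalar transformations can be checked on a small example. The sketch below assumes a hypothetical matrix with plots as rows and species as columns, and compares the between-plot difference of an abundant species with that of a sparse species before and after transformation. The helper gap_ratio is introduced only for this illustration.

```python
import numpy as np

# Hypothetical cover values (percent): two plots (rows), three species (columns).
x = np.array([[0.0, 1.0, 80.0],
              [0.0, 2.0, 90.0]])

sqrt_x = np.sqrt(x)
log_x = np.log1p(x)  # log(x + 1); absences remain exactly zero

def gap_ratio(m):
    """Between-plot difference of the abundant species relative to the sparse one."""
    low_gap = abs(m[1, 1] - m[0, 1])    # sparse species: 1 vs 2
    high_gap = abs(m[1, 2] - m[0, 2])   # abundant species: 80 vs 90
    return high_gap / low_gap

for name, m in [("raw", x), ("sqrt", sqrt_x), ("log(x+1)", log_x)]:
    print(f"{name:>9}: {gap_ratio(m):.2f}")
```

On the raw scale the difference for the abundant species is ten times that of the sparse species. The square root reduces this ratio to a little above one, and log(x + 1) pushes it below one, which is what is meant by the logarithm being the stronger transformation.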

Vector transformations

We call vector transformations those that transform each element of the species-by-community matrix using quantities that are calculated for the corresponding row or column.

    • Species maximum standardization - In this transformation, we divide the values of each column (i.e. the values of each species) by its maximum. The species maximum standardization has the effect of making the maximum value of all species equal to one. Since the minimum value of most species is zero, this transformation has the effect of making the scale of abundance relative for each species. In general, the species maximum standardization reduces the weight of dominant species and increases the weight of less abundant species (van der Maarel, 1979). This transformation makes the abundance values for taxa in a sample dependent on the other samples in the data set.
    • Sample total standardization - In this transformation we divide the values of each row (i.e. the values of each vegetation observation) by its total (i.e. the sum of the row). The sample total standardization has the effect of eliminating differences in total abundance from the data set. One possible benefit is that it eliminates possible observer bias in estimating abundance. Since the sample total standardization fixes the total abundance in samples, it also fixes the maximum possible geometric distance between samples.
    • Wisconsin double standardization - In this transformation, the abundance values are first standardized by species maximum standardization, then by sample total standardization, and by convention multiplied by 100. Bray and Curtis (1957) employed a double standardization before ordination. In their study, tree species were measured on different scales than shrubs and herbs (density and basal area for trees, frequency for herbs and shrubs), so a species maximum standardization achieved a common scale. Their rationale for the subsequent sample total standardization was that not all samples had the same number of measurements, and that the sample total standardization achieved a more uniform basis for comparison.
    • Sample normalization - In this transformation we divide the values of each row (i.e. the values of each vegetation observation) by the length of the row vector (i.e. its norm) (see Legendre & Gallagher 2001). This transformation is related to the Chord and Hellinger distances. Like the sample total standardization, the sample normalization also fixes the maximum possible distance between samples.
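The four vector transformations listed above can be sketched with NumPy as follows. The matrix and the function names are hypothetical, rows are taken to be samples and columns species, and the sketch assumes that no row or column is entirely zero (otherwise the divisions are undefined). The last part checks one case of a transformation making two measures equal: the Euclidean distance between normalized samples coincides with the chord distance between the raw samples.

```python
import numpy as np

# Hypothetical abundance matrix: rows = samples, columns = species.
x = np.array([[10.0, 0.0, 5.0],
              [20.0, 4.0, 0.0],
              [0.0, 8.0, 5.0]])

def species_max(m):
    """Divide each column (species) by its maximum."""
    return m / m.max(axis=0, keepdims=True)

def sample_total(m):
    """Divide each row (sample) by its sum."""
    return m / m.sum(axis=1, keepdims=True)

def wisconsin(m):
    """Species maximum, then sample total, multiplied by 100 by convention."""
    return 100.0 * sample_total(species_max(m))

def sample_norm(m):
    """Divide each row (sample) by its Euclidean length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def chord_distance(a, b):
    """Chord distance computed directly from the angle between two raw samples."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.sqrt(2.0 * (1.0 - cos))

normed = sample_norm(x)
d_euclid = np.linalg.norm(normed[0] - normed[1])
print(np.isclose(d_euclid, chord_distance(x[0], x[1])))  # True
```

After sample normalization every sample lies on the unit sphere, so for non-negative abundances the distance between two samples cannot exceed the square root of 2, the value reached when they share no species.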

Species weighting and selection

It is common practice among many ecologists to delete rare species from data sets before calculating dissimilarity or distance. The argument is that species with few occurrences have low information content and statistical power, and that including them adds noise to the dissimilarities/distances. However, the meaning of rarity is data set dependent, and simple rules of thumb are often problematic.
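A minimal sketch of such a deletion rule is given below, assuming a hypothetical matrix with samples as rows and species as columns. The function drop_rare and its min_occurrences threshold are illustrative names, and the threshold value is arbitrary. It is an example of the kind of rule of thumb whose suitability depends on the data set.

```python
import numpy as np

# Hypothetical abundance matrix: rows = samples, columns = species.
x = np.array([[5, 0, 1, 0],
              [3, 0, 0, 2],
              [8, 1, 0, 4],
              [2, 0, 0, 1]])

def drop_rare(m, min_occurrences):
    """Keep only species (columns) present in at least min_occurrences samples."""
    occurrences = (m > 0).sum(axis=0)      # number of samples where each species occurs
    keep = occurrences >= min_occurrences  # boolean mask over species
    return m[:, keep], keep

filtered, kept = drop_rare(x, min_occurrences=2)
print(kept)            # [ True False False  True]
print(filtered.shape)  # (4, 2)
```

With only four samples, a threshold of two occurrences removes half of the species here, whereas the same threshold would remove far fewer in a large data set.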

Resemblance measures

We cannot review here all the possible indices or metrics that have been proposed for assessing the resemblance between pairs of communities. For those seeking a catalog of possible choices, Legendre & Legendre (2012) present a long list of indices that could be employed in community ecology.

Similarity, dissimilarity and distance

[YET TO BE DONE]

Indices for presence/absence data

[YET TO BE DONE]

Indices for quantitative data

[YET TO BE DONE]

Bibliography

    • Bloom, S. (1981). Similarity indices in community studies: potential pitfalls. Mar. Ecol. Prog. Ser., 5, 125–128.
    • Campbell, B.M. (1978). Similarity coefficients for classifying relevés. Vegetatio, 37, 101–109.
    • Faith, D.P., Minchin, P.R. & Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio, 69, 57–68.
    • Hajdu, L.J. (1981). Graphical comparison of resemblance measures in phytosociology. Vegetatio, 48, 47–59.
    • Kent, M. (2012). Vegetation Description and Data Analysis: A Practical Approach. 2nd edition. Wiley-Blackwell.
    • Legendre, P. & Gallagher, E. (2001). Ecologically meaningful transformations for ordination of species data. Oecologia, 129, 271–280.
    • Legendre, P. & Legendre, L. (2012). Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam, NL.
    • Noest, V., van der Maarel, E., van der Meulen, F. & van der Laan, D. (1989). Optimum-transformation of plant species cover-abundance values. Vegetatio, 83, 167–178.
    • Podani, J. (2005). Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions. Journal of Vegetation Science, 16, 497–510.
    • van der Maarel, E. (1979). Transformation of cover-abundance values in phytosociology and its effects on community similarity. Vegetatio, 39, 97–114.
    • Wildi, O. (2010). Data analysis in vegetation ecology. Wiley-Blackwell.