Sampling methods

[Current editors: Miquel De Cáceres, Susan Wiser]

Sampling design

Sampling limits the scale of variation addressed in a study by fixing the extent (i.e. the dimension of the study area in space or time), the grain (i.e. the dimension of each sampling unit, e.g. the size of a vegetation plot) and the sampling interval (or lag between sampling units). Normally, only patterns at scales broader than the grain and finer than the extent can be identified in data analyses.

Preferential sampling

In the Braun-Blanquet approach, the samples (i.e. relevés) were selected arbitrarily in the field to find the most “typical” examples, and the data were then ordered heuristically to construct the types. This has been criticized as being too subjective. Most of the historical phytosociological data on vegetation composition have been sampled preferentially and thus belong to those ecological data that do not fulfill the statistical assumption of independence of observations, which is necessary for valid statistical testing and inference (Lájer 2007).

Simple random sampling

Random sampling designs eliminate systematic error, but the sampling intensities needed for reliable landscape-scale ecological studies are usually high. Furthermore, random sampling has to be based on a sampling space, which can be defined on a geographical or environmental basis, or both.
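As a minimal sketch of simple random sampling in a geographically defined sampling space, the following Python snippet draws plot coordinates uniformly from a rectangular study area. The rectangle, the function name `simple_random_points` and its parameters are illustrative assumptions; a real sampling space would usually be an irregular polygon, a set of environmental classes, or both.

```python
import random


def simple_random_points(xmin, ymin, xmax, ymax, n, seed=None):
    """Draw n plot locations uniformly at random inside a bounding box.

    The bounding box stands in for the sampling space (an assumption made
    for illustration only).
    """
    rng = random.Random(seed)  # a seeded generator keeps the design reproducible
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
```

Fixing the seed makes the design repeatable, which is useful when the same set of random locations has to be documented or revisited.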

Stratified random sampling

In the near future ecologists will not be able to replace the preferentially sampled phytosociological data in large-scale studies. At the same time, phytosociological databases have to be complemented with relevés of vegetation composed mostly of common and generalist species, which are under-represented in historical data. Stratified random sampling seems to be a suitable tool for doing this.

As an example, we describe the approach taken by Grabherr et al. (2003). In a first step, the total area is stratified according to an ecoregion classification (ecoregions are defined as ecogeographically homogeneous areas, characterized by soil types and macroclimatic conditions, as well as uniform geomorphology and forest history). In a second step, these ecoregions are subdivided into homogeneous strata using GIS techniques. The following criteria were taken into consideration: altitude, aspect (exposition) and climate. Intersecting all spatial layers (forest ecoregions, altitudinal zones, aspect classes, climate types) resulted in strata representing all realized combinations of these environmental factors. However, only some strata are large enough to include one or more plot records. A stratum does not necessarily form a connected area and is independent of other strata.
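The stratification logic described above can be sketched in Python as follows. This is not the GIS workflow of Grabherr et al. (2003); it only illustrates the idea of intersecting categorical layers and sampling within each realized combination. The input format (one dictionary per map cell), the layer names and the function `stratified_random_sample` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_sample(cells, layers, per_stratum=1, min_cells=1, seed=None):
    """Sample cell ids at random within each realized combination of layers.

    cells: list of dicts, each with an "id" and one categorical value per layer.
    layers: names of the layers to intersect (hypothetical, e.g. "ecoregion").
    Strata with too few cells to hold the requested sample are skipped.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for cell in cells:
        # A stratum is one realized combination of layer values; its cells
        # need not be spatially connected.
        key = tuple(cell[layer] for layer in layers)
        strata[key].append(cell["id"])

    sample = {}
    for key in sorted(strata):
        members = strata[key]
        if len(members) < max(min_cells, per_stratum):
            continue  # stratum too small to include the requested plots
        sample[key] = rng.sample(members, per_stratum)
    return sample
```

Only combinations that actually occur in the data become strata, which mirrors the point that the intersection yields the realized combinations of environmental factors rather than all theoretically possible ones.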

Resampling vegetation databases

Phytosociological databases often contain unbalanced samples of real vegetation, which should be carefully resampled before any analyses. Knollová et al. (2005) proposed several methods for the stratified resampling of phytosociological databases. Some of these methods divide the database into groups according to the environmental variables that influence (or are supposed to influence) the between-plot variation in species composition. The strata can also be defined based on the geographical position of individual plots. Knollová et al. (2005) demonstrated that vegetation classification results depend on the method used for stratification.
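As an illustration of stratification by geographical position, the sketch below thins a plot database so that no grid cell contributes more than a fixed number of plots. It is a simplified stand-in, not a reimplementation of any specific method of Knollová et al. (2005); the plot format (dictionaries with `id`, `x`, `y`), the square grid and the function `resample_by_grid` are assumptions.

```python
import math
import random
from collections import defaultdict


def resample_by_grid(plots, cell_size, max_per_cell=1, seed=None):
    """Keep at most max_per_cell randomly chosen plots per square grid cell.

    plots: list of dicts with "id", "x" and "y" (coordinates in map units).
    cell_size: edge length of the grid cells, in the same units.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for plot in plots:
        # Geographic stratum = index of the grid cell containing the plot
        key = (math.floor(plot["x"] / cell_size), math.floor(plot["y"] / cell_size))
        cells[key].append(plot["id"])

    selected = []
    for key in sorted(cells):
        members = cells[key]
        k = min(max_per_cell, len(members))
        selected.extend(rng.sample(members, k))
    return selected
```

Densely sampled areas are thinned while sparsely sampled cells keep all their plots, which reduces the geographical imbalance of the resulting data set.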

A more straightforward and less subjective way of database resampling might be based directly on vegetation properties rather than environmental variables. Knollová et al. (2005) proposed two methods to define strata based on species composition, and a resampling method based on between-plot dissimilarity in species composition was proposed by De Cáceres et al. (2008). Lengyel et al. (2011) proposed a new resampling method based on species composition, called Heterogeneity Constrained Random (HCR) resampling. The essentials of the method are as follows. Many subsets of the source vegetation database are selected randomly. These subsets are sorted by decreasing mean dissimilarity between pairs of vegetation plots, and then sorted again by increasing variance of these dissimilarities. Ranks from both sortings are summed for each subset, and the subset with the lowest summed rank is considered the most representative. The method is available in the JUICE software.
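The ranking steps of HCR resampling can be sketched in Python as follows. This is a minimal illustration of the procedure as summarized above, not the JUICE implementation. The use of Bray-Curtis dissimilarity, the simple ordinal handling of ties, and the names `bray_curtis` and `hcr_resample` are assumptions made for the example.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (assumed measure)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def hcr_resample(plots, subset_size, n_trials=1000, seed=0):
    """Return the indices of the random subset with the lowest summed rank.

    plots: list of species-abundance vectors of equal length.
    subset_size: number of plots per subset (at least 2).
    """
    rng = random.Random(seed)
    n = len(plots)
    # Pairwise dissimilarities are computed once and reused for every subset
    d = {(i, j): bray_curtis(plots[i], plots[j])
         for i, j in combinations(range(n), 2)}

    trials = []
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(n), subset_size))
        values = [d[pair] for pair in combinations(subset, 2)]
        trials.append((subset, mean(values), pvariance(values)))

    # First sorting: decreasing mean dissimilarity (rank 0 = most heterogeneous)
    by_mean = sorted(range(n_trials), key=lambda t: -trials[t][1])
    # Second sorting: increasing variance (rank 0 = most even dissimilarities)
    by_var = sorted(range(n_trials), key=lambda t: trials[t][2])

    rank_sum = [0] * n_trials
    for rank, t in enumerate(by_mean):
        rank_sum[t] += rank
    for rank, t in enumerate(by_var):
        rank_sum[t] += rank

    best = min(range(n_trials), key=lambda t: rank_sum[t])
    return trials[best][0]
```

A high mean dissimilarity favors subsets that span much of the compositional variation, while a low variance penalizes subsets in which a few very different plots are combined with many near-duplicates.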

Plot size

Plot sites in the field are positioned in vegetation stands that are relatively homogeneous in terms of structure, species composition, and environment, so that variation is minimized within and maximized between plots. Phytosociological plots are usually squares or rectangles, which, as a rule of thumb, are roughly as large in square meters as the vegetation is high in decimeters (Dengler et al. 2008). Despite this rule and other suggestions in textbooks, actual plot sizes used may span more than one order of magnitude within the same vegetation type. Standardization of plot sizes is hindered by the vague and misleading concept of ‘minimal area’, which is thought to be a certain plot size specific for each vegetation type, beyond which any further enlargement has negligible effects on species richness and composition. However, plot size strongly influences estimates of species richness and other vegetation parameters (Dengler et al. 2009).
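The rule of thumb translates into simple arithmetic: a forest about 20 m (200 dm) tall suggests a plot of roughly 200 square meters, and a grassland about 0.5 m (5 dm) tall a plot of roughly 5 square meters. The small Python helper below, with the hypothetical name `rule_of_thumb_plot`, only restates this conversion and assumes a square plot; it is not a recommendation for a standard plot size.

```python
import math


def rule_of_thumb_plot(height_m):
    """Return (area_m2, side_m) for a square plot under the rule of thumb.

    The plot area in square meters is set equal to the vegetation height
    expressed in decimeters.
    """
    height_dm = height_m * 10.0  # meters to decimeters
    area_m2 = height_dm          # rule of thumb: m^2 of plot per dm of height
    return area_m2, math.sqrt(area_m2)
```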

Plant abundance assessments

Discussion

Plot locations and stratification

There has been an ongoing debate about the merits of objectively located plots for classification purposes. Roleček et al. (2007) argue that while simple random sampling, systematic sampling and stratified random sampling better meet some of the statistical assumptions, preferential sampling yields data sets that cover a broader range of vegetation variability. As a result, random or systematic sampling may fail to allow rare types to be identified. More recently, Michalcová et al. (2011) compared stratified-random and preferential sampling, indicating that some analyses may be biased by preferential sampling, whereas others are not.

Some authors prefer sampling proportional to area (i.e. geographically stratified) rather than environmental stratification (e.g. Cooper et al. 2006).

Issues derived from merging datasets with different characteristics

Bibliography

    • Cooper, A., McCann, T. & Bunce, R. (2006) The influence of sampling intensity on vegetation classification and the implications for environmental management. Environmental Conservation, 33, 118-127.
    • De Cáceres, M., Font, X. & Oliva, F. (2008) Assessing species diagnostic value in large data sets: a comparison between phi coefficient and Ochiai index. Journal of Vegetation Science, 19, 779-788.
    • Dengler, J., Chytrý, M. & Ewald, J. (2008) Phytosociology. Encyclopedia of Ecology (eds S. E. Jørgensen & B. D. Fath), pp. 2767-2779. Elsevier, Oxford.
    • Dengler, J., Löbel, S. & Dolnik, C. (2009) Species constancy depends on plot size - a problem for vegetation classification and how it can be solved. Journal of Vegetation Science, 20, 754-766.
    • Grabherr, G., Reiter, K. & Willner, W. (2003) Towards objectivity in vegetation classification: the example of the Austrian forests. Plant Ecology, 169, 21-34.
    • Knollová, I., Chytrý, M., Tichý, L. & Hájek, O. (2005) Stratified resampling of phytosociological databases: some strategies for obtaining more representative data sets for classification studies. Journal of Vegetation Science, 16, 479-486.
    • Lájer, K. (2007) Statistical tests as inappropriate tools for data analysis performed on non-random samples of plant communities. Folia Geobotanica, 42, 115-122.
    • Lengyel, A., Chytrý, M. & Tichý, L. (2011) Heterogeneity-constrained random resampling of phytosociological databases. Journal of Vegetation Science, 22, 175-183.
    • Michalcová, D., Lvončík, S., Chytrý, M. & Hájek, O. (2011) Bias in vegetation databases? A comparison of stratified-random and preferential sampling. Journal of Vegetation Science, 22, 281-291.
    • Roleček, J., Chytrý, M., Hájek, M., Lvončík, S. & Tichý, L. (2007) Sampling in large-scale vegetation studies: Do not sacrifice ecological thinking to statistical puritanism. Folia Geobotanica, 42, 199-208.