Hierarchical cluster analysis
The main idea...
Hierarchical cluster analysis is an algorithmic approach to find discrete groups with varying degrees of (dis)similarity in a data set represented by a (dis)similarity matrix. These groups are hierarchically organised as the algorithms proceed and may be presented as a dendrogram (Figure 1). Many of these algorithms are greedy (i.e. the optimal local solution is always taken in the hope of finding an optimal global solution) and heuristic, requiring the results of cluster analysis to be evaluated for stability by, for example, bootstrapping procedures.
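The procedure described above can be sketched in a few lines. This is a minimal illustration using SciPy (an assumed implementation choice; the R functions listed under "Implementations" work analogously), with a toy dissimilarity matrix:

```python
# A minimal sketch of agglomerative clustering from a dissimilarity
# matrix, using SciPy. The matrix below is illustrative toy data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# A symmetric dissimilarity matrix for five objects.
D = np.array([
    [0.0, 0.2, 0.9, 0.8, 0.9],
    [0.2, 0.0, 0.8, 0.9, 0.9],
    [0.9, 0.8, 0.0, 0.3, 0.7],
    [0.8, 0.9, 0.3, 0.0, 0.7],
    [0.9, 0.9, 0.7, 0.7, 0.0],
])

# linkage() expects the condensed (upper-triangle) form of the matrix.
Z = linkage(squareform(D), method="average")

# Each row of Z records one greedy merge: the two clusters joined and
# the dissimilarity at which they were joined, i.e. the height of the
# corresponding branch point in the dendrogram.
print(Z)

# Cutting the tree at a chosen dissimilarity yields discrete groups.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram()` would draw a tree of the kind shown in Figure 1.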
In ecological analyses, hierarchical clustering may be useful in discretising largely continuous ecological phenomena to aid structure detection and hypothesis generation. If data were collected along a gradient, for example, cluster analysis may help to identify (relatively) distinct regions therein which may correspond to an ecologically meaningful grouping. As always, one must consider carefully whether clustering is suitable and meaningful for the task at hand. If a very smooth response gradient (i.e. very even changes in (dis)similarity between objects) is being scrutinised, ordination procedures may be more appropriate. For further discussion, see Legendre and Legendre (1998).
Figure 1: Schematics of dendrograms representing the results of hierarchical cluster analyses. The positions of branch points along a similarity axis indicate the level of shared similarity between clusters and/or individual entities. These points indicate at what stage clusters were formed during the course of the chosen algorithm. In panel (b), there is evidence of "chaining", or sequential joining of individual entities.
Clustering methods
Agglomerative clustering is a widespread approach to cluster analysis. Agglomerative algorithms successively merge the individual entities and clusters that have the highest similarity values. These algorithms typically use a linkage criterion (examples below and in Figure 2) to determine the (dis)similarity values for clusters formed during their execution. They are valid both for the clustering of objects and of variables, provided some measure of (dis)similarity is available; note, however, that other clustering techniques may not share this flexibility. Agglomerative algorithms end when all the individual entities and clusters have been merged into a single cluster. Some common linkage criteria are summarised below.
Figure 2: Examples of rules employed by hierarchical clustering algorithms to create new clusters. a) Single-linkage or nearest-neighbour clustering merges clusters based on the (dis)similarities between their nearest members. b) Complete-linkage or farthest-neighbour clustering takes into account the dissimilarities between the most distant members of the clusters being compared. c) Average-linkage clustering, also called the unweighted pair-group method using arithmetic averages (UPGMA), defines the (dis)similarity between clusters as the average pair-wise distance between their members. Other methods of clustering are available, including those which cluster based on minimising information loss on merging clusters (e.g. Ward's method), probabilistic measures evaluating cluster membership with reference to statistical distributions (e.g. V-type clustering), and fuzzy clustering approaches which allow for more relaxed group membership.
Warnings
Hierarchical clustering methods can become computationally intensive for large data sets, and bootstrapping results may be prohibitively expensive. Implementations of clustering algorithms optimised for efficiency are available for large data sets (see Olson, 1995; Day and Edelsbrunner, 1984; Defays, 1977; Sibson, 1973).
If more than one approach seems appropriate for analysing the data at hand, each should be used and the results compared. Evaluating the differences in these results with an understanding of each method's rationale can often reveal interesting relationships between objects and variables. Clusters that are repeatedly found by the methods chosen may be considered robust.
Many hierarchical clustering approaches are sensitive to outliers.
Because these are greedy, heuristic algorithms, mis-groupings that occur at early stages are not corrected at later stages. Many implementations support bootstrapping or other resampling techniques to assess the stability of a clustering solution and suggest a consensus grouping.
Be aware of caveats and known issues associated with the wide range of clustering methods available. Some may have profound consequences on interpretability.
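The bootstrap stability assessment mentioned above can be sketched simply: resample the variables with replacement, recluster, and count how often pairs of objects fall into the same group (the idea behind packages such as pvclust). This is an illustrative sketch using SciPy on toy data, not a substitute for a dedicated implementation:

```python
# Rough bootstrap stability sketch: resample variables (columns) with
# replacement, recluster, and record how often object pairs co-cluster.
# Data, replicate count, and cluster number are illustrative choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Toy data: 6 objects in two well-separated groups, 8 variables.
X = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 8)),
    rng.normal(5.0, 0.1, size=(3, 8)),
])

n_boot = 100
co_clustered = np.zeros((6, 6))
for _ in range(n_boot):
    cols = rng.integers(0, X.shape[1], size=X.shape[1])  # resample variables
    Z = linkage(pdist(X[:, cols]), method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    co_clustered += labels[:, None] == labels[None, :]

# Proportion of bootstrap replicates in which each pair of objects
# co-clusters; values near 1 indicate stable groupings.
support = co_clustered / n_boot
print(support.round(2))
```

Pairs with high support across replicates correspond to the "robust" clusters discussed above; pvclust reports analogous (approximately unbiased) p-values per branch.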
Walkthroughs featuring hierarchical cluster analysis
Implementations
The hclust() function in R's stats package
The pvclust() function in the pvclust package; further, this package allows bootstrapping and indication of highly supported clusters with pvrect().
MASAME hierarchical clustering app
References
Day WHE, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classification 1(1): 7–24.
Defays D (1977) An efficient algorithm for a complete link method. Comput J 20: 364–366.
Legendre P, Legendre L (1998) Numerical Ecology. 2nd ed. Amsterdam: Elsevier. ISBN 978-0444892508.
Olson CF (1995) Parallel algorithms for hierarchical clustering. Parallel Comput 21(8): 1313–1325.
Sibson R (1973) SLINK: an optimally efficient algorithm for the single-link cluster method. Comput J 16(1): 30–34.
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58: 236–244.