Principal coordinates analysis

The main idea...

Principal coordinates analysis (PCoA; also known as metric multidimensional scaling) summarises and attempts to represent inter-object (dis)similarity in a low-dimensional, Euclidean space (Figure 1; Gower, 1966). Rather than using raw data, PCoA takes a (dis)similarity matrix as input (Figure 1a).
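The computation behind PCoA (Gower, 1966) can be sketched in four steps:

1. Multiply each squared dissimilarity by -0.5.
2. Double-centre the result so that every row and column sums to zero.
3. Eigen-decompose the centred matrix.
4. Scale each unit eigenvector by the square root of its eigenvalue to obtain object coordinates.

The Python below is a minimal, dependency-free sketch of those steps. It assumes a symmetric matrix stored as a list of lists. The function names `pcoa` and `jacobi_eigh` are hypothetical. A plain Jacobi eigen-solver stands in for the optimised routines that real implementations use.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the rotated a[p][q] becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pcoa(d, eps=1e-10):
    """Return (eigenvalues, largest first) and object coordinates on axes with eigenvalue > eps."""
    n = len(d)
    # Step 1: a_ij = -0.5 * d_ij^2
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Step 2: Gower double-centring (a is symmetric, so row means equal column means)
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    g = [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]
    # Step 3: eigen-decomposition, axes ordered by decreasing eigenvalue
    vals, vecs = jacobi_eigh(g)
    order = sorted(range(n), key=lambda k: vals[k], reverse=True)
    # Step 4: coordinates = unit eigenvector * sqrt(eigenvalue), positive eigenvalues only
    kept = [k for k in order if vals[k] > eps]
    coords = [[vecs[i][k] * math.sqrt(vals[k]) for k in kept] for i in range(n)]
    return [vals[k] for k in order], coords


# Distances between the points (0, 0), (3, 0) and (0, 4)
eigenvalues, coords = pcoa([[0, 3, 4], [3, 0, 5], [4, 5, 0]])
```

For this Euclidean input, the sketch returns two axes whose coordinates reproduce the distances 3, 4 and 5. The third eigenvalue is numerically zero and yields no axis. Silently dropping non-positive eigenvalues is a simplification; negative eigenvalues are discussed under Warnings.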

It is conceptually similar to principal components analysis (PCA) and correspondence analysis (CA), which preserve Euclidean and χ² (chi-squared) distances between objects, respectively. However, PCoA can accept (dis)similarities generated by any measure, allowing more flexible handling of complex ecological data.

Additionally, PCoA can handle (dis)similarity matrices calculated from quantitative, semi-quantitative, qualitative, and mixed variables. As always, the choice of (dis)similarity measure is critical and must suit the data in question. Together with the numbers of objects and input variables, the choice of measure also determines the number of dimensions in the PCoA solution; there are at most n - 1 axes for n objects. As an important caveat, PCoA can only fully represent the Euclidean component of a (dis)similarity matrix, even if the matrix contains non-Euclidean distances. To arrive at a fully Euclidean solution, consider transforming the data or the (dis)similarities, or use non-metric multidimensional scaling (NMDS).

Figure 1: Principal coordinates analysis ordination of a Bray-Curtis dissimilarity matrix. Objects that are ordinated closer together have smaller dissimilarity values than those ordinated further apart. A successful PCoA will capture most of the variation in the (dis)similarity matrix in a few PCoA axes.
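As one example of a measure that can feed a PCoA, here is the Bray-Curtis dissimilarity used in Figure 1. For two objects it is the sum of absolute abundance differences divided by the sum of all abundances. The Python sketch below assumes a table of non-negative abundances with objects as rows, where no two objects are both entirely empty. The function names and the counts are hypothetical. Bray-Curtis is semimetric (it does not always satisfy the triangle inequality), so a PCoA of it may produce negative eigenvalues.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Assumes x and y are not both all zeros (the denominator would be zero).
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


def bray_curtis_matrix(table):
    """Symmetric dissimilarity matrix for a table whose rows are objects (e.g. samples)."""
    n = len(table)
    return [[bray_curtis(table[i], table[j]) for j in range(n)] for i in range(n)]


# Hypothetical abundance table: 3 samples (rows) x 3 taxa (columns)
counts = [[1, 2, 3], [3, 2, 1], [1, 2, 3]]
d = bray_curtis_matrix(counts)
```

Samples 1 and 3 are identical (dissimilarity 0). Samples 1 and 2 differ by 4 out of 12 individuals in total (dissimilarity 1/3). The resulting matrix `d` is the kind of input PCoA expects.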

Results and interpretation

As with other ordination techniques such as PCA and CA, PCoA produces a set of uncorrelated (orthogonal) axes to summarise the variability in the data set. Each axis has an eigenvalue whose magnitude indicates the amount of variation captured by that axis. The ratio of a given eigenvalue to the sum of all eigenvalues gives the proportion of variation captured by that axis, and hence its relative 'importance'. A successful PCoA will generate a few (2-3) axes with relatively large eigenvalues that together capture more than 50% of the variation in the input data, with all other axes having small eigenvalues. Each object has a 'score' along each axis. The object scores provide the object coordinates in the ordination plot.
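The 'importance' calculation above is simple arithmetic: divide each eigenvalue by their sum, then accumulate. The Python sketch below assumes all eigenvalues are non-negative (a fully Euclidean solution). The function name and the eigenvalues are hypothetical.

```python
from itertools import accumulate


def explained_variation(eigenvalues):
    """Proportion and cumulative proportion of total variation captured by each axis.

    Assumes all eigenvalues are non-negative (a fully Euclidean solution).
    """
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    return proportions, list(accumulate(proportions))


# Hypothetical eigenvalues from a PCoA of six objects (at most five axes), largest first
proportions, cumulative = explained_variation([4.0, 2.0, 1.0, 0.5, 0.5])
print(f"Axes 1-2 capture {cumulative[1]:.0%} of the variation")  # prints: Axes 1-2 capture 75% of the variation
```

With these values, the first two axes capture 75% of the variation, which would count as a successful PCoA by the rule of thumb above.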

Interpretation of a PCoA plot is straightforward: objects ordinated closer to one another are more similar than those ordinated further apart. Here, (dis)similarity is defined by the measure used to construct the input matrix.

While PCoA is suited to handling a wide range of data, information about the original variables cannot be recovered from it. This is because PCoA takes as input a (dis)similarity matrix derived from the original data, not the original variables themselves. However, object scores along the PCoA axes may be correlated with the objects' values for each original variable, provided these are either quantitative or dummy (binary) variables (Legendre & Legendre, 1998). These correlations may be used as a measure of each original variable's contribution to a given PCoA axis.
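The correlation approach described above can be sketched with the Pearson coefficient. The scores on one PCoA axis are correlated with each original variable in turn. The Python below is a minimal sketch, assuming every variable varies across objects (non-zero variance). The function names, variable names and values are hypothetical. The sketch computes only the coefficients and does not test their significance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def axis_variable_correlations(axis_scores, variables):
    """Correlate one PCoA axis with each original variable (name -> values per object)."""
    return {name: pearson(axis_scores, values) for name, values in variables.items()}


# Hypothetical data for five objects: scores on PCoA axis 1 and three original variables
axis1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
variables = {
    "taxon_A": [1, 3, 5, 7, 9],          # quantitative
    "taxon_B": [9, 7, 5, 3, 1],          # quantitative
    "habitat_is_soil": [0, 0, 1, 1, 1],  # dummy (0/1) variable
}
r = axis_variable_correlations(axis1, variables)
```

In this toy example, taxon_A and taxon_B are perfectly positively and negatively associated with axis 1 (r = 1 and r = -1). The dummy variable has r = √3/2 (about 0.87).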

Warnings

If a PCoA axis has a negative eigenvalue, its object coordinates are imaginary numbers (coordinates are scaled by the square root of the eigenvalue), so that axis cannot be represented in Euclidean space. Negative eigenvalues may arise when using (dis)similarity measures that are semimetric or nonmetric, or that are non-Euclidean in other ways. To correct for them, the dissimilarities can be transformed so that small dissimilarities become larger relative to large ones. Viable options include taking the square root of the dissimilarities, or adding to all dissimilarities a constant large enough to remove the negative eigenvalues (Legendre & Legendre, 1998).
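The effect of these corrections can be checked numerically. The dependency-free Python sketch below uses hypothetical function names. It computes the PCoA eigenvalues of a small, made-up matrix that violates the triangle inequality, then applies two corrections:

- the square-root transformation;
- the Lingoes correction, which adds a constant 2c to every squared off-diagonal dissimilarity, where c is the magnitude of the most negative eigenvalue.

The Lingoes correction is one of the constant-adding methods described by Legendre & Legendre (1998); the other, the Cailliez correction, adds a constant to the dissimilarities themselves and is not shown. The eigen-solver is a plain Jacobi iteration, used only to keep the example self-contained.

```python
import math


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues (largest first) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def pcoa_eigenvalues(d):
    """PCoA eigenvalues: eigenvalues of the Gower double-centred -0.5 * d^2 matrix."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]
    grand = sum(row) / n
    return jacobi_eigenvalues([[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)])


def sqrt_transform(d):
    """Square root of every dissimilarity."""
    return [[math.sqrt(x) for x in row] for row in d]


def lingoes_correction(d):
    """Add 2c to each squared off-diagonal dissimilarity, c = |most negative eigenvalue|."""
    c = max(0.0, -min(pcoa_eigenvalues(d)))
    n = len(d)
    return [[0.0 if i == j else math.sqrt(d[i][j] ** 2 + 2 * c) for j in range(n)] for i in range(n)]


# Hypothetical matrix: d(1,3) = 3 exceeds d(1,2) + d(2,3) = 2, breaking the triangle inequality
d = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
```

For this matrix, the eigenvalues are 4.5, 0 and -5/6, and after either correction none is negative. The square-root transformation is not guaranteed to remove negative eigenvalues for every matrix, so it is worth rechecking the eigenvalues after any correction.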
Objects (rows) whose variable values introduce large amounts of variation into the overall data set may strongly influence the ordination, making patterns among the other objects less visible. It may be instructive to examine a PCoA solution that excludes such objects.
Object scores along a PCoA axis of interest may be correlated (using an appropriate measure) with the values of environmental variables to assess association. However, PCoA is a form of indirect gradient analysis. Direct methods such as distance-based redundancy analysis (db-RDA) are therefore likely to be more useful for assessing the influence of environmental variables.
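One choice of "appropriate measure" for the association described above is a rank-based coefficient such as Spearman's rho. It captures monotonic but non-linear relationships between axis scores and an environmental variable. The Python below is a minimal sketch assuming equal-length sequences and non-constant variables. The function names and all values are hypothetical. The sketch computes only the coefficient and does not test its significance.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


axis1 = [-1.2, -0.4, 0.1, 0.9, 1.5]  # hypothetical scores on a PCoA axis
ph = [5.1, 5.8, 6.0, 6.9, 7.3]       # hypothetical environmental variables
temperature = [30, 20, 15, 12, 11]
```

Here pH rises and temperature falls monotonically along the axis, so rho is 1 and -1 respectively, even though neither relationship is linear.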

Implementations

R
- the function cmdscale() from R's stats package performs PCoA on a (dis)similarity or distance matrix (such as those generated by the vegdist() function in the package vegan). The ordiplot() function from vegan may be used to plot the ordination.
- the function pcoa() in the package ape. The results may be plotted with the biplot.pcoa() function.

MASAME PCoA app


References

Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika. 53(3-4):325-338.
Legendre P, Legendre L (1998) Numerical Ecology. 2nd ed. Amsterdam: Elsevier. ISBN 978-0444892508.