Canonical Correspondence Analysis

The main idea...

Canonical correspondence analysis (CCA) is the canonical form of correspondence analysis (CA). It is a form of direct gradient analysis: a matrix of explanatory variables intervenes in the calculation of the CA solution, so only the correspondence that can be 'explained' by those explanatory variables is represented in the final results.

As with CA, this technique is suitable for response variables showing unimodal distributions and preserves χ² (chi-squared) distances between objects. In fact, it can be computed by passing a matrix of contributions to the χ² statistic, as used in CA, to a form of redundancy analysis (RDA) that uses object marginal sums (row totals) as weights. The result of this weighted RDA is that only the variation in the response variables that is maximally related to linear combinations of the explanatory variables is ordinated in a Euclidean space. The resulting axes are the canonical variables. The correlation of the explanatory variables with the final ordination determines their 'importance'.
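The weighted-RDA view of CCA described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the routine used by any particular package: the function name `cca_sketch` and the toy matrices `Y` and `X` are hypothetical, every row and column of `Y` is assumed to have a positive total, and only the inertias and canonical eigenvalues are computed (no site or species scores).

```python
import numpy as np

def cca_sketch(Y, X):
    """Minimal CCA: weighted regression of chi-square contributions on X."""
    Y = np.asarray(Y, dtype=float)   # objects x response variables (non-negative)
    X = np.asarray(X, dtype=float)   # objects x explanatory variables

    P = Y / Y.sum()                  # relative frequencies
    r = P.sum(axis=1)                # row (object) weights, sum to 1
    c = P.sum(axis=0)                # column (response variable) weights
    expected = np.outer(r, c)
    # Contributions to the chi-square statistic, as in CA
    Qbar = (P - expected) / np.sqrt(expected)

    # Centre X on its weighted column means, then apply the row weights
    Xc = X - r @ X
    Xw = np.sqrt(r)[:, None] * Xc

    # Fitted values of the weighted multiple regression of Qbar on X
    B, *_ = np.linalg.lstsq(Xw, Qbar, rcond=None)
    Qhat = Xw @ B

    # Squared singular values of the fitted matrix are the canonical eigenvalues
    s = np.linalg.svd(Qhat, compute_uv=False)
    return {
        "total_inertia": float((Qbar ** 2).sum()),
        "constrained_inertia": float((Qhat ** 2).sum()),
        "eigenvalues": s ** 2,
    }

# Hypothetical data: 6 sites, 4 taxa, 2 explanatory variables
Y = np.array([[10, 2, 0, 1],
              [8, 3, 1, 0],
              [5, 5, 2, 1],
              [2, 6, 5, 3],
              [1, 3, 8, 6],
              [0, 1, 9, 10]])
X = np.array([[1.0, 7.0],
              [2.0, 6.5],
              [3.0, 7.2],
              [4.0, 6.1],
              [5.0, 6.8],
              [6.0, 6.0]])

result = cca_sketch(Y, X)
print(result["total_inertia"], result["constrained_inertia"])
print(result["eigenvalues"])
```

With two explanatory variables, at most two of the returned eigenvalues are non-zero; the remainder are numerical noise.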

Legendre and Legendre (1998) note that CCA can be used to relate a qualitative explanatory variable to unimodal response data. The qualitative variable is recoded as a dummy variable and CCA is run. The fitted site scores provide a quantitative rescaling of the qualitative explanatory variable.
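The recoding step mentioned above can be illustrated as follows. This is only a sketch: the `sediment` labels are hypothetical, and dropping one dummy column is a common convention (after centring, the full set of dummies is linearly dependent) rather than something prescribed by the source.

```python
import numpy as np

# Hypothetical qualitative explanatory variable, one label per object
sediment = np.array(["sand", "clay", "silt", "sand", "clay", "silt"])

# Map each label to an integer code; levels are returned in sorted order
levels, codes = np.unique(sediment, return_inverse=True)

# One column per level, with a 1 marking the level observed for each object
dummies = np.eye(len(levels))[codes]

# Drop the first level so the remaining columns are not redundant after centring
X_dummy = dummies[:, 1:]

print(levels)
print(X_dummy)
```

The matrix `X_dummy` could then be supplied as the explanatory matrix of a CCA.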

Figure 1: An illustrative schematic of a CCA triplot. Filled circles represent objects (e.g. sampling sites). Hollow circles represent response variables (e.g. OTU abundances). Arrows represent quantitative explanatory variables (here, nutrient concentrations) with arrowheads indicating their direction of increase. Filled triangles represent the states of a categorical explanatory variable (e.g. sand, silt, or clay sediment type). See Figure 2 for guidance on reading a CCA triplot.

Results and interpretation

Many implementations of CCA will report the total inertia of the solution alongside the inertia that was successfully constrained by the explanatory variables. The quotient of the constrained inertia over the total inertia indicates how good the overall 'fit' was. Further, each CCA axis is associated with an eigenvalue. For constrained axes (i.e. those that are linear combinations of the explanatory variables), the eigenvalues sum to the total constrained inertia. Each eigenvalue thus expresses the portion of the constrained inertia represented by its axis.
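As a small worked example of these quantities, the sketch below computes the proportion of inertia constrained and each axis's share. The helper `inertia_summary` and the numbers passed to it are hypothetical and chosen only for illustration; real values would come from the output of a CCA implementation.

```python
def inertia_summary(total_inertia, constrained_eigenvalues):
    """Summarise how much inertia is constrained overall and per axis."""
    constrained = sum(constrained_eigenvalues)
    return {
        "constrained_inertia": constrained,
        # Overall 'fit': constrained inertia over total inertia
        "proportion_constrained": constrained / total_inertia,
        # Share of the constrained inertia carried by each constrained axis
        "axis_share_of_constrained": [ev / constrained for ev in constrained_eigenvalues],
        # Share of the total inertia carried by each constrained axis
        "axis_share_of_total": [ev / total_inertia for ev in constrained_eigenvalues],
    }

# Hypothetical output: total inertia 2.0 and three constrained axes
summary = inertia_summary(2.0, [0.5, 0.3, 0.2])
for key, value in summary.items():
    print(key, value)
```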

The correlation of the canonical axes with the explanatory matrix is reported, as well as the significance of each correlation as determined by permutation. Significance can be tested for the overall solution or for individual ordination axes (and their eigenvalues) derived from the response data. Note that individual axes should only be examined if the overall solution was significant. The hypothesised relationship between the matrix of response variables and that of explanatory variables is tested by permuting the rows of one matrix a sufficient number of times to establish a null distribution of the test statistic.
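A permutation test of the overall solution might be sketched as below. Several choices here are assumptions rather than anything stated in the source: the test statistic is a pseudo-F ratio of constrained to residual inertia, rows of the explanatory matrix are permuted (established packages may instead permute residuals or handle row weights differently), `X` is assumed to have full column rank, and the data are simulated.

```python
import numpy as np

def inertias(Y, X):
    """Return (constrained, total) inertia of a CCA of Y on X."""
    P = Y / Y.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    Qbar = (P - expected) / np.sqrt(expected)
    Xw = np.sqrt(r)[:, None] * (X - r @ X)   # weighted centring and weighting
    B, *_ = np.linalg.lstsq(Xw, Qbar, rcond=None)
    return ((Xw @ B) ** 2).sum(), (Qbar ** 2).sum()

def pseudo_f(Y, X):
    """Pseudo-F: constrained inertia per variable over residual inertia per d.f."""
    n, m = X.shape
    constrained, total = inertias(Y, X)
    return (constrained / m) / ((total - constrained) / (n - 1 - m))

def permutation_test(Y, X, n_perm=199, seed=0):
    """Permute rows of X to build a null distribution of the pseudo-F statistic."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(Y, X)
    f_perm = np.array([pseudo_f(Y, X[rng.permutation(len(X))])
                       for _ in range(n_perm)])
    # The observed value is counted as one member of the null distribution
    p_value = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p_value

# Simulated data: 4 taxa with unimodal responses along a gradient at 12 sites
gradient = np.linspace(0, 10, 12)
optima = np.array([2.0, 4.0, 6.0, 8.0])
Y = 20 * np.exp(-(gradient[:, None] - optima) ** 2 / (2 * 2.0 ** 2)) + 0.1
noise = np.random.default_rng(1).normal(size=12)   # unrelated second variable
X = np.column_stack([gradient, noise])

f_obs, p_value = permutation_test(Y, X)
print(f_obs, p_value)
```

With 199 permutations, the smallest attainable p-value is 1/200 = 0.005; more permutations give a finer-grained null distribution.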

Reading CCA triplots

As in CA, the distances between points representing objects and response variables in a CCA plot are χ² distances and must be interpreted as such. The type of scaling used (see below and Figure 2) will determine whether object-to-object or (response) variable-to-variable distances are meaningful. In general, object-to-variable distances are not readily interpretable; however, smaller object-to-variable distances indicate an increased probability of a given variable being 'present at', 'abundant at', or otherwise influential for a given object.
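To make the notion of a χ² distance between objects concrete, the sketch below computes it directly from a small abundance table. The function name `chi_square_distances` and the data are hypothetical, and all row and column totals are assumed to be positive.

```python
import numpy as np

def chi_square_distances(Y):
    """Pairwise chi-square distances between the rows (objects) of Y."""
    Y = np.asarray(Y, dtype=float)
    profiles = Y / Y.sum(axis=1, keepdims=True)   # relative composition per object
    col_weights = Y.sum(axis=0) / Y.sum()         # relative column totals
    # Dividing by sqrt(column weight) gives rarer variables more influence
    scaled = profiles / np.sqrt(col_weights)
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Hypothetical table: objects 0 and 1 share a profile but differ in total abundance
Y = np.array([[10, 0, 5],
              [20, 0, 10],
              [0, 8, 2]])

D = chi_square_distances(Y)
print(D)
```

Because the distance is computed on row profiles, two objects with the same relative composition are at distance zero regardless of their totals, which is worth keeping in mind when reading distances off an ordination plot.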

Figure 2: Illustrative example of CCA triplot interpretation using a) type I scaling and b) type II scaling. a) This example focuses on two objects ("o1", "o2"), three quantitative explanatory variables ("Nitrate", "Phosphate", "Silicate") represented by vectors (arrows) pointing in the direction of increase and extended for clarity (dashed lines), and three states of a nominal (qualitative) variable, sediment type ("Sand", "Silt", "Clay"). Orthogonal projections are shown as dotted red lines. Object "o1" is very likely to be found in clay sediments while object "o2" is more likely to be found in sand sediments. Perpendicular projections of object "o1" onto quantitative explanatory variables suggest it realises high values of nitrate concentration, mid-to-low values of phosphate concentration, and low values of silicate concentration. Object "o2" realises high values of phosphate concentration, mid-range values of silicate concentration and low values of nitrate concentration. b) This example is similar to that in (a); however, points representing response variables ("v1", "v2") are now the focus of interpretation. Variable "v1" is likely to reach its maximum (e.g. highest abundance) in silty sediments at high concentrations of nitrate (projection not shown), mid-to-low concentrations of silicate, and low concentrations of phosphate. Variable "v2" is likely to reach its maximum in sandy sediments, at mid-to-high phosphate concentrations, low nitrate concentrations and high silicate concentrations.

Scaling in CCA

 Type 1

 Type 2

Type 1 scaling emphasises the relationships among objects. Thus:

Type 2 scaling emphasises the relationships among response variables. Thus:

Assumptions

Warnings

Walkthroughs featuring canonical correspondence analysis

Implementations

MASAME CCA app


References