Principal Components Analysis
The main idea...
Principal components analysis (PCA) is a method to summarise, in a low-dimensional space, the variance in a multivariate scatter of points. In doing so, it provides an overview of linear relationships between your objects and variables. This can often act as a good starting point in multivariate data analysis by allowing you to note trends, groupings, key variables, and potential outliers. Further, if you have a data set with many variables and relatively few objects (i.e. a "large p, small n" table), PCA can help collapse these many variables into a few principal components (PCs), which can be used in further analyses.
Consider a data table with 40 objects (rows) and 5 variables (x1...x5; columns). A 5-dimensional scatter plot (i.e. a plot with 5 orthogonal axes) with each object's coordinates in the form (x1, x2, x3, x4, x5) is impossible to visualise and interpret. Roughly speaking, PCA attempts to express most of the variability in this 5-dimensional space by rotating it in such a way that a lower-dimensional representation will bring out most of the variability of the higher-dimensional space. A new set of axes (known as principal components) is created as a basis of the lower-dimensional representation. An illustration showing this for a simpler, 3-dimensional space is shown in Figure 1.
Figure 1: An intuitive sketch of PCA's aims. Panel a shows a 3-dimensional scatter plot in which the variability between the six points in box i is obscured. Panel b shows a rotation of the original axes to maximise the variability in a two-dimensional space. The first principal component would be constructed in the direction of maximum scatter (i.e. maximum variability; dashed line). Subsequent PCs would be constructed in the same manner but must be orthogonal to (i.e. have no correlation with) all other PCs. The original variables would be rescaled as needed and may be represented in a biplot (Figure 2).
Linear combinations of the original variables are used to build the principal components (PCs). The first PC is placed through the scatter of points so as to maximise the amount of variation along it (Figure 1b). The same criterion applies to each subsequent PC calculated; however, each PC must be orthogonal to every other PC, that is, the covariance between each PC is strictly zero. If a few PCs capture most (70-90%) of the variance in the original scatter, the PCA has been very successful in representing the variability in your data; however, ecological data sets are rarely summarised so well. Smaller amounts of total variance captured (30-40%) can also be informative. There are several methods to estimate the number of informative PCs generated by a PCA, such as the "broken stick model" and the Kaiser-Guttman criterion. See Jackson's (1993) discussion for insight into their effectiveness.
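As a minimal sketch of this construction, the following example builds the covariance matrix of a small, hypothetical two-variable data set by hand, extracts its eigenvalues, and reports the proportion of variance captured by the first PC. The data values are invented for illustration; a real analysis with more variables would use one of the implementations listed at the end of this page.

```python
import math

# Tiny, hypothetical 2-variable data set: each tuple is one object (row).
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1),
        (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mean_x = sum(p[0] for p in data) / n
mean_y = sum(p[1] for p in data) / n

# Sample covariance matrix entries (centred data, divisor n - 1).
sxx = sum((p[0] - mean_x) ** 2 for p in data) / (n - 1)
syy = sum((p[1] - mean_y) ** 2 for p in data) / (n - 1)
sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in data) / (n - 1)

# Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]];
# each eigenvalue is the variance along one PC.
tr, det = sxx + syy, sxx * syy - sxy ** 2
disc = math.sqrt(tr ** 2 / 4 - det)
lam1, lam2 = tr / 2 + disc, tr / 2 - disc   # lam1 >= lam2

# Proportion of total variance captured by the first PC.
prop1 = lam1 / (lam1 + lam2)
print(f"PC1 captures {prop1:.1%} of the variance")
```

Because the two variables covary strongly, almost all of the scatter lies along the first PC; the second PC adds little and could be discarded with minimal loss of information.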
PCA thus aims to reduce the number of variables in large data sets and thereby assist interpretation. This is most often an initial step which will advise further analyses. PCs themselves can be extracted from a PCA result and used as new variables in subsequent analyses such as multiple regression. If this is done, the analyst must carefully consider what these PCs represent.
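The two stopping rules named above (the broken stick model and the Kaiser-Guttman criterion) can be sketched as follows. The eigenvalues here are hypothetical, chosen only to show that the two rules need not agree (a point central to Jackson's 1993 comparison).

```python
# Hypothetical eigenvalues from a PCA of 6 standardised variables.
eigenvalues = [2.8, 1.4, 0.9, 0.5, 0.3, 0.1]
p = len(eigenvalues)
total = sum(eigenvalues)

# Kaiser-Guttman: keep PCs whose eigenvalue exceeds the mean eigenvalue
# (for a correlation matrix, the mean eigenvalue is 1).
mean_eig = total / p
kaiser_keep = [k + 1 for k, lam in enumerate(eigenvalues) if lam > mean_eig]

def broken_stick(k, p):
    # Expected proportion of variance for the k-th largest piece of a
    # unit-length stick broken at random into p pieces.
    return sum(1 / j for j in range(k, p + 1)) / p

# Broken stick: keep PCs whose observed proportion of variance exceeds
# the proportion expected under the broken stick model.
bs_keep = [k + 1 for k, lam in enumerate(eigenvalues)
           if lam / total > broken_stick(k + 1, p)]

print("Kaiser-Guttman keeps PCs:", kaiser_keep)
print("Broken stick keeps PCs:", bs_keep)
```

With these values, Kaiser-Guttman retains two PCs while the broken stick model retains only one, illustrating why the choice of stopping rule should be made deliberately.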
Pre-analysis
If your response variables are not dimensionally homogeneous (i.e. if they have different base units of measurement), you should standardise them using, for example, z-scoring. However, it is not advisable to standardise raw count data. If standardisation is performed, the PCA will be based on the correlation matrix of the data; if no standardisation is needed, the covariance matrix will be used.
Examine the distribution of each variable as well as plots of each variable against other variables. If the distributions are markedly non-normal or the relationships markedly non-linear, apply the appropriate transformations prior to analysis.
As far as possible, reduce the effect of outliers.
If you wish to represent non-Euclidean distances (e.g. Hellinger distances) between objects, you should apply an ecologically-motivated transformation discussed on this page before analysis.
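The z-scoring mentioned above can be sketched in a few lines. The two variables below are hypothetical and deliberately measured in different units; without standardisation, the variable with the larger variance would dominate a covariance-based PCA.

```python
import math

# Hypothetical variables in different units: temperature (deg C), depth (m).
temperature = [12.1, 14.3, 11.8, 15.2, 13.6]
depth = [105.0, 220.0, 88.0, 340.0, 150.0]

def z_score(values):
    # Centre on the mean and divide by the sample standard deviation.
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

temp_z, depth_z = z_score(temperature), z_score(depth)
# Each standardised variable now has mean 0 and standard deviation 1,
# so a covariance matrix of the z-scored data equals the correlation
# matrix of the raw data.
```

This is why standardisation and "PCA on a correlation matrix" amount to the same choice.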
Results and interpretation
Typical implementations of PCA report the following results:
The total variance of the data set, that is, the total variance of all variables across all objects.
The eigenvalues associated with each PC. These are often presented as raw values and as proportions of the total variance (which is the sum of all eigenvalues). The proportion of variance attributed to each PC indicates how much of the variation that PC is able to 'explain'. If the first few PCs account for a large proportion of the variance in the data, the PCA was successful. Note that some implementations report the percentage of total variance explained rather than proportions.
Objects and variables will have a score on each of the PCs calculated. The scores act as a new set of coordinates in the space described by the PC axes. Object scores describe the position of the object in the ordination. Variable scores may be understood as the "tip" of the vector representing the variable in a biplot (see below) and suggest the direction of increase of a given variable, relative to the origin of the ordination plot. Note that different implementations are likely to report different scores as they may use differing methods to create a good ordination plot.
The variable loadings may be understood as how much each variable 'contributed' to building a PC. The absolute value of the loadings should be considered as the signs are arbitrary.
Some implementations report the proportion of an object's total variance (across all variables) captured by a given PC. Objects with higher proportions of captured variance are better represented by a given PC.
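The last result listed above can be sketched directly. Because PCA is a rigid rotation of the centred data, an object's squared scores across all PCs sum to its squared distance from the centroid, so the proportion of the object's variance captured by one PC is that PC's squared score over the total. The scores below are hypothetical.

```python
# Hypothetical scores of one object on PC1..PC3 (centred data; a rigid
# rotation preserves the squared distance from the centroid).
scores = [2.1, -0.4, 0.3]
sq_total = sum(s ** 2 for s in scores)

# Proportion of this object's variance captured by each PC.
props = [s ** 2 / sq_total for s in scores]
print([round(p, 3) for p in props])
```

Here PC1 captures over 90% of this object's variance, so the object is well represented by the first axis of the ordination.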
Figure 2: A PCA biplot. Points represent objects (rows). Red vectors represent the original variables (columns) used to build the PCs. The interpretation of the ordination depends on the type of scaling used. See text for description.
Reading a PCA biplot
The results of a PCA are typically visualised using a biplot (Figure 2). The interpretation of this biplot depends on the scaling chosen. Properties of these scalings are presented below. In general, consider type I scaling if the distances between objects are of particular value and type II scaling if the correlative relationships between variables are of more interest. Further interpretation is also discussed below and more detail is available in Legendre and Legendre (1998) and ter Braak (1994).
Figure 3: Schematics highlighting a) the projection of ordinated objects onto a vector and b) the angles between vectors. The projection of an ordinated point onto a variable vector, as shown for point i in panel a, approximates the variable's value realised for that object. Hence, visual inspection suggests object i can be expected to have higher values of variable 1 relative to most other objects. Object ii, however, can be expected to have lower values of variable 1 relative to other objects. Note that the dashed line is not typically shown in a biplot and is shown here for clarity. When using type II scaling, cosines of angles between vectors (panel b) approximate the covariance between the variables they represent. In this case, ∠a is approaching 90°, which suggests that variables "1" and "2" show very little covariance (i.e. they are almost orthogonal, just as independent axes are). ∠b is less than 90°, suggesting positive covariance between variables "2" and "3", while ∠c is approaching 180°, suggesting strong negative covariance between variables "2" and "4" (i.e. the directions of increase of variables "2" and "4" oppose one another). Variable 5 is non-quantitative and is represented by a centroid. A right-angled projection onto variable 4 suggests the two are positively linked.
Type I Scaling - Distance biplot
Distances between object points approximate the Euclidean distances between objects. Thus, objects ordinated closer together can be expected to have similar variable values. This will not always hold true, however, as PCA only recovers part of the variation in the data set.
Right-angled projections of an object point onto a variable's vector approximate the value of that variable for the chosen object.
The length of a variable vector in the ordination plot reflects its contribution to the ordination. That is, variables with vectors which appear longer than others in a given ordination were more important in building the PCs used in that ordination. The contribution of a variable to a particular PC can be approximated by projecting the "tip" of the vector onto the PC of interest (Figure 3b, variable 4).
Angles between variable vectors are meaningless
Type II Scaling - Covariance/Correlation biplot
The angles between all vectors approximate their (linear) covariance/correlation: the covariance/correlation is approximated by the cosine of the angle between the vectors. For example, a vector pair describing an angle of 90° is uncorrelated, as cos(90°) = 0, while a pair describing an angle of 20° has strong, positive covariance/correlation, as cos(20°) ≈ 0.94.
Depending on whether a covariance or correlation matrix was used during PCA, the length of vectors representing variables has a different meaning:
Covariance PCA: a vector's length in a given ordination approximates its associated variable's standard deviation in that ordination.
Correlation PCA: All vectors are standardised to have variances of "1". As with type I scaling, the length of a vector reflects the contribution of its associated variable in 'building' the ordination space.
In both covariance and correlation PCA, variable attributes with respect to a particular PC can be approximated by projecting the "tip" of the vector onto the PC of interest (Figure 3b, variable 4).
Right-angled projections of an object point onto a variable's vector approximate the value of that variable for the chosen object.
Distances between object points may be non-Euclidean and should not be interpreted with great confidence.
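The angle-to-correlation reading described for type II scaling can be sketched numerically. The vector coordinates below are hypothetical values one might read off a correlation biplot as (PC1, PC2) pairs; the cosine of the angle between them approximates the correlation between the two variables.

```python
import math

# Hypothetical variable vectors read off a type II (correlation) biplot,
# given as (PC1, PC2) coordinates.
v1 = (0.8, 0.3)
v2 = (0.5, -0.6)

def cos_angle(a, b):
    # Cosine of the angle between two 2-D vectors: dot product over
    # the product of their lengths.
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    return dot / norm

r_approx = cos_angle(v1, v2)
print(f"approximate correlation: {r_approx:.2f}")
```

The approximation only reflects the variance captured by the two plotted axes; variables poorly represented in the plane (short vectors in a correlation PCA) should not be interpreted this way.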
Key assumptions
Variables are linearly or at least monotonically related to one another
The variables should show a multivariate normal distribution.
When using type II scaling, covariances/correlations between variables are assumed to be linear.
Distances between objects can be represented in a Euclidean space. If non-Euclidean (dis)similiarity measures are used, appropriate transformations should be applied.
Warnings
Different implementations may renumber or reclassify the scaling types. Read their documentation to ensure that you are using the scaling you intend to.
The horseshoe effect, indicated by a horseshoe-shaped ordination of objects, may result from variables having unimodal rather than linear relationships. Consider correspondence analysis if you observe a horseshoe-like shape described by the points in your PCA ordination.
Applying PCA to data with many zeros can lead to problematic ordinations. Consider using the Hellinger or chord transformations to linearise the relationships between variables with many zeros. If zeros are concentrated in a few variables, removing such variables should also be considered.
Strongly non-normal data may compromise the analysis: PCs are always uncorrelated, but they are only guaranteed to be statistically independent when the data are multivariate normal.
Objects may be better described by 'less powerful' PCs with smaller eigenvalues. While the first two PCs often summarise the data well for all objects, a particular object may be better represented by another PC. How well a PC captures the variance of a given object can be determined by calculating the proportion of the object's total variance 'explained' by the PC.
Keep in mind that the origin of a PCA biplot does not represent a "zero" value for the variables radiating from it. It is the centre of the standardised variation captured.
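The Hellinger transformation recommended above for zero-rich data can be sketched as follows: each value is divided by its row (object) total and the square root is taken. The abundance table below is hypothetical.

```python
import math

# Hypothetical site-by-species abundance table with many zeros.
abundances = [
    [10, 0, 0, 2],
    [0, 5, 1, 0],
    [3, 0, 0, 9],
]

# Hellinger transformation: divide each count by its row sum,
# then take the square root.
hellinger = [
    [math.sqrt(x / sum(row)) for x in row]
    for row in abundances
]
# Euclidean distances between the transformed rows equal the Hellinger
# distances between the original rows, so a PCA of the transformed table
# preserves this ecologically meaningful distance.
```

Each transformed row has unit sum of squares, which is what makes the subsequent Euclidean (PCA) geometry appropriate for such data.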
Implementations
R
The rda() function in the package vegan (a redundancy analysis [RDA] performed without a matrix of explanatory variables is equivalent to a PCA)
The prcomp() function in the package stats
The PCAsignificance() function in the package BiodiversityR may be used to calculate the number of 'significant' PCs based on the broken-stick criterion. The function accepts results from vegan's rda().
The ordiequilibriumcircle() function in the package BiodiversityR may be used to draw an equilibrium circle on an ordination resulting from an rda() analysis. This is only valid for PCA ordinations using Type I scaling.
MASAME PCA app
References
Legendre P, Legendre L (1998) Numerical Ecology. 2nd ed. Amsterdam: Elsevier. ISBN 978-0444892508.
Jackson DA (1993) Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches. Ecology. 74(8): 2204-2214.
Ramette A (2007) Multivariate analyses in microbial ecology. FEMS Microbiol Ecol. 62(2): 142–160.
ter Braak CJF (1994) Canonical community ordination. Part I: basic theory and linear methods. Ecoscience. 1: 127–140.