The main idea...
The sections below first describe some basic transformations and then discuss transformations specifically geared towards comparing variables. A set of ecologically motivated transformations, intended to allow Euclidean representation of ecological dissimilarities by methods such as PCA and redundancy analysis (RDA), is also summarised.
Before you begin transforming your data, ensure there is a defined and well-supported reason to do so. Common rationales include linearising, normalising, or standardising data in order to respect a method's assumptions.
Figure 1: Schematics illustrating a linear and a square root transformation. a) A linear transformation in which the variable "y" is transformed into "y'" through a translation "b" and an expansion "m". This can be expressed by the linear equation y' = my + b. This transformation may be used to place two or more linearly related variables on the same scale. In this illustration, both "b" and "m" are positive, leading to a translation to the right and an expansion, respectively. b) A square root transformation. Larger values of a variable "y" are affected more strongly than smaller values. This transformation is useful when positive data show a positive skew and a more Gaussian distribution is desired. Hollow circles indicate former positions of values along an axis.
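The two transformations in Figure 1 can be sketched in a few lines of Python. This is only an illustration: the function names and the example values of m and b are assumptions, not taken from the figure.

```python
import math

def linear_transform(values, m, b):
    """Apply y' = m*y + b to each value (expansion m, translation b)."""
    return [m * y + b for y in values]

def sqrt_transform(values):
    """Apply y' = sqrt(y); refuses negative input rather than guessing."""
    if any(y < 0 for y in values):
        raise ValueError("square root transformation needs non-negative values")
    return [math.sqrt(y) for y in values]

# Hypothetical data with one large value
y = [1.0, 4.0, 9.0, 100.0]

shifted = linear_transform(y, m=2.0, b=3.0)  # spacing between values is doubled
rooted = sqrt_transform(y)                   # the largest value is pulled in most
```

In the square-rooted list the gap between the two largest values shrinks from 91 to 7, while the gap between the two smallest shrinks only from 3 to 1, which is the behaviour described in panel b.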
Equation 1: The power transformation expressed as a piecewise function. Resorting to a log transformation when λ = 0 allows the power transformation to remain continuous for all non-negative real numbers.
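The equation itself is not reproduced in this text. A piecewise form consistent with the caption could be written as below, where y is the original variable, y' the transformed variable (as in Figure 1) and λ the exponent. The base of the logarithm is not stated in the caption and is left open here.

```latex
y' =
\begin{cases}
  y^{\lambda}, & \lambda \neq 0 \\
  \log y,      & \lambda = 0
\end{cases}
```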
Equation 2: The Box-Cox transformation. This transformation is used in the Box-Cox procedure to estimate a value of λ which best transforms the variable to meet some criterion such as normality or linearity (see Figure 2 for illustration). The natural logarithm of original values is taken when λ = 0.
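The equation itself is not reproduced in this text. The form usually given for the one-parameter Box-Cox transformation (Box and Cox, 1964) is shown below, using y for the original variable and y' for the transformed one; these symbol choices are an assumption made to match Figure 1.

```latex
y' =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
  \ln y,                            & \lambda = 0
\end{cases}
```

Because (y^λ - 1)/λ approaches ln y as λ approaches 0, the two branches join smoothly, which is why the natural logarithm is used at λ = 0.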
Figure 2: Box-Cox plots for determining optimal λ values for a) normalising and b) linearising transformations. a) λ is chosen such that it maximises the correlation of a Box-Cox-transformed variable, X, with a comparable normal distribution, N(μ,σ). In this illustration, a square root transformation (λ = 0.5) appears to be a good choice. b) λ is chosen such that it maximises the correlation between the variable being transformed, X, and another variable, Y. In the illustrated case, squaring the variable (λ = 2) appears to be a good linearising transformation. If the variables X and Y were negatively correlated, the λ corresponding to the minimum (i.e. most negative) correlation would be chosen.
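The search described in panel b can be sketched as a simple grid search. The code below is a minimal illustration in plain Python, assuming the standard Box-Cox form, strictly positive X, and a positive association between X and Y; the function names, the λ grid and the toy data are all hypothetical.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def best_linearising_lambda(x, y, lambdas):
    """Return (lambda, r) for the lambda giving the highest correlation
    between the Box-Cox-transformed x and the untransformed y."""
    r, lam = max((pearson(box_cox(x, lam), y), lam) for lam in lambdas)
    return lam, r

# Toy data in which y is exactly the square of x
x = [float(i) for i in range(1, 11)]
y = [v ** 2 for v in x]

grid = [i / 10 for i in range(-20, 31)]  # lambda from -2.0 to 3.0 in steps of 0.1
best_lam, best_r = best_linearising_lambda(x, y, grid)
```

With these toy data the search settles on λ = 2, matching the case illustrated in panel b. For negatively correlated variables, the `max` would be replaced by `min`, as the caption notes.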
Equation 3: Z-scoring a variable "y".
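The equation itself is not reproduced in this text. The usual definition of a z-score is given below, where the symbols for the mean and standard deviation of y are assumed notation.

```latex
z = \frac{y - \bar{y}}{s_{y}}
```

Here \bar{y} is the mean of y and s_y its standard deviation, so the z-scored variable has a mean of 0 and a standard deviation of 1.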
Warnings
Choose transformations according to need, rather than as a matter of course. Applying transformations that are too "harsh" (i.e. stronger than needed to prepare data for a particular analysis) may distort results and harm interpretation.
If a numerical interpretation of the results is desired, back-transform values after conducting an analysis so that the results are expressed in the original units.
Ecological data that have been transformed using an ecologically motivated function can often be interpreted in a straightforward manner; however, transformations that simply aim to correct for some property of the data should be considered carefully during interpretation (Legendre and Legendre, 1998).
Some transformations, such as power transformations, require values to be positive. Adding a constant to achieve this is acceptable.
Treat negative values with caution. Ensure that your transformation adequately represents differences between negative values. If this is not possible, translating values into positive numbers by adding a constant may be advisable.
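The warnings about adding a constant and about back-transforming can be illustrated together. The sketch below assumes a natural log transformation and a hypothetical helper, `shift_constant`, that raises the minimum to 1; neither the helper nor the choice of 1 comes from the text above, and a different constant would give different transformed values.

```python
import math

def shift_constant(values, floor=1.0):
    """Constant that raises the minimum to `floor`; 0 if no shift is needed."""
    if floor <= 0:
        raise ValueError("floor must be positive for a log transformation")
    return max(0.0, floor - min(values))

# Hypothetical measurements that include negative values
values = [-3.0, 0.0, 5.0]

c = shift_constant(values)                # constant added before transforming
shifted = [v + c for v in values]         # all values are now positive
logged = [math.log(v) for v in shifted]   # analysis is done on this scale

# Summarise on the transformed scale, then undo both steps in reverse order
mean_log = sum(logged) / len(logged)
back_transformed = math.exp(mean_log) - c

arithmetic_mean = sum(values) / len(values)
```

The back-transformed mean is not the same as the arithmetic mean of the original values, because the mean of logged values corresponds to a geometric rather than an arithmetic mean. This is one reason to interpret back-transformed results with care.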
Implementations
R
scale() in the base package allows translational and expansion-based scaling.
decostand() in the vegan package provides several standardisation and transformation methods.
boxcox() in the MASS package generates a plot of the log-likelihood against values of λ (derived from a linear model).
References
Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc B 26(2):211-252.
Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129(2):271-280.
Legendre P, Legendre L (1998) Numerical Ecology. 2nd ed. Elsevier, Amsterdam. ISBN 978-0444892508.