Applied Multivariate Statistics for Ecologists (FSH 560)

Instructor: Julian D. Olden

Office Location: Fisheries Science Bldg., Room 315A

Office hours: Thursday 11:30 am - 12:30 pm

Contact information:

Class hours: Tuesday and Thursday 9:30 - 11:20 am

Class location: Fisheries Science Building Room 136

Prerequisite(s): QSCI 482 or equivalent or permission from instructor

Next offered in Fall 2014

Multivariate statistics comprises the collection of procedures for the simultaneous observation and analysis of two or more dependent variables.


"This class introduced multivariate methods and forced me to think about how each method could be applied to my research. It stretched my intellect and I now consider multivariate statistics as a tool that I'm comfortable to use."

"Everything we learned in class was immediately applied to the class data set or our own data to give hands-on experience."

"The use of our data was key in making the class an individualized success. It challenged my understanding of techniques and also assumptions in my own data and will contribute to my grad school progress like no other class has."

"I would highly recommend this course to any and every ecologist, and I would lobby anyone to take it from Julian"


With recent advances in data collection technology and increasingly ambitious field research, ecologists are calling upon multivariate statistics more and more often to explore and test for patterns in their data. The goal of this course is to introduce graduate students in the ecological sciences to the multivariate statistical techniques necessary to carry out sophisticated analyses and to critically evaluate scientific papers that use these approaches. This is a practical, hands-on course that emphasizes the analysis and interpretation of multivariate data and covers the majority of approaches in common use by ecologists. The emphasis is on the conceptual understanding and practical use of the methods (not the matrix algebra), with the singular hope of demystifying the "alphabet soup" of multivariate analysis.

We will cover the three main categories of multivariate analysis that are common in ecology: (i) clustering, (ii) ordination, and (iii) statistical tests of hypotheses. The intent of this course is to provide you with the following: (1) an introduction to the use of multivariate statistics in ecological research; (2) a conceptual organization of the various multivariate techniques, with respect to the types of research questions and data sets appropriate for each technique; and (3) a working understanding of how to use each technique and interpret its results, including a conceptual overview, a list of assumptions, diagnostics for assessing those assumptions, the mechanics of performing the analysis in R, and guidance on interpreting the statistical output.


Lectures/labs: Lectures will integrate the theoretical aspects of multivariate statistics with worked solutions and interpretations from R. For each topic there will be a formal lecture followed by a computer-based lab in which R will be used to analyze ecological data with the particular multivariate technique. This course will also point to other available software packages, including PC-ORD and PRIMER.

Pop-quiz: A portion of your grade is based on a pop-quiz that will be administered at some point during the quarter. This quiz is used to test your understanding of the material and to promote self-evaluation of your progress in the course.

Final report and peer review: A significant portion of your grade is based on a final written paper and peer review of other class members’ papers. The final paper will consist of a statistical analysis of a multivariate data set (approved by your instructor). The nature of the question, the source of the data, and the kinds of analysis employed are flexible. The primary requirement is that the data and analysis must address one or more specific biological hypotheses, which are to be tested using one or more appropriate methods of multivariate analysis. The primary goal is a coherent scientific paper, not excessive number crunching.


Personal dataset: A primary goal of this course is to provide you with the opportunity to get better acquainted with your own data. The dataset may be your own, one obtained from the literature, or one provided by the instructor. Ideally you should use data that you have collected or are otherwise somewhat familiar with. The dataset should be one or more matrices of entities × attributes (e.g., samples × species, species × characteristics of species, sites × environmental factors, etc.). The only requirement is that the data be adequate to test the hypotheses addressed in your final report. If you do not have access to a multivariate dataset, then I would be pleased to provide one.

Class dataset: Even if you do have a multivariate dataset, it is unlikely to be suitable for all the techniques covered in class.  To address this issue I will provide a common dataset to all students at the beginning of the quarter.  This dataset is in addition to your own personal dataset that forms the basis for your final report.  Using the class data you will be able to conduct all the statistical approaches covered in the class. Moreover, this dataset will serve as the basis for the short assignments conducted in R.  You will be expected to work with both your own dataset and the class dataset during the labs.


There is no required text for this course; however, I highly recommend:

McGarigal, K., S. Cushman, and S. Stafford. 2000. Multivariate Statistics for Wildlife and Ecology Research. Springer.

Other statistical texts that are likely to be helpful (in order of value based on my personal experience) include:

Legendre, P., and L. Legendre. 1998. Numerical Ecology. 2nd edn. Elsevier Scientific.

Gauch, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press.

Manly, B.F.J. 2004. Multivariate Statistical Methods: A Primer. Chapman and Hall.

Digby, P.G.N., and R.A. Kempton. 1987. Multivariate Analysis of Ecological Communities. Chapman and Hall.

Jongman, R.H.G., C.J.F. ter Braak, and O.F.R. van Tongeren. 1995. Data Analysis in Community and Landscape Ecology. Cambridge University Press.

Pielou, E.C. 1984. The Interpretation of Ecological Data: A Primer on Classification and Ordination. Wiley-Interscience.


Course overview – The beast we call “multivariate statistics”

Data screening

Multivariate resemblance

  • Modes of analysis, analytical spaces
  • Similarity coefficients (binary, categorical, quantitative)
  • Distance coefficients
  • Coefficients of dependence
  • Choice of coefficients
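
As a rough illustration of the coefficient types listed above, the sketch below computes one binary similarity coefficient (Jaccard) and two quantitative distance coefficients (Euclidean and Bray-Curtis) for a small sites × species matrix. The course labs use R; Python is used here only to keep the example self-contained. The abundance values, site names, and function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical sites x species abundance matrix (rows = sites, columns = species).
abundance = {
    "site_A": [10, 0, 3, 5],
    "site_B": [8, 2, 0, 5],
    "site_C": [0, 7, 9, 0],
}


def jaccard_similarity(x, y):
    """Binary coefficient: species shared / species present in either site."""
    present_x = [v > 0 for v in x]
    present_y = [v > 0 for v in y]
    shared = sum(a and b for a, b in zip(present_x, present_y))
    either = sum(a or b for a, b in zip(present_x, present_y))
    return shared / either if either else 0.0


def euclidean_distance(x, y):
    """Quantitative coefficient: straight-line distance in species space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis_dissimilarity(x, y):
    """Quantitative coefficient: summed absolute differences / total abundance."""
    total = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / total if total else 0.0


# Compare every pair of sites with each coefficient.
sites = sorted(abundance)
for i, first in enumerate(sites):
    for second in sites[i + 1:]:
        x, y = abundance[first], abundance[second]
        print(
            first, second,
            round(jaccard_similarity(x, y), 3),
            round(euclidean_distance(x, y), 3),
            round(bray_curtis_dissimilarity(x, y), 3),
        )
```

Note that Jaccard ignores abundances entirely, while Euclidean distance is sensitive to the absolute magnitude of the counts. Differences of this kind are what make the choice of coefficient matter.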

Cluster analysis

  • Hierarchical agglomerative clustering (e.g., linkage, UPGMA)
  • Hierarchical divisive and non-hierarchical clustering (e.g., TWINSPAN, K-means)
  • Cluster diagnostics, limitations, and recommendations
  • Presenting results from cluster analyses: The dos and don’ts!

Indirect Ordination

  • Principal component analysis (PCA)
  • Computing eigenvalues, principal components
  • Covariance vs. correlation, meaningful components, misuses
  • Principal coordinate analysis (PCoA)
  • Non-metric multidimensional scaling (NMDS)
  • Correspondence analysis (CA)
  • Detrended correspondence analysis (DCA)

Direct Ordination

  • Redundancy analysis (RDA)
  • Canonical correspondence analysis (CCA)
  • Canonical correlation analysis (CCorA)
  • Partial RDA and CCA
  • Hierarchical RDA and CCA

Classification of Groups

  • Discriminant Function Analysis
  • Classification and Regression Trees

Testing for Similarities and Differences among Groups

  • Analysis of similarities (ANOSIM)
  • Multi-response Permutation Procedure (MRPP)

Testing for Differences among Groups

  • Permutational MANOVA (perMANOVA)
  • Permutation test of multivariate dispersion

Testing for Associations among Matrices

  • Mantel Test
  • Procrustes Analysis
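
To give a feel for how a test of association between two matrices works, here is a minimal sketch of a simple Mantel test. It correlates the corresponding entries of two distance matrices, then repeatedly permutes the rows and columns of one matrix together to build a null distribution for that correlation. The course labs use R; this stand-alone Python version is only an illustration. The site coordinates, community scores, function names, permutation count, and seed are all hypothetical assumptions.

```python
import math
import random


def distance_matrix(values):
    """Pairwise absolute differences between one-dimensional site values."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix, order):
    """Entries above the diagonal after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-tailed permutation p-value) for two distance matrices."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of d2 together
        if pearson(x, upper_triangle(d2, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Hypothetical data: positions of five sites along a river and a community score.
geo = distance_matrix([0.0, 1.0, 2.0, 5.0, 9.0])
eco = distance_matrix([0.1, 0.4, 0.3, 0.8, 0.9])

r, p = mantel_test(geo, eco)
print("Mantel r:", round(r, 3), "p-value:", round(p, 3))
```

Whole sites are permuted rather than individual distances because the entries of a distance matrix are not independent of one another. This is why an ordinary significance test for a correlation coefficient is not appropriate here.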