## PHYTOLITHS: DATA ANALYSIS

## The sample table

The first step in organising the data collected during a phytolith analysis is the sample table. This table should summarise all the important information on the samples before and after processing, giving the reader a first, general overview of the samples and of the phytoliths extracted from them. Items worth listing include the provenance of each sample, its context, the amount of sediment processed, the amount of acid insoluble fraction (AIF), the number of phytoliths counted and the estimated number of phytoliths per gram of AIF (or per any other standard amount of sediment). All the available background information about the processed samples should be presented, because it helps the interpretation of the results. An example of a sample table, from the Palaeolithic site of Amud, can be found in Madella et al. (2002).
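The two derived quantities in the sample table (AIF and phytoliths per gram of AIF) can be computed as below. This is a minimal sketch assuming one common slide-based estimate: a known weight of AIF is mounted on a slide, phytoliths are counted in some of the fields of view, and the count is scaled up to the whole slide and then to one gram. The field-based scaling, the function names and all numbers are hypothetical illustrations, not values from the Amud study.

```python
def aif_percent(sediment_g: float, aif_g: float) -> float:
    """Acid insoluble fraction (AIF) as a percentage of the sediment processed."""
    return 100.0 * aif_g / sediment_g


def phytoliths_per_gram_aif(count: int, fields_counted: int,
                            fields_on_slide: int, aif_on_slide_g: float) -> float:
    """Assumed slide-based estimate of phytoliths per gram of AIF.

    The count from the fields of view scanned is scaled up to the whole
    slide, then divided by the weight of AIF mounted on that slide.
    """
    phytoliths_on_slide = count * fields_on_slide / fields_counted
    return phytoliths_on_slide / aif_on_slide_g


# Hypothetical sample: 5 g of sediment yield 3.2 g of AIF; 1 mg of AIF is
# mounted and 300 phytoliths are counted in 50 of the slide's 1000 fields.
print(f"AIF: {aif_percent(5.0, 3.2):.1f} %")
print(f"Phytoliths per g AIF: {phytoliths_per_gram_aif(300, 50, 1000, 0.001):,.0f}")
```

Whatever procedure is actually used in the laboratory, reporting it alongside the table lets readers check how the per-gram figures were obtained.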

## The phytolith count table

The phytolith types and/or the taxonomic or anatomical groups identified and counted under the microscope need to be summarised in a table that is easy to read for both the specialist and the layperson. The phytolith count table must always be presented, and it should list all the phytolith types encountered in the analysed samples, expressed both as absolute counts and as relative amounts (%). Calculating the relative proportions (%) of the various types is essential for comparing samples. Avoid comparing absolute numbers of phytoliths: if the total number of phytoliths counted differs between samples, the absolute quantities are not comparable and can produce artificial patterns. For clarity, it is sometimes useful to group the different phytolith types according to their taxonomic/vegetation significance (e.g. C3 grass short cells, grass cells, dicotyledon phytoliths) or according to their anatomical significance.
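The reason for working with relative proportions can be shown with two samples of different size. This is a minimal sketch; the function name and the counts are hypothetical.

```python
def relative_frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Convert absolute counts of phytolith types into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no phytoliths counted in this sample")
    return {morphotype: 100.0 * n / total for morphotype, n in counts.items()}


# Hypothetical counts: 200 phytoliths counted in sample A, 400 in sample B.
sample_a = {"bulliform": 20, "rondel": 60, "dendritic": 120}
sample_b = {"bulliform": 40, "rondel": 240, "dendritic": 120}

for name, counts in (("A", sample_a), ("B", sample_b)):
    percentages = relative_frequencies(counts)
    print(name, {t: round(p, 1) for t, p in percentages.items()})
```

Both samples contain 120 dendritic phytoliths, yet these make up 60 % of assemblage A and only 30 % of assemblage B; the absolute numbers alone would suggest no difference at all.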

### Examples of taxonomical/vegetation significance

- Arecaceae (plants of the palm family)
- Poaceae (herbaceous plants of the grass family)
  - *Oryza sativa* (identification at species level)
  - *Triticum* sp. (identification at genus level, no further information about the species)
  - cf. *Triticum* sp. (possible but not conclusive identification at genus level)
- Grass short cells (C3) (herbaceous plants of the Poaceae family, C3 photosynthetic pathway)
- Grass short cells (C4) (herbaceous plants of the Poaceae family, C4 photosynthetic pathway)
- Dicotyledons (dicotyledonous plants, can include both woody and herbaceous species)
- Woody dicotyledons (only ligneous dicotyledonous plants, both shrubs and trees)

### Examples of anatomical significance

- Grass inflorescence (long cells from the glume, palea and lemma of wild or otherwise undetermined grasses)
- Cereal inflorescence or chaff (long cells from the glume, palea and lemma of cereal grasses; e.g. dendritic long cells)
- Grass culm/leaves (long cells and other cells from the stem and the leaf of grasses; e.g. elongate psilate long cells, bulliforms, etc.)
- Wood (phytoliths originating from the ligneous parts of a plant)
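Grouping morphotypes into categories such as those listed above amounts to summing their counts through a lookup table. The sketch below assumes a hypothetical morphotype-to-group mapping; the assignments are illustrative only and not a reference classification.

```python
from collections import defaultdict

# Hypothetical lookup table assigning morphotypes to anatomical groups;
# the assignments are illustrative, not a reference scheme.
ANATOMICAL_GROUPS = {
    "dendritic": "Cereal inflorescence",
    "elongate psilate": "Grass culm/leaves",
    "bulliform": "Grass culm/leaves",
}


def group_counts(counts: dict[str, int], groups: dict[str, str],
                 default: str = "Unassigned") -> dict[str, int]:
    """Sum the counts of all morphotypes that belong to the same group."""
    totals: dict[str, int] = defaultdict(int)
    for morphotype, n in counts.items():
        totals[groups.get(morphotype, default)] += n
    return dict(totals)


sample = {"dendritic": 10, "bulliform": 5, "elongate psilate": 7, "rondel": 3}
print(group_counts(sample, ANATOMICAL_GROUPS))
# {'Cereal inflorescence': 10, 'Grass culm/leaves': 12, 'Unassigned': 3}
```

Morphotypes missing from the lookup end up in an explicit "Unassigned" group rather than being silently dropped, so the grouped table still adds up to the total count.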

## The phytolith diagrams

After the phytoliths have been identified and counted, the data are presented in a phytolith diagram, which can summarise the assemblages in different ways. A simple, very general diagram is useful when we want to draw attention to specific components of the assemblages: for example, a presence/absence diagram of data from three Indus Civilization sites, where the point of interest is the presence or absence of certain taxa at each site and in each cultural horizon. Such a general diagram must always be supported by a phytolith table with all the raw data.
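A presence/absence diagram is a reduction of the count table, which is why the raw counts must accompany it. The following sketch derives the presence/absence matrix from per-site counts and prints it as a simple +/- table; the function name, sites and counts are hypothetical.

```python
def presence_absence(counts_by_site: dict[str, dict[str, int]]) -> dict[str, dict[str, bool]]:
    """Reduce per-site counts to presence (True) or absence (False) of each taxon."""
    taxa = sorted({taxon for counts in counts_by_site.values() for taxon in counts})
    return {site: {taxon: counts.get(taxon, 0) > 0 for taxon in taxa}
            for site, counts in counts_by_site.items()}


# Hypothetical counts for two sites; a taxon missing from a site's record and
# a taxon recorded with a count of 0 are both treated as absent.
counts = {
    "Site A": {"Oryza sativa": 12, "Triticum sp.": 0, "Arecaceae": 3},
    "Site B": {"Triticum sp.": 25, "Arecaceae": 1},
}
table = presence_absence(counts)
taxa = list(next(iter(table.values())))
print("Taxon".ljust(14), *(site.ljust(8) for site in table))
for taxon in taxa:
    print(taxon.ljust(14), *(("+" if table[site][taxon] else "-").ljust(8) for site in table))
```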

Another useful way to summarise phytolith data is with bar charts. These diagrams allow easy comparison of groups of phytoliths (e.g. taxa, plant parts) across the contexts or phases (ages) of a site. Avoid using too many groups, however, because bars divided into many (often very small) segments are difficult to distinguish and read.
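One way to keep the number of bar segments manageable is to pool minor groups into a single "Other" segment before plotting. The sketch below assumes an arbitrary 5 % threshold and keeps a group if it reaches that value in at least one sample, so that every bar shows the same set of segments; the function name, threshold and data are hypothetical.

```python
def merge_minor_groups(samples: dict[str, dict[str, float]], threshold: float = 5.0,
                       other: str = "Other") -> dict[str, dict[str, float]]:
    """Pool groups that never reach `threshold` % in any sample into one segment."""
    keep = sorted({g for pct in samples.values() for g, p in pct.items() if p >= threshold})
    merged = {}
    for name, pct in samples.items():
        row = {g: 0.0 for g in keep}
        row[other] = 0.0
        for group, p in pct.items():
            row[group if group in keep else other] += p
        merged[name] = row
    return merged


# Hypothetical percentages per phase of occupation.
phases = {
    "Phase I": {"Grass culm/leaves": 60.0, "Cereal inflorescence": 38.0, "Wood": 2.0},
    "Phase II": {"Grass culm/leaves": 50.0, "Cereal inflorescence": 47.0,
                 "Wood": 1.0, "Arecaceae": 2.0},
}
for phase, row in merge_minor_groups(phases).items():
    print(phase, row)
```

The pooled groups should still be reported individually in the count table.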

When the focus is on the diachronic variability of phytolith assemblages, a stratigraphic bar chart/curve diagram can be used. In this case the horizontal axis represents the phytolith types and their frequencies, and the vertical axis represents time (age). The fluctuations observed in the assemblages over time are assumed to be related to changes in plant use (or in vegetation composition).
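The data behind a stratigraphic diagram are one percentage curve per phytolith type, ordered along the sequence. The sketch below assumes that sample depth is used as a proxy for age and that every sample has a non-zero count; the function name and the profile are hypothetical.

```python
def stratigraphic_series(samples: list[tuple[float, dict[str, int]]]):
    """Order samples from top to bottom and return one percentage curve per type.

    `samples` holds (depth_cm, counts) pairs; depth stands in for age here.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    types = sorted({t for _, counts in ordered for t in counts})
    depths = [depth for depth, _ in ordered]
    curves: dict[str, list[float]] = {t: [] for t in types}
    for _, counts in ordered:
        total = sum(counts.values())
        for t in types:
            curves[t].append(100.0 * counts.get(t, 0) / total)
    return depths, curves


# Hypothetical profile, deliberately listed out of order.
profile = [
    (40.0, {"dendritic": 10, "bulliform": 30}),
    (10.0, {"dendritic": 50, "bulliform": 50}),
    (25.0, {"dendritic": 30, "bulliform": 20}),
]
depths, curves = stratigraphic_series(profile)
for i, d in enumerate(depths):
    print(f"{d:>5.0f} cm  " + "  ".join(f"{t}: {curves[t][i]:5.1f} %" for t in curves))
```

To draw the diagram, each curve can be plotted with the percentages on the horizontal axis and the depths on the vertical axis, inverted so that the top of the sequence is at the top (in matplotlib, for instance, with `Axes.invert_yaxis()`).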

## Statistical techniques in phytolith analysis

### Cluster analysis

Cluster analysis (clustering) is the assignment of objects into groups, called clusters, so that objects in the same cluster are more similar to each other than to objects in different clusters. Similarity is often assessed with a distance measure (e.g. Euclidean, Manhattan). This analysis is not often used in phytolith studies because it presupposes similarities between assemblages and forces them into groups. It can, however, be useful when possible groups (e.g. taxonomic) are already known and the objective is to test whether the phytolith assemblages of one or more taxa do have taxonomic significance. An example of this approach is the study of phytolith assemblages from a set of plant species of the European Alps, which assessed whether broad taxonomic groups can be detected from the phytoliths produced in plant tissues (Carnelli et al. 2004).

Figure (caption partially lost): result of the cluster analysis; taxa including *Alnus viridis* form another group. Modified from Carnelli et al. 2004.
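A hierarchical cluster analysis of the kind described above can be sketched in a few lines. The example below uses agglomerative clustering with average linkage (UPGMA) on Euclidean distances between percentage profiles; this is one common choice among several and not necessarily the method of Carnelli et al. (2004). The species names and profiles are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(profiles: dict[str, list[float]]) -> list[tuple[str, str, float]]:
    """Agglomerative clustering with average linkage (UPGMA) on Euclidean distances.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = {name: [name] for name in profiles}  # cluster label -> members

    def linkage(a: str, b: str) -> float:
        # Mean distance over all pairs of members drawn from the two clusters.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((a, b, linkage(a, b)))
        clusters[f"({a}, {b})"] = clusters.pop(a) + clusters.pop(b)
    return history


# Hypothetical % profiles of three morphotypes in four plant species.
profiles = {
    "grass 1": [70.0, 20.0, 10.0],
    "grass 2": [65.0, 25.0, 10.0],
    "shrub 1": [5.0, 15.0, 80.0],
    "shrub 2": [8.0, 12.0, 80.0],
}
for a, b, d in average_linkage(profiles):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The merge history is what a dendrogram draws. This naive loop is only meant to show the logic; for real data sets a library routine such as `scipy.cluster.hierarchy.linkage(..., method="average")` is faster and also handles plotting through `dendrogram`.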

### Correspondence analysis (CA) and Principal component analysis (PCA)

The interpretation and quantification of large data sets can be challenging for the analyst. In a large table of numbers it is often difficult to spot “patterns” and to evaluate their significance. Statistical analyses such as CA or PCA help to find these patterns (provided that they exist) and to locate groups and/or consistent trends.

#### Correspondence analysis (CA)

CA is a descriptive technique that investigates the structure within a collection of objects (e.g. samples) to find whether they naturally fall into groups, in which samples are “similar” to each other and distinct from those of the other groups. It can be used to investigate patterns we suspect a priori to exist in our data (see Cluster analysis above), but it is much better to work in an exploratory framework where the patterns reveal themselves. Correspondence analysis produces graphical displays of frequency data expressed in the form of contingency tables. It is a multivariate scaling (ordination) technique that seeks to summarise the information carried by n + p points (where n is the number of objects and p the number of variables) in a low-dimensional space. It does so by looking for the lines and planes of closest fit to these two clouds of points (one representing the samples and one the morphotypes).

Correspondence analysis is similar to Principal Component Analysis and to Factor Analysis, but it computes and analyses coordinates for both rows and columns, because in a contingency table both are treated as variables. Each cell value is a count of individuals cross-classified by the row variable and the column variable.

When seeking the line of closest fit, the quantity maximised is the moment of inertia of the points projected onto that line. The inertia measures the amount of information (variation) in the data set: the higher the inertia, the greater the information. The line along which the moment of inertia is maximised is called the first principal axis of inertia (Greenacre 1984, Powers-Jones and Padmore 1993). The total inertia of a cloud of points is the sum of the moments of inertia along all the principal axes and can be used as a measure of the total variation within the data matrix. The contribution of any single point to the inertia of an axis is given by its individual moment of inertia.
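The inertia described above can be computed directly from a contingency table. The sketch follows the standard CA formulation (as in Greenacre 1984): with n the grand total, p_ij = n_ij / n, row masses r_i and column masses c_j, the standardized residuals are S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j). The total inertia is the sum of their squares (equal to the chi-square statistic divided by n), and the principal inertias are the eigenvalues of SᵀS. To stay dependency-free it uses plain power iteration to find only the first principal inertia; the function names and counts are hypothetical, and a full CA would normally use a singular value decomposition from a numerical library.

```python
from math import sqrt


def standardized_residuals(table: list[list[float]]) -> list[list[float]]:
    """S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j) for a samples x morphotypes table.

    Rows and columns with a zero total must be removed beforehand.
    """
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    return [[(table[i][j] / n - r[i] * c[j]) / sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]


def total_inertia(table: list[list[float]]) -> float:
    """Sum of squared standardized residuals, equal to chi-square / n."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


def first_principal_inertia(table: list[list[float]], iterations: int = 1000) -> float:
    """Largest eigenvalue of S^T S, found by power iteration."""
    s = standardized_residuals(table)
    m = len(s[0])
    a = [[sum(row[p] * row[q] for row in s) for q in range(m)] for p in range(m)]
    v = [float(k + 1) for k in range(m)]  # arbitrary, non-uniform start vector
    for _ in range(iterations):
        w = [sum(a[p][q] * v[q] for q in range(m)) for p in range(m)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # no association between rows and columns
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector
    return sum(v[p] * sum(a[p][q] * v[q] for q in range(m)) for p in range(m))


counts = [[30, 10, 5], [10, 30, 5], [5, 5, 40]]  # hypothetical samples x morphotypes
total = total_inertia(counts)
first = first_principal_inertia(counts)
print(f"total inertia {total:.3f}; first axis {first:.3f} ({100 * first / total:.1f} %)")
```

The ratio printed at the end is the share of the total inertia carried by the first principal axis, which is the figure usually reported next to each axis of a CA plot.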

#### Principal component analysis (PCA)

Principal component analysis is mostly used as a tool in exploratory data analysis and for building predictive models. PCA involves the eigenvalue decomposition of the data covariance matrix, or the singular value decomposition of the data matrix, usually after mean-centring each attribute (variable). The results of a PCA are usually discussed in terms of component scores and loadings (Shaw 2003).
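The steps named above (mean-centring, covariance matrix, eigen decomposition, scores and loadings) can be followed in a small dependency-free sketch that extracts only the first principal component by power iteration. The function name and data are hypothetical; the "loadings" returned are the unit eigenvector coefficients (some authors scale them by the square root of the eigenvalue), and their sign is arbitrary.

```python
from math import sqrt


def pca_first_component(data: list[list[float]], iterations: int = 1000):
    """First principal component of a samples x variables matrix.

    Returns (loadings, eigenvalue, proportion of variance, scores).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]  # mean-centring
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [float(j + 1) for j in range(p)]  # arbitrary, non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("power iteration collapsed (e.g. constant variables)")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    proportion = eigenvalue / sum(cov[j][j] for j in range(p))  # share of total variance
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, eigenvalue, proportion, scores


# Hypothetical % of three morphotypes in four samples.
data = [[60.0, 30.0, 10.0], [55.0, 35.0, 10.0], [20.0, 30.0, 50.0], [15.0, 35.0, 50.0]]
loadings, eigenvalue, proportion, scores = pca_first_component(data)
print("loadings", [round(x, 3) for x in loadings])
print(f"PC1 explains {100 * proportion:.1f} % of the variance")
print("scores", [round(s, 2) for s in scores])
```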

## References

- Carnelli, A. L., Theurillat, J.-P. and Madella, M. 2004. Phytolith types and type-frequencies in subalpine–alpine plant species of the European Alps. Review of Palaeobotany and Palynology 129:39-65.
- Greenacre, M.J. 1984. Theory and Applications of Correspondence Analysis. Academic Press, London.
- Madella, M., Jones, M.K., Goldberg, P., Goren, Y. and Hovers, E. 2002. The Exploitation of Plant Resources by Neanderthals in Amud Cave (Israel): The Evidence from Phytolith Studies. Journal of Archaeological Science 29:703–719.
- Powers-Jones, A.H. and Padmore, J. 1993. The use of quantitative methods and statistical analyses in the study of opal phytoliths, In D.M. Pearsall and D.R. Piperno (eds.) Current research in phytolith analysis: applications in Archaeology and Palaeoecology, MASCA Research Papers in Science and Archaeology, 10, University of Pennsylvania, Philadelphia, 47-56.
- Shaw, P. J. A. 2003. Multivariate statistics for the Environmental Sciences. Hodder-Arnold.
- Strömberg, C. A. E. 2002. The origin and spread of grass-dominated ecosystems in the late Tertiary of North America: preliminary results concerning the evolution of hypsodonty. Palaeogeography, Palaeoclimatology, Palaeoecology 177:59-75.