Data analysis of charred seeds

Most archaeobotanical data analysis falls into the category of descriptive statistics, sometimes called data exploration; statistical testing plays a relatively minor role. This descriptive analysis can range from the construction of simple summary tables or charts to the application of multivariate statistics. A ‘golden rule’ of data analysis is not to combine (in the sense of adding together) different samples or ‘analytical units’ without first establishing that they are sufficiently similar in composition to warrant combination. Summary tables that add together all samples from, for example, the same time period may therefore result in significant loss of information. A single large sample of grain (which could number many thousands of items) could completely swamp the composition of numerous smaller samples (of, say, 100 items each) from the same archaeological phase, although each of these smaller samples could represent a past action of equal significance to that represented by the large grain deposit.
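The swamping effect can be illustrated with a small sketch. All counts and taxon categories below are invented for illustration and do not come from any real assemblage; the point is only to show how the pooled proportions end up mirroring the one large sample.

```python
from collections import Counter

def proportions(counts):
    """Convert raw counts per category into proportions of the total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical data: one large grain deposit and five small weed-rich samples.
large_sample = {"grain": 4900, "chaff": 50, "weed seeds": 50}
small_samples = [{"grain": 10, "chaff": 30, "weed seeds": 60} for _ in range(5)]

# 'Blind' amalgamation: simply add all the counts together.
pooled = Counter(large_sample)
for sample in small_samples:
    pooled.update(sample)

pooled_props = proportions(pooled)
small_props = proportions(small_samples[0])

print(f"grain share in each small sample: {small_props['grain']:.2f}")  # 0.10
print(f"grain share after pooling:        {pooled_props['grain']:.2f}")  # 0.90
```

Five of the six samples are dominated by weed seeds, yet the pooled table suggests an assemblage that is 90% grain.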

Ideally, each past action which generated archaeobotanical evidence should be represented separately in an analysis of data, and no action should be represented more than once. These unique events have been termed the ideal ‘unit of analysis’ (Jones 1991), and this unit bears some relation to the archaeobotanical ‘sample’, by which is meant the assemblage of archaeobotanical remains recovered from a single soil sample taken on site (not the soil itself). However, while the archaeobotanical sample, as collected in the field, is a convenient unit for quantification, it is an arbitrary unit depending partly on the level of detail with which the excavation was sampled. For example, a single floor may be sampled in several different places (in which case the results of the same action or event may be sampled more than once), or one large sample may be taken over most of the floor area (in which case the results of several actions may be mixed together in the same sample). A cautious approach (as recommended in the section on sampling) is to sample a large floor or other archaeological feature several times (to minimise the possibility of mixing two different events). However, these archaeobotanical samples should then be evaluated to determine the likelihood of their being derived from the same or different actions. As a rule of thumb, if two samples were taken from the same deposit or archaeological context, and are of similar composition, they are likely to derive from the same event (or the same combination of events in the case of mixed deposits) and, as such, should be amalgamated for the purpose of data analysis. If, however, they are from separate deposits, or are substantially different in composition, they must derive from separate events and should be treated as separate units of analysis.
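The rule of thumb can be sketched in code as follows. This is only an illustration under stated assumptions: the text does not specify how ‘similar composition’ should be measured, so the dissimilarity measure (half the summed absolute difference in proportions), the threshold of 0.2, the sample records and the function names are all hypothetical choices made here.

```python
def proportions(counts):
    """Convert raw counts per taxon into proportions of the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def dissimilarity(counts_a, counts_b):
    """Half the summed absolute difference in proportions.

    0 means identical composition, 1 means no taxa in common.
    """
    pa, pb = proportions(counts_a), proportions(counts_b)
    taxa = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(t, 0.0) - pb.get(t, 0.0)) for t in taxa)

def likely_same_event(sample_a, sample_b, threshold=0.2):
    """Same context AND similar composition (threshold is an assumption)."""
    same_context = sample_a["context"] == sample_b["context"]
    similar = dissimilarity(sample_a["counts"], sample_b["counts"]) <= threshold
    return same_context and similar

def amalgamate(sample_a, sample_b):
    """Add the counts of two samples judged to derive from the same event."""
    taxa = set(sample_a["counts"]) | set(sample_b["counts"])
    counts = {t: sample_a["counts"].get(t, 0) + sample_b["counts"].get(t, 0)
              for t in taxa}
    return {"context": sample_a["context"], "counts": counts}

# Hypothetical samples: two from the same floor, one from a pit.
floor_a = {"context": "floor 12", "counts": {"wheat": 60, "barley": 30, "weeds": 10}}
floor_b = {"context": "floor 12", "counts": {"wheat": 55, "barley": 35, "weeds": 10}}
pit = {"context": "pit 7", "counts": {"wheat": 5, "barley": 5, "weeds": 90}}

units = []
if likely_same_event(floor_a, floor_b):
    units.append(amalgamate(floor_a, floor_b))  # one unit of analysis
else:
    units.extend([floor_a, floor_b])            # two separate units
units.append(pit)                               # separate deposit, separate unit
```

In practice the judgement of similarity would also draw on archaeological knowledge of the deposits, not on a single numerical cut-off.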

An advantage of multivariate statistical analysis is that it provides summarised information (usually in the form of a plot) while maintaining the separate identity of each individual sample or unit of analysis, and so allows one to observe differences within, for example, archaeological phases as well as between phases. As such, it offers the best of both worlds: it simplifies the data, but in a manner that is preferable to the ‘blind’ amalgamation of samples into larger units on the basis of arbitrary temporal or spatial information. Spatial, temporal and other archaeological information can nevertheless be mapped onto the analysis to assist interpretation (see interpretation section). Multivariate statistical analysis has also sometimes been seen as a way of ‘coping’ with large quantities of data, as though the existence of such large datasets were a disadvantage to be overcome. This is a myth. On the contrary, large archaeobotanical datasets are easier to interpret, and provide more reliable interpretations, than small datasets, because the replication of evidence results in more robust patterns.

Correspondence analysis of archaeobotanical remains from Mandalo, Greece on the basis of species composition. A) samples B) species (Valamoti and Jones 2003).
Site plan from Mandalo, Greece showing the location and composition of each sample through pie charts (Valamoti and Jones 2003).

The choice of multivariate statistics depends partly on the questions posed, but the most commonly used are ordination techniques, such as principal components analysis or correspondence analysis, the latter of which is particularly well suited to data in the form of counts (Shennan 1997; Jongman, ter Braak and van Tongeren 1987). The use of correspondence analysis in archaeobotany was pioneered by Lange (1990) and, like principal components analysis, it is a pattern-searching technique which extracts axes that account for the greatest variation in the data. Samples are arranged along these axes on the basis of species composition. A plot of the first two axes therefore provides a picture of the major variation in the data.
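To make the idea of arranging samples along an axis on the basis of species composition concrete, the sketch below extracts the first correspondence analysis axis by reciprocal averaging, one classical way of computing it. It is a minimal illustration, not the procedure used in the studies cited: the counts are invented, the function name is hypothetical, only the first axis is extracted, and dedicated statistical software would normally be used in practice.

```python
import math

def reciprocal_averaging(table, iterations=200):
    """First correspondence analysis axis by reciprocal averaging.

    table: list of rows (samples), each a list of counts per species.
    Every row and every column must contain at least one non-zero count.
    Returns (sample_scores, species_scores).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand_total = sum(row_totals)

    # Arbitrary, non-constant starting scores for the species.
    species = [float(j) for j in range(n_cols)]
    samples = [0.0] * n_rows
    for _ in range(iterations):
        # Sample score = abundance-weighted mean of the species scores.
        samples = [sum(table[i][j] * species[j] for j in range(n_cols)) / row_totals[i]
                   for i in range(n_rows)]
        # Centre and standardise (weighted by sample totals); centring
        # removes the trivial solution in which all scores are equal.
        mean = sum(w * s for w, s in zip(row_totals, samples)) / grand_total
        samples = [s - mean for s in samples]
        norm = math.sqrt(sum(w * s * s for w, s in zip(row_totals, samples)) / grand_total)
        samples = [s / norm for s in samples]
        # Species score = abundance-weighted mean of the sample scores.
        species = [sum(table[i][j] * samples[i] for i in range(n_rows)) / col_totals[j]
                   for j in range(n_cols)]
    return samples, species

# Hypothetical counts: rows are samples, columns are four species.
# The first two samples are rich in species 1-2, the last two in species 3-4.
counts = [
    [40, 30, 2, 1],
    [35, 25, 3, 2],
    [2, 3, 30, 40],
    [1, 4, 25, 35],
]
sample_scores, species_scores = reciprocal_averaging(counts)
for name, score in zip(["S1", "S2", "S3", "S4"], sample_scores):
    print(name, round(score, 2))
```

Samples of similar composition receive similar scores and so fall close together on the axis, while the species scores show which species are responsible for the separation. The sign of the axis is arbitrary.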

Discriminant analysis provides a more ‘goal-oriented’ approach whereby predefined groups of samples are discriminated on the basis of their species composition. Although the analysis focuses on the differences between groups, it also displays the variation within the groups by plotting every sample in relation to all others. Through statistical testing, it can be used to evaluate the integrity of the groups in terms of species composition, and the likelihood of individual samples belonging to each group on the basis of their composition. It also provides information on what distinguishes one group from another. Cluster analysis also groups samples on the basis of compositional similarity (or difference) but provides less information on the relationship of one sample to another and, unlike discriminant analysis, provides no information on the differences between groups.
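The cluster analysis part of this comparison can be sketched briefly. The example below uses average-linkage agglomerative clustering, which is only one of many clustering methods and is chosen here as an assumption; the dissimilarity measure, the function names and the counts are likewise hypothetical. Discriminant analysis is not shown.

```python
def dissimilarity(a, b):
    """Half the summed absolute difference in proportions (0 = identical)."""
    total_a, total_b = sum(a), sum(b)
    return 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))

def average_linkage(samples, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    samples: dict mapping sample name -> list of counts per species.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in samples]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average dissimilarity over all between-cluster pairs.
                pairs = [dissimilarity(samples[a], samples[b])
                         for a in clusters[i] for b in clusters[j]]
                mean = sum(pairs) / len(pairs)
                if best is None or mean < best[0]:
                    best = (mean, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)

# Hypothetical counts per sample for four species.
samples = {
    "S1": [40, 30, 2, 1],
    "S2": [35, 25, 3, 2],
    "S3": [2, 3, 30, 40],
    "S4": [1, 4, 25, 35],
}
groups = average_linkage(samples, n_clusters=2)
print(groups)  # [['S1', 'S2'], ['S3', 'S4']]
```

The result is simply a list of group memberships: it shows which samples belong together, but not which species distinguish one group from another.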

Examples of interpretations based on both correspondence analysis and discriminant analysis are presented in the interpretation section.

A simple but effective way of displaying spatial variation in sample composition is to plot each sample on site plans or sections in the form of pie or bar charts.


  • Jones, G. 1991. Numerical analysis in archaeobotany, pp. 63-80 in W. van Zeist, K. Wasylikowa and K.-E. Behre (eds.) Progress in Old World Palaeoethnobotany. Rotterdam: A.A. Balkema.
  • Jongman, R.H.G., ter Braak, C.J.F. and van Tongeren, O.F.R. 1987. Data analysis in community and landscape ecology. Wageningen: Pudoc.
  • Lange, A.G. 1990. De Horden near Wijk bij Duurstede. Plant remains from a native settlement at the Roman frontier: a numerical approach. Nederlandse Oudheden 13. Amersfoort: ROB.
  • Shennan, S. 1997. Quantifying Archaeology. Edinburgh: Edinburgh University Press.
  • Valamoti and Jones 2003. Plant diversity and storage at Mandalo, Macedonia, Greece: archaeobotanical evidence from the Final Neolithic and Early Bronze Age. Annual of the British School at Athens 98: 1-35.