Ranked data

Ranking a data set is useful when statements about the order of observations matter more than the magnitude of their differences, and when little is known about the underlying distribution of the data. Many nonparametric statistics, which make few or no distributional assumptions, are applied to ranked data. Ranking discards information, however, so be sure that there is good reason to rank the raw data.

Conversion of raw data matrices to ranked data (Figure 1) generally proceeds by assigning rank "1" to the lowest value. If there are ties (i.e. objects with the same value, which therefore share a rank), the shared rank may be the minimum, the maximum, or the average of the ranks that the tied values occupy. In Figure 1, the value "0" occupies ranks 1-6. In panel b, the average rank was used: (1+2+3+4+5+6) ÷ 6 = 3.5.
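The three ways of handling ties can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function name `rank_values`, its `ties` parameter, and the sample data are hypothetical. The sample data are chosen only so that the value "0" occupies ranks 1-6, as in the example above.

```python
def rank_values(values, ties="average"):
    """Rank values in ascending order, so the lowest value gets rank 1.

    `ties` selects the rank shared by tied values: "min", "max" or "average".
    """
    # Indices of the observations, sorted by their values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of values tied with `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        lowest, highest = start + 1, end + 1  # 1-based ranks occupied by the run
        if ties == "min":
            shared = float(lowest)
        elif ties == "max":
            shared = float(highest)
        elif ties == "average":
            shared = (lowest + highest) / 2
        else:
            raise ValueError("ties must be 'min', 'max' or 'average'")
        # Every member of the run receives the same rank.
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


raw = [0, 0, 0, 0, 0, 0, 2, 5]  # hypothetical data: six tied zeros
print(rank_values(raw, "average"))  # the zeros share rank 3.5
print(rank_values(raw, "min"))      # the zeros share rank 1
print(rank_values(raw, "max"))      # the zeros share rank 6
```

Whichever option is chosen, the untied values keep the ranks they would have had anyway (7 and 8 here); only the rank shared within each group of ties changes.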

When ranking (dis)similarity or distance matrices (Figure 2), the least dissimilar pair of sites is typically given a rank of "1". Ranking then proceeds as above, with ties handled in the same way. As (dis)similarity matrices are symmetrical about their diagonals, only the values on one side of the diagonal need to be ranked.
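As a rough sketch of this procedure, the Python below ranks only the values below the diagonal and then mirrors the result. The function name `rank_dissimilarities` and the four-site matrix are hypothetical, and averaging is assumed for ties. The diagonal is not ranked and is simply left at 0.

```python
def rank_dissimilarities(matrix):
    """Rank the off-diagonal entries of a symmetric dissimilarity matrix.

    The least dissimilar pair gets rank 1; tied pairs share their average rank.
    """
    n = len(matrix)
    # Site pairs below the diagonal; the upper triangle mirrors them.
    pairs = [(i, j) for i in range(n) for j in range(i)]
    pairs.sort(key=lambda p: matrix[p[0]][p[1]])

    ranked = [[0.0] * n for _ in range(n)]  # diagonal stays unranked (0)
    start = 0
    while start < len(pairs):
        value = matrix[pairs[start][0]][pairs[start][1]]
        # Extend `end` to cover every pair tied with the current value.
        end = start
        while end + 1 < len(pairs) and matrix[pairs[end + 1][0]][pairs[end + 1][1]] == value:
            end += 1
        shared = ((start + 1) + (end + 1)) / 2  # average of the occupied ranks
        for i, j in pairs[start:end + 1]:
            ranked[i][j] = ranked[j][i] = shared  # fill both sides of the diagonal
        start = end + 1
    return ranked


# Hypothetical dissimilarities among four sites (symmetric, zero diagonal).
d = [
    [0.0, 0.2, 0.5, 0.5],
    [0.2, 0.0, 0.9, 0.1],
    [0.5, 0.9, 0.0, 0.7],
    [0.5, 0.1, 0.7, 0.0],
]
for row in rank_dissimilarities(d):
    print(row)
```

In this example the two pairs with a dissimilarity of 0.5 occupy ranks 3 and 4, so each receives the average rank 3.5.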

Figure 1: a) Raw data converted to b) ranked data using averaging to handle tied ranks.

Figure 2: a) Dissimilarity matrix converted to b) a ranked dissimilarity matrix using averaging to handle tied ranks.
