Head/Tail Breaks

Introduction

An important aspect of visualization is the classification of data, i.e. merging the individual values in a data set into groups or classes. Different classification methods suit different types of data. One common type is data with a so-called heavy-tailed distribution (Figure 1). In recent years a new classification scheme called Head/tail breaks has been introduced, which classifies heavy-tailed data in a natural way. Below are some more details about this classification method.
 
Data classification and Head/tail breaks

Data, for instance statistical data related to a map, needs some kind of classification, i.e. the values in the data range need to be grouped into classes. Classification makes the data easier to understand, especially when the data set is large. Many classification methods or schemes are available, and how well each one works depends on the type of data. Some data sets have a heavy-tailed distribution, which means that there are far more low values than high ones. In a diagram, as shown in Figure 1, this creates a long tail representing the large number of low values, and a head with the small number of high values. A classification scheme often used for this kind of data is Jenks' natural breaks. However, a newer classification method, Head/tail breaks, was introduced by Jiang (2013). This method can easily be used to classify heavy-tailed data, and Jiang (2013) reports that it gives a better result than, for instance, Jenks' natural breaks.
 

Figure 1. Diagram showing a typical example of a heavy-tailed data distribution.

What is the essence of this new method? How does it work? As mentioned above, a heavy-tailed data distribution has a head with a small number of high values and a tail with a large number of low values. The data is divided into classes at breakpoints; a breakpoint is the last or first value in a class. To find the first breakpoint, the mean of all the data is calculated (Figure 1). This mean is the first breakpoint. Then all values larger than that mean, i.e. the head, are selected, and the mean of those values is calculated. That is the second breakpoint. The selection of breakpoints continues in the same manner, each time within the values above the latest mean, until there is only one maximum value left. Neither the breakpoints nor the number of classes has to be chosen in advance: both follow automatically, or naturally, from the data (Jiang, 2013).
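The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration of the steps as they are described on this page, not Jiang's reference implementation. The function names head_tail_breaks and classify and the sample data are made up for this example, and it is an assumption here that a value exactly equal to a breakpoint stays in the lower class.

```python
from bisect import bisect_left
from statistics import fmean


def head_tail_breaks(values):
    """Return the breakpoints (successive means) for a list of numbers."""
    breaks = []
    head = list(values)
    # Keep splitting while more than one value remains in the head.
    while len(head) > 1:
        mean = fmean(head)
        new_head = [v for v in head if v > mean]
        if not new_head:
            # All remaining values are equal, so nothing lies above the mean.
            break
        breaks.append(mean)
        head = new_head
    return breaks


def classify(value, breaks):
    """Return the 0-based class index of a value; class 0 is the lowest."""
    # Assumption: a value equal to a breakpoint stays in the lower class.
    return bisect_left(breaks, value)


# Made-up sample data: many low values, few high ones.
data = [1, 1, 1, 1, 1, 1, 2, 2, 4, 16]
breaks = head_tail_breaks(data)
classes = [classify(v, breaks) for v in data]
print(breaks)   # [3.0, 10.0]
print(classes)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
```

In this small example the two breakpoints give three classes: the number of classes is always one more than the number of breakpoints, and neither had to be chosen beforehand.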
 
Some examples


To show the advantage of this new method, a comparison has been made (the two choropleth maps compared in Figure 2 have been remade with permission from Jiang (2013)). Figure 2 (a) shows a map of the variation in population density for a certain area. The data is heavy-tailed and has been classified according to Jenks' natural breaks. Now compare this map with the same map in Figure 2 (b). The difference is that the map in Figure 2 (b) has been classified according to Head/tail breaks. The two results differ significantly, and the map classified with Head/tail breaks clearly gives the better result. This illustrates the advantage of using Head/tail breaks instead of Jenks' natural breaks when classifying heavy-tailed data.

 

Figure 2. (a) Choropleth map classified according to Jenks' natural breaks, and (b) the same map classified according to Head/tail breaks.

For more examples using the Head/tail breaks classification scheme, see the attached project article below, titled "Examining the Connectivity Property and Small World Behaviours of Planned and Evolved Urban Street Networks".

References


Jiang, B. (2013). Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3), 482-494.
Isak Hast,
23 Sep 2015, 02:24