4.2 Clustering

Although we can probably eyeball the visualized network and spot some prominent groupings, CiteSpace's clustering function identifies these groupings, or clusters, more precisely.

To start the clustering function, click the clustering icon.

How do I know whether the clustering process is completed? You will see #clusters in the upper right corner of the canvas. In the Demo example, a total of 37 clusters of co-cited references are identified. Each cluster corresponds to an underlying theme, a topic, or a line of research.

The signature of the network is shown in the upper left corner of the display. In particular, the modularity Q and the mean silhouette score are two important metrics that describe the overall structural properties of the network. For example, a modularity Q of 0.7141 is relatively high, which means that the network is reasonably well divided into loosely coupled clusters. The mean silhouette score of 0.5904 suggests that the homogeneity of these clusters is, on average, neither very high nor very low.
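To make the modularity Q more concrete, the following minimal Python sketch computes the standard Newman modularity of a given partition of an undirected, unweighted network: for each cluster it takes the fraction of edges that fall inside the cluster and subtracts the fraction expected by chance given the nodes' degrees. It illustrates the metric only and is not CiteSpace's own code; the function name and the input format (an edge list plus a node-to-cluster dict) are assumptions, and edge weights, which CiteSpace networks may carry, are ignored here.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    edges: list of (u, v) pairs; each undirected edge is listed once.
    membership: dict mapping every node to its cluster id.
    L_c = number of edges inside cluster c, d_c = total degree of c's nodes.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # L_c: edges with both ends in cluster c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    total_degree = defaultdict(int)  # d_c: summed degree per cluster
    for node, k in degree.items():
        total_degree[membership[node]] += k
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Two triangles joined by a single bridging edge, one cluster per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, membership))
```

For this toy graph Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357. Putting all six nodes into one cluster gives Q = 0, which is why a high Q indicates that the chosen clusters are much denser internally than chance would predict.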

You can inspect various measures of each cluster in a summary table of all the clusters via Clusters ►4. Summarization of Clusters. The Silhouette column shows the homogeneity of a cluster. The higher the silhouette score, the more consistent the cluster members are, provided that the clusters being compared have similar sizes. If a cluster is small, a high homogeneity does not mean much. For example, cluster #9 has 7 members and a silhouette of 1.00. This is most likely because all 7 references were cited by the same citing article. In other words, cluster #9 may reflect the citing behavior or preferences of a single paper, so it is less representative.
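The silhouette of an item i is commonly defined as s(i) = (b - a) / max(a, b), where a is the mean distance from i to the other members of its own cluster and b is the mean distance from i to the members of the nearest other cluster. The Python sketch below computes per-cluster mean silhouettes from a distance matrix and flags small clusters, mirroring the caveat above. How CiteSpace derives distances between references is not described in this section, so the distance matrix, the function names, and the `min_size` threshold of 10 are all hypothetical.

```python
def silhouettes(dist, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for every item.

    dist: square matrix (list of lists) of pairwise distances.
    labels: cluster label of each item; at least two clusters are required.
    """
    n = len(labels)
    sizes = {c: labels.count(c) for c in set(labels)}
    scores = []
    for i in range(n):
        own = labels[i]
        if sizes[own] == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of i's own cluster
        a = sum(dist[i][j] for j in range(n)
                if j != i and labels[j] == own) / (sizes[own] - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / sizes[c]
                for c in sizes if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def cluster_summary(dist, labels, min_size=10):
    """Per-cluster size and mean silhouette. Clusters smaller than the
    (arbitrary) min_size are flagged, since a high score there says little."""
    s = silhouettes(dist, labels)
    summary = {}
    for c in sorted(set(labels)):
        members = [s[i] for i in range(len(labels)) if labels[i] == c]
        summary[c] = {
            "size": len(members),
            "silhouette": sum(members) / len(members),
            "small": len(members) < min_size,
        }
    return summary


# Two tight pairs of points on a line; both clusters are tiny.
positions = [0, 0, 10, 10]
dist = [[abs(p - q) for q in positions] for p in positions]
print(cluster_summary(dist, ["A", "A", "B", "B"]))
```

Both two-member clusters in this toy example reach a silhouette of exactly 1.0, which shows how easily a very small cluster can score perfectly without telling us much about a research area.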

The average year of publication of a cluster indicates whether it consists mainly of recent or older papers. This is a simple and useful indicator.
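Computing this indicator is straightforward. The hypothetical helper below takes the publication years of each cluster's member references (the input format is an assumption) and lists the clusters from the most recent to the oldest by average year.

```python
from statistics import mean


def rank_by_recency(clusters):
    """clusters: dict cluster id -> list of publication years of its members.
    Returns (cluster id, average year) pairs, most recent cluster first."""
    averages = {cid: mean(years) for cid, years in clusters.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


print(rank_by_recency({0: [2000, 2002], 1: [2010, 2012], 2: [1995]}))
```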

Generate Cluster Labels

To characterize the nature of an identified cluster, CiteSpace can extract noun phrases from the titles (the T icon), keyword lists (K), or abstracts (A) of the articles that cited the particular cluster.

Let’s ask CiteSpace to choose noun phrases from titles (i.e., select the T icon). This process may take a while because CiteSpace needs to compute several selection metrics. Once the process is finished, the chosen labels will be displayed. By default, the labels selected by tf*idf, one of the three selection algorithms, are shown. Our study has found that LLR (log-likelihood ratio) usually gives the best results in terms of uniqueness and coverage.
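To illustrate how the two ranking criteria differ, here is a minimal Python sketch of generic tf*idf and LLR term ranking. It treats each cluster as one "document" made of candidate noun phrases taken from its citing articles' titles. This is a textbook formulation (tf*idf with a natural-log idf, and Dunning's G² statistic on a 2x2 term-by-cluster table), not CiteSpace's implementation; CiteSpace's candidate extraction, weighting, and normalization may differ. The function names and the sample phrases are made up for illustration.

```python
import math
from collections import Counter


def tfidf_labels(cluster_terms, top=3):
    """Rank candidate terms per cluster by tf * log(N / df), where N is the
    number of clusters and df the number of clusters containing the term."""
    n = len(cluster_terms)
    df = Counter()
    for terms in cluster_terms.values():
        df.update(set(terms))
    labels = {}
    for cid, terms in cluster_terms.items():
        tf = Counter(terms)
        scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
        labels[cid] = sorted(scores, key=scores.get, reverse=True)[:top]
    return labels


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table:
    k11 = term in cluster, k12 = other terms in cluster,
    k21 = term elsewhere,  k22 = other terms elsewhere."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for obs, r, c in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if obs > 0:  # cells with 0 observations contribute 0
            g2 += obs * math.log(obs / (rows[r] * cols[c] / total))
    return 2.0 * g2


def llr_labels(cluster_terms, top=3):
    """Rank candidate terms per cluster by LLR against all other clusters,
    keeping only terms over-represented in the cluster."""
    overall = Counter()
    for terms in cluster_terms.values():
        overall.update(terms)
    grand = sum(overall.values())
    labels = {}
    for cid, terms in cluster_terms.items():
        tf = Counter(terms)
        size = len(terms)
        scores = {}
        for t, k11 in tf.items():
            k21 = overall[t] - k11
            # skip terms that are relatively rarer here than elsewhere
            if k11 * (grand - size) < k21 * size:
                continue
            scores[t] = llr(k11, size - k11, k21, grand - size - k21)
        labels[cid] = sorted(scores, key=scores.get, reverse=True)[:top]
    return labels


# Made-up candidate phrases for two clusters.
titles = {
    0: ["citation analysis", "citation analysis", "science mapping", "network"],
    1: ["knowledge domain", "knowledge domain", "network", "visualization"],
}
print(tfidf_labels(titles, top=1))  # {0: ['citation analysis'], 1: ['knowledge domain']}
print(llr_labels(titles, top=1))    # {0: ['citation analysis'], 1: ['knowledge domain']}
```

In this tiny example the two criteria agree, and a phrase shared by both clusters ("network") scores 0 under either one. On real data they can diverge: tf*idf rewards raw frequency, whereas LLR asks how surprising a term's frequency is given the rest of the network.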

The clusters are numbered in descending order of size, starting with the largest cluster, #0, followed by the second largest, #1, and so on.
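The numbering rule can be expressed in a few lines. In this hypothetical Python helper, clusters of equal size keep their input order because Python's sort is stable. How CiteSpace breaks such ties is not stated here, so that part is an assumption.

```python
def number_clusters(clusters):
    """clusters: dict of arbitrary cluster key -> list of member references.
    Returns a dict mapping '#0', '#1', ... to members, largest cluster first.
    Equal-sized clusters keep their input order (stable sort)."""
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return {f"#{rank}": members for rank, members in enumerate(ordered)}


print(number_clusters({"a": ["r1"], "b": ["r2", "r3", "r4"], "c": ["r5", "r6"]}))
```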

To make it easier to see which clusters are the largest, you can change the font size of the labels from uniform to proportional:

Display ►Label Font Size ►Cluster: Uniformed/Proportional

This is a toggle function with two states: each selection switches between a uniform font size and a font size proportional to cluster size.
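The following Python sketch shows one way a toggle between uniform and proportional label sizes could work. The linear scaling and the point sizes (14 pt uniform, 10 to 36 pt proportional) are hypothetical choices for illustration; CiteSpace's actual scaling is not documented in this section.

```python
def label_font_sizes(cluster_sizes, proportional,
                     uniform_pt=14, min_pt=10, max_pt=36):
    """cluster_sizes: dict cluster id -> number of members.
    Uniform mode gives every label the same size; proportional mode scales
    linearly from min_pt (smallest cluster) to max_pt (largest cluster)."""
    if not proportional:
        return {cid: uniform_pt for cid in cluster_sizes}
    smallest = min(cluster_sizes.values())
    span = max(cluster_sizes.values()) - smallest
    sizes = {}
    for cid, n in cluster_sizes.items():
        frac = (n - smallest) / span if span else 1.0
        sizes[cid] = round(min_pt + frac * (max_pt - min_pt))
    return sizes


proportional = False
proportional = not proportional  # each menu selection flips the state
print(label_font_sizes({0: 50, 1: 30, 2: 10}, proportional))  # {0: 36, 1: 23, 2: 10}
```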

Cluster > Circle Packing