Hierarchical clustering organizes data into a tree-like hierarchy of nested clusters, grouping objects by similarity.
It is divided into two major categories:
Agglomerative (AGNES) - bottom-up approach
Divisive (DIANA) - top-down approach
In the agglomerative approach, every object starts in its own group, and the most similar groups are merged step by step until only one group remains or a termination condition is met (for example, a target number of clusters).
Use cases: document clustering, gene clustering
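The bottom-up procedure can be sketched in a few lines of plain Python. This is a naive illustration, not a reference implementation: the function names (`single_link`, `agglomerate`), the parameters `n_clusters` and `max_distance`, and the sample points are all assumptions made for this example, and the cluster distance used here is simply the smallest point-to-point distance.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Smallest point-to-point distance between clusters a and b."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters=1, max_distance=None):
    """Naive bottom-up clustering; returns (clusters, merge history)."""
    clusters = [[p] for p in points]  # every object starts in its own group
    history = []
    while len(clusters) > n_clusters:
        # Pick the pair of clusters that are currently closest together.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_link(clusters[i], clusters[j])
        if max_distance is not None and d > max_distance:
            break  # termination condition: remaining clusters are too far apart
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, history = agglomerate(points, n_clusters=3)
print(clusters)
```

Calling `agglomerate(points)` with the defaults keeps merging until a single group remains, while `max_distance` stops early once the closest remaining pair is farther apart than the threshold. The sketch recomputes every pairwise distance on each pass, so it is only suitable for very small inputs.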
In the divisive approach, all objects start in a single group, which is then split repeatedly on the basis of dissimilarity until every object is in its own group or a termination condition is met.
Use cases: organization clustering, natural language processing
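A top-down pass can be sketched in a similar way. The version below is a deliberately simplified stand-in and not the full DIANA procedure: it repeatedly takes the cluster with the largest diameter and splits it around its two most distant points. The helper names (`diameter`, `split`, `divide`) and the sample points are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def diameter(cluster):
    """Largest pairwise distance inside a cluster (0.0 for a single point)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def split(cluster):
    """Split one cluster in two around its two most distant points."""
    seed_a, seed_b = max(combinations(cluster, 2), key=lambda pq: dist(*pq))
    left = [p for p in cluster if dist(p, seed_a) <= dist(p, seed_b)]
    right = [p for p in cluster if dist(p, seed_a) > dist(p, seed_b)]
    return left, right


def divide(points, n_clusters):
    """Naive top-down clustering: keep splitting the widest cluster."""
    clusters = [list(points)]  # every object starts in the same group
    while len(clusters) < n_clusters:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0.0:
            break  # nothing left that can be divided further
        clusters.remove(widest)
        clusters.extend(split(widest))
    return clusters


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(divide(points, n_clusters=3))
```

Here the target number of clusters acts as the termination condition; asking for more clusters than there are distinct points simply ends with every object in its own group.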
The linkage criterion defines how the distance between two clusters is calculated, which in turn determines which clusters are merged next. Four major linkage methods are used for hierarchical clustering:
Single Linkage
Complete Linkage
Average Linkage
Ward Linkage
Single linkage uses the minimum distance between any pair of points drawn from two different clusters. It often results in loose clusters and very elongated dendrograms.
Complete linkage uses the maximum distance between any pair of points drawn from two different clusters. It often results in tight clusters and a compact dendrogram.
Average linkage uses the average distance over all pairs of points drawn from two different clusters. It often produces balanced clusters and a balanced dendrogram (neither too elongated nor too compact). Average and complete linkage are among the most commonly used linkages in practice.
Ward linkage merges, at each step, the pair of clusters whose merger gives the smallest increase in total within-cluster variance. It often results in compact, well-separated clusters of fairly balanced size.
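To make the four definitions concrete, here is a minimal sketch that evaluates each linkage on two small clusters. The function names and sample points are assumptions for illustration, and the Ward measure is written as the increase in the within-cluster sum of squared distances to the centroid, which is one common way to express it.

```python
from math import dist


def single(a, b):
    """Minimum distance over all cross-cluster point pairs."""
    return min(dist(p, q) for p in a for q in b)


def complete(a, b):
    """Maximum distance over all cross-cluster point pairs."""
    return max(dist(p, q) for p in a for q in b)


def average(a, b):
    """Mean distance over all cross-cluster point pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def ward_increase(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    gap = dist(centroid(a), centroid(b))
    return len(a) * len(b) / (len(a) + len(b)) * gap ** 2


# Hypothetical clusters: two vertical pairs, three units apart.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
for name, fn in [("single", single), ("complete", complete),
                 ("average", average), ("ward", ward_increase)]:
    print(name, fn(a, b))
```

For any pair of clusters the single-linkage value is never larger than the average, and the average is never larger than the complete-linkage value, which is one way to see why single linkage merges more eagerly and tends to produce elongated clusters.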
Different Types of Linkages
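Hierarchical clustering does not require a predefined number of clusters; the dendrogram can be cut at any level afterwards.
It can be less sensitive to outliers than partitional clustering, although this depends on the linkage used.
On the other hand, it is computationally expensive: standard agglomerative algorithms work from all pairwise distances, so time and memory grow at least quadratically with the number of objects.
As a result, it is not ideal for large and complex datasets.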