
Hierarchical Clustering

Hierarchical clustering falls into two types:
Agglomerative: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Agglomerative Approach:
1) Treat each data point as its own cluster (n clusters are formed).
2) Take the 2 closest points (clusters) and merge them into one cluster (n-1 clusters remain after this step).
3) Again take the 2 closest clusters and merge them into one, and repeat this step until only one cluster remains.
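
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not an efficient implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), neither of which is fixed by the steps themselves. The function name `agglomerative`, the `target_clusters` parameter, and the sample points are made up for this example.

```python
import math

def agglomerative(points, target_clusters=1):
    """Naive single-linkage agglomerative clustering on coordinate tuples."""
    # Step 1: every point starts in its own cluster (n clusters).
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance) for each merge, in order

    def cluster_distance(a, b):
        # Single linkage (an assumption): distance of the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    # Steps 2 and 3: keep merging the two closest clusters.
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Delete j first (j > i) so that index i is still valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters, merges

# Hypothetical sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, target_clusters=3)
print(clusters)
```

Leaving `target_clusters` at 1 runs the merging all the way to a single cluster, and the recorded merge distances are exactly the heights a dendrogram would draw.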

Dendrogram:


The vertical axis is the distance between points (clusters), which represents the dissimilarity between them.
The largest jump in dissimilarity between two successive merges marks where to set the threshold, and the threshold gives the number of clusters.
In the dendrogram above, a value around 20 can be the threshold based on the highest dissimilarity, which gives 3 clusters.
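
To see why a threshold translates directly into a cluster count, here is a small sketch. It assumes the merge heights never decrease as you move up the tree (true for single, complete, average and Ward linkage). The helper `count_clusters_at` and the list of heights are hypothetical: the numbers are invented for illustration and are not read off the dendrogram above.

```python
def count_clusters_at(threshold, merge_heights):
    """Number of clusters left when the dendrogram is cut at `threshold`."""
    # n points produce n - 1 merges in total.
    n_points = len(merge_heights) + 1
    # Every merge at or below the cut has already happened,
    # and each merge reduces the number of clusters by one.
    merges_done = sum(1 for h in merge_heights if h <= threshold)
    return n_points - merges_done

# Hypothetical merge heights for 8 points (made-up values).
heights = [2.0, 3.0, 4.0, 5.0, 10.0, 30.0, 35.0]
n_clusters = count_clusters_at(20.0, heights)
print(n_clusters)
```

With these invented heights, two merges lie above the cut at 20, so three clusters are still separate at that level.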

A simple way of selecting clusters from a dendrogram:
Select the longest vertical line that is not crossed by any horizontal line (imagine each horizontal line extended across the full width), then draw a horizontal line through that vertical line. The number of vertical lines crossed by this new horizontal line gives the number of clusters (as shown in the figure below).
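
The same rule can be written without looking at a picture: the longest uncrossed vertical line corresponds to the largest gap between two consecutive merge heights. The sketch below assumes that reading of the rule. The function `clusters_from_largest_gap` and the example heights are hypothetical and chosen only for illustration.

```python
def clusters_from_largest_gap(merge_heights):
    """Pick a cut inside the largest gap between consecutive merge heights."""
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merges to compare gaps")
    n_points = len(heights) + 1
    # Gap i is the vertical stretch between merge i and merge i + 1.
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Draw the horizontal line in the middle of that stretch.
    threshold = (heights[i] + heights[i + 1]) / 2
    # Merges 0..i lie below the line, so i + 1 merges have been applied.
    n_clusters = n_points - (i + 1)
    return threshold, n_clusters

# Hypothetical merge heights for 8 points (made-up values).
heights = [2.0, 3.0, 4.0, 5.0, 10.0, 30.0, 35.0]
threshold, n_clusters = clusters_from_largest_gap(heights)
print(threshold, n_clusters)
```

This is a heuristic rather than a guarantee: when several gaps are of similar size, it is worth checking more than one candidate cut.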



