Clustering K-means

K-means algorithm step by step:

1) Choose the number of clusters, K.
2) Select K random points as the initial centroids.
3) Assign each point to its closest centroid (based on distance).
4) Recompute each centroid as the mean of the points assigned to it.
5) Repeat steps 3 and 4 until the centroids no longer change.
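
The loop described above can be sketched in plain Python. This is a minimal illustration, not the implementation file mentioned later in this post: the function name `kmeans`, the `seed` and `max_iter` parameters, the sample points, and the policy of keeping the old centroid when a cluster ends up empty are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Empty cluster: keep the previous centroid (one simple policy).
                new_centroids.append(centroids[j])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two small, well-separated groups of 2-D points (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The first three points should share one label and the last three the other; which group gets label 0 depends on the random start.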


K-means random initialization Trap:

What happens if the random initialization is bad?
Different initial centroids can lead to different final clusters, and some of those results are clearly worse than others.
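
A small sketch of the trap, assuming four made-up points at the corners of a wide rectangle and a hypothetical helper `lloyd` that runs the k-means iterations from starting centroids we pass in by hand. One start gives the natural left/right split; the other converges to a top/bottom split that is stable but much worse.

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Run k-means iterations from the given starting centroids; return labels."""
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its points (keep it if empty).
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(a) / len(members) for a in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


# Four corners of a wide rectangle: width 4, height 1.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = lloyd(points, [(0.0, 0.0), (4.0, 0.0)])  # one start on each side
bad = lloyd(points, [(0.0, 0.0), (0.0, 1.0)])   # both starts on the left side

print(good)  # [0, 0, 1, 1] -> left pair vs. right pair
print(bad)   # [0, 1, 0, 1] -> bottom pair vs. top pair
```

Both runs stop because the centroids no longer change, yet only the first one finds the compact clusters.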

How to tackle this?
Use the K-means++ algorithm, which picks initial centroids that tend to be spread far apart from each other.
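
A rough sketch of K-means++ style seeding, using only the standard library. The function name `kmeans_pp_init`, the `seed` parameter, and the sample points are assumptions for illustration, and the sketch assumes the data contains at least k distinct points. After seeding, the usual assign/recompute loop runs as before.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k starting centroids with K-means++ style seeding."""
    rng = random.Random(seed)
    # First centroid: one data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Next centroid: sample a point with probability proportional to d2.
        # Far-away points are favoured; already-chosen points have weight 0.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


# Made-up data: two small groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
starts = kmeans_pp_init(points, k=2)
print(starts)
```

The seeding is still random, so it makes a bad start less likely rather than impossible.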

Within Cluster Sum of Squares(WCSS):
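WCSS = sum over clusters j of sum over points p in cluster j of distance(p, c_j)^2
where p is a point and c_j is the centroid of cluster j.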


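A lower WCSS means tighter clusters, but minimizing WCSS alone does not work: it keeps decreasing as K grows and reaches 0 when K equals n (the total number of points), where every point is its own cluster.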
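
A tiny worked example of that effect, assuming four made-up points and a hypothetical helper `wcss` that sums squared Euclidean distances from each point to its assigned centroid.

```python
import math


def wcss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        math.dist(p, centroids[label]) ** 2
        for p, label in zip(points, labels)
    )


points = [(1.0, 1.0), (1.0, 3.0), (7.0, 1.0), (7.0, 3.0)]

# Two clusters: centroids are the means of the left pair and the right pair.
two_clusters = wcss(points, [(1.0, 2.0), (7.0, 2.0)], [0, 0, 1, 1])

# One cluster per point: every point is its own centroid.
n_clusters = wcss(points, points, [0, 1, 2, 3])

print(two_clusters)  # 4.0 (each point is 1 unit from its centroid)
print(n_clusters)    # 0.0
```

The second value is the smallest possible WCSS, but a clustering with one point per cluster is useless.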

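Then how to choose the number of clusters?
Elbow method:

Plot WCSS against the number of clusters and look for the point where the curve bends (the elbow).
In this example 4 can be chosen as the number of clusters, since WCSS drops sharply up to 4 and only slowly after 4.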
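
The WCSS values behind such a curve can be computed as in the sketch below. Everything here is an assumption for illustration: the data is synthetic (four Gaussian blobs), `kmeans_wcss` is a hypothetical helper that runs plain k-means once and returns its final WCSS, and each K keeps the best of five random restarts to reduce the effect of a bad initialization.

```python
import math
import random


def kmeans_wcss(points, k, rng, max_iter=100):
    """Run plain k-means once from a random start and return the final WCSS."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its points (keep it if empty).
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(a) / len(members) for a in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    # WCSS: squared distance from each point to its nearest centroid.
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)


rng = random.Random(0)
# Synthetic data: four Gaussian blobs with 30 points each.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in blob_centers
    for _ in range(30)
]

# WCSS for K = 1..8, keeping the best of a few random restarts per K.
curve = {
    k: min(kmeans_wcss(points, k, rng) for _ in range(5))
    for k in range(1, 9)
}

for k, value in curve.items():
    print(k, round(value, 1))
```

Plotting these values against K should give a curve that falls steeply at first and then flattens; the K at the bend is the elbow.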

Python implementation:
See the Python implementation file in git.





