sklearn Clustering: Fastest way to determine optimal number of cluster on large data sets

Question

I use KMeans and the silhouette_score from sklearn in python to calculate my cluster, but on >10.000 samples with >1000 cluster calculating the silhouette_score is very slow. Is there a faster method to determine the optimal number of cluster? Or should I change the clustering algorithm? If yes, which is the best (and fastest) algorithm for a data set with

Accepted Answer

Most common method to find number of cluster is elbow curve method. But it will require you to run KMeans algorithm multiple times to plot graph. https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set wiki page mentions some common methods to determine number of clusters.

Advertisement

Answer