I am using DBSCAN for clustering. I now want to pick one point from each cluster to represent it, but DBSCAN does not produce centroids the way k-means does.
However, I noticed that DBSCAN has something called core points. Is it possible to use these core points, or some other alternative, to obtain a representative point for each cluster?
Below is the code I have used.
import numpy as np
from math import pi
from sklearn.cluster import DBSCAN

# points containing time value in minutes
points = [100, 200, 600, 659, 700]

def convert_to_radian(x):
    return (x / (24 * 60)) * 2 * pi

rad_function = np.vectorize(convert_to_radian)
points_rad = rad_function(points)

# generate distance matrix from each point
dist = points_rad[None, :] - points_rad[:, None]

# assign shortest distances from each point (wrap around the 24-hour circle)
dist[(dist > pi) & (dist <= 2 * pi)] = dist[(dist > pi) & (dist <= 2 * pi)] - 2 * pi
dist[(dist > -2 * pi) & (dist <= -pi)] = dist[(dist > -2 * pi) & (dist <= -pi)] + 2 * pi
dist = abs(dist)

# check dist
print(dist)

# set eps to 100 minutes (in radians) and metric to 'precomputed'
db = DBSCAN(eps=(100 / (24 * 60)) * 2 * pi, min_samples=2, metric='precomputed')

# check db
print(db)

db.fit(dist)

# get labels
labels = db.labels_

# get number of clusters (label -1 marks noise)
no_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print('No of clusters:', no_clusters)
print('Cluster 0 : ', np.nonzero(labels == 0)[0])
print('Cluster 1 : ', np.nonzero(labels == 1)[0])
print(db.core_sample_indices_)
I am happy to provide more details if needed.
Answer
Why don’t you estimate the centroids of the resulting clusters?
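# Note: dist is the precomputed distance matrix, so each selected row holds
# one cluster member's distances to all points. The mean over those rows is
# the cluster's average distance to every point, not a time of day.
points_of_cluster_0 = dist[labels == 0, :]
centroid_of_cluster_0 = np.mean(points_of_cluster_0, axis=0)
print(centroid_of_cluster_0)

points_of_cluster_1 = dist[labels == 1, :]
centroid_of_cluster_1 = np.mean(points_of_cluster_1, axis=0)
print(centroid_of_cluster_1)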
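Since DBSCAN is run here on a precomputed circular distance matrix, two alternatives may be worth considering alongside a plain average. The sketch below is only an illustration under assumptions, not a definitive implementation: it picks a medoid per cluster (the core point with the smallest total distance to the other members, which is always an actual data point) and also computes a circular mean of the times. The helper names `cluster_medoids` and `circular_means` and the constant `MINUTES_PER_DAY` are hypothetical and not part of scikit-learn. The distance matrix is built with `np.minimum`, which should give the same values as the wrap-around logic in the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN

MINUTES_PER_DAY = 24 * 60  # hypothetical constant, for readability

# Same data and parameters as in the question.
points = np.array([100, 200, 600, 659, 700])
points_rad = points / MINUTES_PER_DAY * 2 * np.pi

# Circular distance: the shorter way around the 24-hour clock.
diff = np.abs(points_rad[:, None] - points_rad[None, :])
dist = np.minimum(diff, 2 * np.pi - diff)

eps = 100 / MINUTES_PER_DAY * 2 * np.pi
db = DBSCAN(eps=eps, min_samples=2, metric='precomputed').fit(dist)
labels = db.labels_
core_idx = db.core_sample_indices_


def cluster_medoids(dist, labels, core_idx):
    """Map each cluster label to the index of its medoid among core points."""
    medoids = {}
    for k in sorted(set(labels.tolist()) - {-1}):  # -1 marks noise
        members = np.flatnonzero(labels == k)
        # Only core points are candidates; every cluster has at least one.
        candidates = np.intersect1d(members, core_idx)
        # Total distance from each candidate to all members of the cluster.
        total = dist[np.ix_(candidates, members)].sum(axis=1)
        medoids[k] = int(candidates[np.argmin(total)])
    return medoids


def circular_means(points_rad, labels):
    """Map each cluster label to the circular mean of its times, in minutes."""
    means = {}
    for k in sorted(set(labels.tolist()) - {-1}):
        ang = points_rad[labels == k]
        # Average the unit vectors, then convert the angle back to [0, 2*pi).
        mean_ang = np.arctan2(np.sin(ang).mean(), np.cos(ang).mean()) % (2 * np.pi)
        means[k] = float(mean_ang / (2 * np.pi) * MINUTES_PER_DAY)
    return means


medoids = cluster_medoids(dist, labels, core_idx)
circ_means = circular_means(points_rad, labels)

for k, idx in medoids.items():
    print('Cluster', k, 'medoid (minutes):', points[idx],
          '| circular mean (minutes):', round(circ_means[k], 1))
```

With the question's data, the cluster containing 600, 659 and 700 gets 659 as its medoid, because its summed distance to the other two members (59 + 41 = 100 minutes) is the smallest. The medoid is useful when the representative must be one of the observed points; the circular mean may be preferable when any time of day is acceptable and clusters can straddle midnight.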