I am using DBSCAN for clustering. I now want to pick one point from each cluster to represent it, but DBSCAN does not provide centroids the way k-means does.
I did notice that DBSCAN has the concept of core points. Is it possible to use these core points, or some other approach, to obtain a representative point for each cluster?
The code I am using is shown below.
import numpy as np
from math import pi
from sklearn.cluster import DBSCAN

#points containing time value in minutes
points = [100, 200, 600, 659, 700]

def convert_to_radian(x):
    return((x / (24 * 60)) * 2 * pi)

rad_function = np.vectorize(convert_to_radian)
points_rad = rad_function(points)

#generate distance matrix from each point
dist = points_rad[None,:] - points_rad[:, None]

#Assign shortest distances from each point
dist[((dist > pi) & (dist <= (2*pi)))] = dist[((dist > pi) & (dist <= (2*pi)))] -(2*pi)
dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] = dist[((dist > (-2*pi)) & (dist <= (-1*pi)))] + (2*pi)
dist = abs(dist)

#check dist
print(dist)

#using default values, set metric to 'precomputed'
db = DBSCAN(eps=((100 / (24*60)) * 2 * pi ), min_samples = 2, metric='precomputed')

#check db
print(db)

db.fit(dist)

#get labels
labels = db.labels_

#get number of clusters
no_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print('No of clusters:', no_clusters)
print('Cluster 0 : ', np.nonzero(labels == 0)[0])
print('Cluster 1 : ', np.nonzero(labels == 1)[0])

print(db.core_sample_indices_)
I am happy to provide more details if needed.
Answer
Why not estimate the centroids of the resulting clusters?
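# times (in minutes) of the points assigned to cluster 0
points_of_cluster_0 = np.asarray(points)[labels == 0]
# a plain mean assumes the cluster does not wrap around midnight
centroid_of_cluster_0 = np.mean(points_of_cluster_0, axis=0)
print(centroid_of_cluster_0)

# times (in minutes) of the points assigned to cluster 1
points_of_cluster_1 = np.asarray(points)[labels == 1]
centroid_of_cluster_1 = np.mean(points_of_cluster_1, axis=0)
print(centroid_of_cluster_1)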
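Because the points are times of day, an arithmetic mean can mislead for a cluster that straddles midnight: the plain mean of 23:50 and 00:10 comes out as noon. The following is a minimal sketch of a circular mean, assuming the values are minutes since midnight. The helper name `circular_mean_minutes` is hypothetical and is not part of NumPy or scikit-learn.

```python
from math import atan2, cos, sin, pi

MINUTES_PER_DAY = 24 * 60


def circular_mean_minutes(minutes):
    """Mean of times of day (minutes since midnight) on the 24-hour circle."""
    # Map each time to an angle on the unit circle.
    angles = [m / MINUTES_PER_DAY * 2 * pi for m in minutes]
    # Sum the unit vectors; the direction of the sum is the mean angle.
    s = sum(sin(a) for a in angles)
    c = sum(cos(a) for a in angles)
    mean_angle = atan2(s, c) % (2 * pi)
    # Convert the angle back to minutes.
    return mean_angle / (2 * pi) * MINUTES_PER_DAY


print(circular_mean_minutes([100, 200]))  # cluster 0 from the question, about 150
print(circular_mean_minutes([1430, 10]))  # midnight, i.e. a value near 0 or 1440
```

The result is undefined when the unit vectors cancel out (for example, two times exactly 12 hours apart), so a caller may want to check for that case.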
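If the representative has to be one of the actual data points, one option is the medoid: the cluster member with the smallest total distance to the other members. The sketch below restricts the candidates to core samples and works directly on a precomputed distance matrix. The function name `cluster_medoids` and its parameters are hypothetical, and the toy labels and core indices are assumed values rather than real DBSCAN output. It is written in plain Python so that it accepts either nested lists or NumPy arrays.

```python
def cluster_medoids(dist, labels, core_indices):
    """Return {cluster_label: index of a representative point}.

    dist         : square matrix of pairwise distances (nested lists or array)
    labels       : cluster label per point, with -1 meaning noise (as in DBSCAN)
    core_indices : indices of the core samples
    """
    core = set(int(i) for i in core_indices)

    # Group point indices by cluster label, skipping noise.
    members = {}
    for idx, label in enumerate(labels):
        if label != -1:
            members.setdefault(int(label), []).append(idx)

    medoids = {}
    for label, idxs in members.items():
        # Prefer core samples as candidates; fall back to all members if none are listed.
        candidates = [i for i in idxs if i in core] or idxs
        # Pick the candidate with the smallest summed distance to the cluster's members.
        medoids[label] = min(candidates, key=lambda i: sum(dist[i][j] for j in idxs))
    return medoids


# Toy data mirroring the question: distances in minutes on a 24-hour circle.
points = [100, 200, 600, 659, 700]
dist = [[min(abs(a - b), 1440 - abs(a - b)) for b in points] for a in points]
labels = [0, 0, 1, 1, 1]        # assumed DBSCAN labels for these points
core_indices = [0, 1, 2, 3, 4]  # assumed: every point is a core sample
print(cluster_medoids(dist, labels, core_indices))  # {0: 0, 1: 3}
```

With the variables from the question, the call would be `cluster_medoids(dist, labels, db.core_sample_indices_)`. Ties, such as the two-point cluster above, are broken in favour of the lower index.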