I need to cluster data using Fuzzy C-Means, so I use fcm from pyclustering.cluster.fcm. I would like to know if there is a way to get the labels.
    import random

    import numpy as np
    import pandas as pd
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.fcm import fcm

    coords = [(random.random() * 2.0, random.random() * 2.0) for _ in range(100)]
    dfcluster = pd.DataFrame(coords, columns=['x', 'y'])
    sample = dfcluster.to_numpy()

    # initialize
    initial_centers = kmeans_plusplus_initializer(
        sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE
    ).initialize()

    # create instance of Fuzzy C-Means algorithm
    fcm_instance = fcm(sample, initial_centers)

    # run cluster analysis and obtain results
    fcm_instance.process()
    clusters = fcm_instance.get_clusters()
    print(clusters)
Answer
I have tried it this way, and it works, but I do not think it is a perfect answer:
    import random

    import pandas as pd
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.fcm import fcm

    coords = [(random.random() * 2.0, random.random() * 2.0) for _ in range(100)]
    dfcluster = pd.DataFrame(coords, columns=['x', 'y'])
    sample = dfcluster.to_numpy()

    # initialize
    initial_centers = kmeans_plusplus_initializer(
        sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE
    ).initialize()

    # create instance of Fuzzy C-Means algorithm
    fcm_instance = fcm(sample, initial_centers)

    # run cluster analysis and obtain results
    fcm_instance.process()
    clusters = fcm_instance.get_clusters()

    # collect one row per point, tagged with the number of its cluster
    # (DataFrame.append was removed in pandas 2.0, so build the rows in a list)
    rows = []
    for label, members in enumerate(clusters):
        for j in members:
            rows.append({'cluster': label,
                         'x': dfcluster['x'].iloc[j],
                         'y': dfcluster['y'].iloc[j]})
    dfcluster = pd.DataFrame(rows, columns=['cluster', 'x', 'y'])
    print(dfcluster)
However, I think there is another way to do it, and it is faster. The result of get_clusters() holds, for every cluster, the indices of the points that belong to it, so I used dfcluster.loc[dfcluster.index[results[i]], 'cluster'] = i:
    import random

    import pandas as pd
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    from pyclustering.cluster.fcm import fcm

    coords = [(random.random() * 2.0, random.random() * 2.0) for _ in range(100)]
    dfcluster = pd.DataFrame(coords, columns=['x', 'y'])
    # cluster on the coordinates only, so the label column is never used as a feature
    sample = dfcluster[['x', 'y']].to_numpy()

    # initialize
    initial_centers = kmeans_plusplus_initializer(
        sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE
    ).initialize()

    # create instance of Fuzzy C-Means algorithm
    fcm_instance = fcm(sample, initial_centers)

    # run cluster analysis and obtain results
    fcm_instance.process()
    results = fcm_instance.get_clusters()

    # results[i] lists the row positions in cluster i; write i into those rows
    dfcluster['cluster'] = 0
    for i in range(len(results)):
        dfcluster.loc[dfcluster.index[results[i]], 'cluster'] = i
    print(dfcluster)
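If all you need is one label per row, the per-cluster index lists can also be inverted directly, without touching the DataFrame inside the loop. This is only a minimal sketch, assuming the result has the shape that get_clusters() prints in the question (a list of lists of row indices). The name clusters_to_labels and the small example data are made up for illustration and are not part of pyclustering.

```python
def clusters_to_labels(clusters, n_points):
    """Turn a list of per-cluster index lists into one label per point.

    Points that appear in no cluster keep the label -1.
    """
    labels = [-1] * n_points
    for label, members in enumerate(clusters):
        for point_index in members:
            labels[point_index] = label
    return labels


# hypothetical result for 5 points in 3 clusters
example_clusters = [[0, 3], [1, 4], [2]]
example_labels = clusters_to_labels(example_clusters, 5)
print(example_labels)
```

With the DataFrame from the answer, this would be used as dfcluster['cluster'] = clusters_to_labels(results, len(dfcluster)). Note that these are hard labels. As far as I know, fcm_instance.get_membership() returns the per-point membership degrees if you need the fuzzy assignment, but please check the pyclustering documentation for that.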