Skip to content
Advertisement

How to get the cluster label when aplying Fuzzy C-Means clustering pyclustering?

I need to cluster data using the Fuzzy C-Means. So, I use fcm from pyclustering.cluster.fcm. So, I would like to know if there is a way to get the labels.

import numpy as np
import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random

coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
sample = dfcluster.to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
clusters = fcm_instance.get_clusters()

print(clusters)

Advertisement

Answer

I have tried it this way, and it works, but I do not think that it is a perfect answer

import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random

coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
sample = dfcluster.to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
clusters = fcm_instance.get_clusters()

cluster=0
dfclusternew = pd.DataFrame(columns = ['cluster','x', 'y'])
for index, i in enumerate(clusters):
    for j in i:
        dfclusternew = dfclusternew.append(
            pd.Series([cluster, dfcluster['x'].iloc[j], dfcluster['y'].iloc[j]], index=['cluster', 'x', 'y']),
            ignore_index=True)
    cluster += 1
dfcluster =dfclusternew
print(dfcluster)

However, I think I have another way to do that, and it is faster. As the result is the index in every cluster. So, I used loc[df.index[results[i]]

import pandas as pd
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.fcm import fcm
import random

coords = [(random.random()*2.0, random.random()*2.0) for _ in range(100)]
dfcluster = pd.DataFrame(coords, columns = ['x','y'])
dfcluster['cluster'] = 0
sample = dfcluster.to_numpy()
# initialize
initial_centers = kmeans_plusplus_initializer(sample, 5, kmeans_plusplus_initializer.FARTHEST_CENTER_CANDIDATE).initialize()
# create instance of Fuzzy C-Means algorithm
fcm_instance = fcm(sample, initial_centers)
# run cluster analysis and obtain results
fcm_instance.process()
dfcluster.reset_index()
results=fcm_instance.get_clusters()
for i in range(len(results)):
    dfcluster.loc[dfcluster.index[results[i]], 'cluster'] = i
print(dfcluster)
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement