Let's say I have the given dataframe, and I would like to find clusters in these rows. To do so, I want to use KMeans. However, I would like to find clusters by giving more importance to [feature_1, feature_2] than to the other features in the dataframe. Let's say an importance coefficient of 0.5 for [feature_1, feature_2], and 0.5
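One common way to approach this kind of weighting, sketched here under assumptions: KMeans minimizes squared Euclidean distance, so multiplying a standardized column by the square root of a weight scales that column's contribution to the distance by the weight itself. The data, the two extra column names, and the per-column weights below are hypothetical placeholders, not values from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the asker's dataframe.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(60, 4)),
    columns=["feature_1", "feature_2", "feature_3", "feature_4"],
)

# Hypothetical per-column weights (assumption: larger means more important).
weights = pd.Series(
    {"feature_1": 0.4, "feature_2": 0.4, "feature_3": 0.1, "feature_4": 0.1}
)

# Standardize first so the weights are not confounded with the original units.
scaled = StandardScaler().fit_transform(df)

# Multiply each column by sqrt(weight): its squared-distance term is then
# scaled by the weight itself.
weighted = scaled * np.sqrt(weights[df.columns].to_numpy())

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(weighted)
```

Whether the weights should be split per column or per group of columns depends on what "importance" is meant to express, so treat the numbers above as a starting point only.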
Tag: k-means
How to print KMeans initial parameters?
I am using PyCharm to run KMeans on the Iris data. When I run this, it simply prints KMeans(). But I would like it to print the following: How can this be accomplished? Answer: Simply call kmeans.get_params(). This returns the parameters (default or custom) used when instantiating the estimator, in dictionary format; wrap the call in print() to display it when running a script. Please refer to this link for more information.
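A minimal sketch of the answer above. The estimator settings are illustrative. The second part relies on scikit-learn's `print_changed_only` display option, which is an addition not mentioned in the original answer and assumes a release that provides `config_context` with that option.

```python
from sklearn import config_context
from sklearn.cluster import KMeans

# Two custom settings; every other parameter stays at its default.
kmeans = KMeans(n_clusters=3, random_state=0)

# get_params() returns a plain dict of constructor parameters.
params = kmeans.get_params()
for name, value in sorted(params.items()):
    print(f"{name} = {value!r}")

# Alternative: ask scikit-learn to show every parameter in the repr,
# not only the ones that differ from the defaults.
with config_context(print_changed_only=False):
    full_repr = repr(kmeans)
print(full_repr)
```

The dictionary form is handy for logging or for re-creating the estimator with `KMeans(**params)`.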
KMeans clustering from all possible combinations of 2 columns not producing correct output
I have a 4-column dataframe which I extracted from the iris dataset. I use KMeans to plot 3 clusters for every possible combination of 2 columns. However, there seems to be something wrong with the output, especially since the cluster centers are not placed at the center of the clusters. I have provided examples of the output. Only cluster_1
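The excerpt does not show the code, so the cause of the misplaced centers is unknown. As a sketch under that caveat, one way to keep centers and points consistent is to fit a fresh model on each pair of columns and read the centers from that same fit. This assumes a scikit-learn release where `load_iris(as_frame=True)` is available.

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# The four iris measurements as a DataFrame.
df = load_iris(as_frame=True).data

centers_by_pair = {}
for col_x, col_y in combinations(df.columns, 2):
    # Fit on exactly the two columns that would be plotted, so the
    # centers live in the same 2-D space as the scatter points.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df[[col_x, col_y]])
    # Column 0 of cluster_centers_ is col_x, column 1 is col_y.
    centers_by_pair[(col_x, col_y)] = km.cluster_centers_
```

When plotting, `centers[:, 0]` goes on the x-axis and `centers[:, 1]` on the y-axis for the matching pair.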
Python: Convert a pandas Series into an array and keep the index
I’m running a k-means algorithm (k=5) to cluster my data. To check the stability of my algorithm, I first run the algorithm once on my whole dataset, and afterwards I run the algorithm multiple times on 2/3 of my dataset (using different random states for the splits). I use the results to predict the cluster of the remaining 1/3
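The procedure described above can be sketched roughly as follows. The data are random placeholders, and comparing labelings with the adjusted Rand index is an assumption on my part, chosen because cluster numbers are arbitrary between runs; the question does not say which comparison is used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the asker's dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))

# Reference run on the whole dataset.
full_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

row_ids = np.arange(len(X))
scores = []
for seed in range(3):
    # Split row positions so held-out predictions can be matched back to rows.
    train_ids, test_ids = train_test_split(row_ids, test_size=1 / 3, random_state=seed)
    km = KMeans(n_clusters=5, n_init=10, random_state=seed).fit(X[train_ids])
    held_out = km.predict(X[test_ids])
    # The adjusted Rand index ignores how the clusters happen to be numbered.
    scores.append(adjusted_rand_score(full_labels[test_ids], held_out))
```

Splitting the row positions rather than the data itself is what keeps track of which original row each prediction belongs to.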
Clustering images using unsupervised Machine Learning
I have a database of images that contains identity cards, bills and passports. I want to classify these images into different groups (i.e., identity cards, bills and passports). From what I have read, one of the ways to do this task is clustering (since it is going to be unsupervised). The idea for me is like this: the clustering will
Kmean clustering top terms in cluster
I am using the Python KMeans clustering algorithm to cluster documents. I have created a term-document matrix. Then I applied KMeans clustering using the following code. My next task is to see the top terms in every cluster. Searching on Google suggested that many people have used km.cluster_centers_.argsort()[:, ::-1] to find the top terms in the clusters using the
How to use Scikit kmeans when I have a dataframe
I have converted my dataset to a dataframe. I was wondering how to use it with scikit-learn's KMeans, or whether any other k-means package is available. Answer: sklearn is fully compatible with pandas DataFrames. Therefore, it’s as simple as: That 0.6 means you use 60% of your data for training and 40% for testing. (The first link below points to sklearn.cross_validation; in current scikit-learn releases train_test_split lives in sklearn.model_selection.) More info here:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
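The answer's code did not survive in this excerpt. The following is a sketch of what such a workflow might look like, not the original snippet. The DataFrame contents, column names, and cluster count are hypothetical, and `train_test_split` is imported from `sklearn.model_selection`, which assumes a current scikit-learn release.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Hypothetical numeric DataFrame standing in for the asker's data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# train_size=0.6 keeps 60% of the rows for fitting and 40% for testing.
train, test = train_test_split(df, train_size=0.6, random_state=0)

# The DataFrame is passed to KMeans directly; no conversion is needed.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)

# predict() assigns each held-out row to its nearest learned centroid;
# wrapping the result in a Series keeps the original row index.
test_labels = pd.Series(kmeans.predict(test), index=test.index, name="cluster")
```

Non-numeric columns would need to be dropped or encoded first, since KMeans works on numeric features only.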