How to use scikit-learn KMeans when I have a DataFrame

Question

I have converted my dataset to a pandas DataFrame. How can I use it with scikit-learn's KMeans, or is there another k-means package available?

Accepted Answer

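sklearn works directly with pandas DataFrames: estimators such as KMeans convert the DataFrame to a NumPy array internally, so you can pass it straight to fit and predict as long as every column is numeric. Therefore, it's as simple as:

    from sklearn.cluster import KMeans
    from sklearn.model_selection import train_test_split

    sample_df_train, sample_df_test = train_test_split(sample_df, train_size=0.6)

    cluster = KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300,
                     tol=0.0001, verbose=0, random_state=None, copy_x=True)
    cluster.fit(sample_df_train)
    result = cluster.predict(sample_df_test)

That 0.6 means you use 60% of your data for training and the remaining 40% for testing.

Note that train_test_split lives in sklearn.model_selection in scikit-learn 0.20 and later (the older sklearn.cross_validation module was removed), and since scikit-learn 1.0 KMeans no longer accepts the precompute_distances and n_jobs arguments.

More info here:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html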
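For a version you can run end to end, here is a minimal sketch of the same split/fit/predict workflow on made-up data. It assumes scikit-learn 0.20 or later (for sklearn.model_selection); the column names x, y and name, the two synthetic blobs, and the choice of two clusters are purely illustrative. It also shows two points the answer leaves implicit: KMeans only accepts numeric columns, and its labels and centres can be put back into pandas objects.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Hypothetical data: two well-separated blobs with illustrative column
# names "x" and "y", plus a non-numeric "name" column.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
sample_df = pd.DataFrame(np.vstack([blob_a, blob_b]), columns=["x", "y"])
sample_df["name"] = [f"row{i}" for i in range(len(sample_df))]

# KMeans needs numeric input, so keep only the numeric columns.
features = sample_df.select_dtypes(include="number")

# train_size=0.6 puts 60% of the rows in the training set, 40% in the test set.
train_df, test_df = train_test_split(features, train_size=0.6, random_state=0)

# Two clusters because the synthetic data has two blobs (an assumption
# specific to this example, not a general recommendation).
cluster = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster.fit(train_df)

# predict returns one integer cluster label per test row.
test_labels = cluster.predict(test_df)

# train_test_split keeps the original index, so the labels stay aligned
# with the rows they belong to.
labelled_test = test_df.assign(cluster=test_labels)

# Cluster centres as a DataFrame with the same column names as the input.
centers = pd.DataFrame(cluster.cluster_centers_, columns=features.columns)

print(len(train_df), len(test_df))
print(centers)
```

Because the test rows keep their original index, you can attach the labels back to the full DataFrame with something like sample_df.join(labelled_test["cluster"]); rows that were in the training set get NaN in the new column.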


Answer