I have converted my dataset to a DataFrame. I was wondering how to use it with scikit-learn's KMeans, or whether any other k-means package is available. Answer sklearn is fully compatible with pandas DataFrames. Therefore, it’s as simple as: That 0.6 means you use 60% of your data for training and 40% for testing. More info here: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
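The code the answer refers to did not survive in this excerpt. The following is only a minimal sketch of what such a call might look like, not the answerer's original code. The DataFrame contents, the column names, `n_clusters=2`, and `random_state=0` are all assumptions made for illustration. It imports `train_test_split` from `sklearn.model_selection`, which is where current scikit-learn releases expose it; the `sklearn.cross_validation` module in the link above belongs to older releases.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Hypothetical numeric DataFrame standing in for the asker's dataset.
df = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.3, 7.9, 1.1, 8.1, 0.9, 8.2],
    "y": [2.0, 1.9, 2.1, 9.0, 9.2, 8.8, 2.2, 9.1, 1.8, 8.9],
})

# train_size=0.6 keeps 60% of the rows for fitting and 40% for testing.
train, test = train_test_split(df, train_size=0.6, random_state=0)

# KMeans accepts the DataFrame directly; n_clusters=2 is an assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)

# Assign each held-out row to the nearest learned cluster centre.
test_labels = kmeans.predict(test)
```

Here `train` and `test` are still DataFrames, so their original row index is preserved and can be used to join the cluster labels back onto the source data.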
Tag: scikit-learn
How to get Top 3 or Top N predictions using sklearn’s SGDClassifier
In the above code, clf.predict() prints only the single best prediction for a sample from list X. I am interested in the top 3 predictions for a particular sample in list X. I know the function predict_proba/predict_log_proba returns a list of all probabilities for each feature in list Y, but it has to be sorted and then associated with the features in
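The sorting step described in the question can be sketched with NumPy alone. This is an illustration under assumptions, not the accepted answer: `top_n_labels` is a hypothetical helper, and the `proba` and `classes` arrays are made-up stand-ins for `clf.predict_proba(X)` and `clf.classes_`. Note that `SGDClassifier` only offers `predict_proba` for some loss settings (the logistic and modified Huber losses), so the classifier would need to be configured accordingly.

```python
import numpy as np

def top_n_labels(proba, classes, n=3):
    """Return the n most probable labels for each row, best first."""
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    # argsort is ascending, so take the last n columns and reverse them.
    top_idx = np.argsort(proba, axis=1)[:, -n:][:, ::-1]
    # Fancy indexing maps column positions back to class labels.
    return classes[top_idx]

# Made-up stand-ins for clf.predict_proba(X) and clf.classes_.
proba = np.array([[0.10, 0.50, 0.05, 0.35],
                  [0.60, 0.10, 0.25, 0.05]])
classes = np.array(["a", "b", "c", "d"])

top3 = top_n_labels(proba, classes, n=3)
```

Each row of `top3` lists the three most probable labels for the corresponding sample, in descending order of probability.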
Scikit-learn train_test_split with indices
How do I get the original indices of the data when using train_test_split()? What I have is the following But this does not give the indices of the original data. One workaround is to add the indices to data (e.g. data = [(i, d) for i, d in enumerate(data)]) and then pass them inside train_test_split and then expand again. Are
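One way to avoid the enumerate workaround is to pass an index array as an extra argument, since `train_test_split` splits every array it is given in the same way. This is a sketch with made-up data, not necessarily the answer the thread settled on; the array contents, `test_size=0.4`, and `random_state=0` are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data and labels for illustration.
data = np.array([[10, 11], [20, 21], [30, 31], [40, 41], [50, 51]])
labels = np.array([0, 1, 0, 1, 0])

# One index per row; it is shuffled and split alongside data and labels.
indices = np.arange(len(data))

(data_train, data_test,
 labels_train, labels_test,
 idx_train, idx_test) = train_test_split(
    data, labels, indices, test_size=0.4, random_state=0)
```

After the call, `idx_train` and `idx_test` hold the original row positions, so `data[idx_train]` reproduces `data_train`.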
sklearn: how to get coefficients of polynomial features
I know it is possible to obtain the polynomial features as numbers by using: polynomial_features.transform(X). According to the manual, for a degree of two the features are: [1, a, b, a^2, ab, b^2]. But how do I obtain a description of the features for higher orders? .get_params() does not show any list of features. Answer By the way, there
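The answer is cut off here, so the following is an independent sketch rather than the answerer's method. A fitted `PolynomialFeatures` object exposes a `powers_` attribute whose rows give the exponent of each input in each output feature; the `describe` helper and the names `a` and `b` are hypothetical, chosen to match the notation in the question.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two input features, called a and b as in the question.
X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=3).fit(X)

def describe(powers, names=("a", "b")):
    """Turn one row of exponents into a readable term such as a^2*b."""
    terms = [name if p == 1 else f"{name}^{p}"
             for name, p in zip(names, powers) if p > 0]
    # All-zero exponents correspond to the constant (bias) column.
    return "*".join(terms) or "1"

# powers_ has one row per output feature, one column per input feature.
descriptions = [describe(row) for row in poly.powers_]
```

Recent scikit-learn releases also provide `get_feature_names_out` on the fitted transformer, which returns similar descriptions directly.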
How does the class_weight parameter in scikit-learn work?
I am having a lot of trouble understanding how the class_weight parameter in scikit-learn’s Logistic Regression operates. The Situation I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in a ratio of about 19:1 with the majority of samples
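As a point of reference for the 19:1 situation described above, scikit-learn documents the `class_weight='balanced'` mode as `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula by hand with NumPy; the label array (950 negatives, 50 positives) is made up to match the stated ratio and is not the asker's data.

```python
import numpy as np

# Made-up labels with a 19:1 ratio of negatives (0) to positives (1).
y = np.array([0] * 950 + [1] * 50)

# Number of samples observed for each class.
counts = np.bincount(y)

# The documented 'balanced' heuristic: rarer classes get larger weights.
weights = len(y) / (len(counts) * counts)
```

With these counts the positive class receives a weight 19 times larger than the negative class, which offsets the imbalance in the loss.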
RandomForestClassifier import
I’ve installed the Anaconda Python distribution with scikit-learn. While importing RandomForestClassifier: from sklearn.ensemble import RandomForestClassifier I get the following error: File “C:\Anaconda\lib\site-packages\sklearn\tree\tree.py”, line 36, in <module> from . import _tree ImportError: cannot import name _tree What could the problem be? Answer The problem was that I had the 64bit version of Anaconda and the 32bit sklearn.
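Since the answer attributes the ImportError to a 64-bit interpreter paired with a 32-bit build of the package, a quick check of the running interpreter can help confirm such a mismatch. This is a sketch using only the standard library; `interpreter_bits` is a hypothetical helper name.

```python
import platform
import struct
import sys

def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8

bits = interpreter_bits()
print(f"Python {platform.python_version()} is a {bits}-bit build")
print(f"Executable: {sys.executable}")
```

Compiled extension modules such as `_tree` have to match this bitness, so a package built for the other architecture fails at import time.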
How to check which version of nltk and scikit-learn is installed?
In a shell script I am checking whether these packages are installed and, if they are not, installing them. So within the shell script: but it stops the shell script at the import line. In a Linux terminal I tried to check in this manner: which gives nothing, though it is installed. Is there any other way to verify this package installation in a shell script,
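One option that avoids importing the packages at all is to query the installed distribution metadata. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library; `installed_version` is a hypothetical helper, and the distribution names `nltk` and `scikit-learn` are the names used on PyPI.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Report each package without importing it.
for name in ("nltk", "scikit-learn"):
    print(name, installed_version(name) or "not installed")
```

A shell script can run this file and parse its output, or the helper can be adapted to exit with a non-zero status when a package is missing.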
Ordered Logit in Python?
I’m interested in running an ordered logit regression in Python (using pandas, numpy, sklearn, or something in that ecosystem). But I cannot find any way to do this. Is my google-skill lacking? Or is this not something that’s been implemented in a standard package? Answer Update: Logit and Probit Ordinal regression models are now built in to statsmodels. https://www.statsmodels.org/devel/examples/notebooks/generated/ordinal_regression.html Examples are
GridSearchCV no reporting on high verbosity
Okay, I’m just going to say starting out that I’m entirely new to SciKit-Learn and data science. But here is the issue and my current research on the problem. Code at the bottom. Summary I’m trying to do type recognition (like digits, for example) with a BernoulliRBM and I’m trying to find the correct parameters with GridSearchCV. However I don’t
Custom transformer for sklearn Pipeline that alters both X and y
I want to create my own transformer for use with the sklearn Pipeline. I am creating a class that implements both fit and transform methods. The purpose of the transformer will be to remove rows from the matrix that have more than a specified number of NaNs. The issue I am facing is how can I change both the X
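The question is cut off, so the following is not the thread's answer, only one possible workaround. In a standard sklearn Pipeline a step's `transform` returns only X, so rows cannot be dropped from y inside the pipeline. A simple alternative is to filter X and y together before fitting. `drop_sparse_rows` and its `max_nans` parameter are hypothetical names, and the arrays are made up.

```python
import numpy as np

def drop_sparse_rows(X, y, max_nans):
    """Keep only rows of X (and matching y) with at most max_nans NaNs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Count NaNs per row and build a boolean mask of rows to keep.
    keep = np.isnan(X).sum(axis=1) <= max_nans
    return X[keep], y[keep]

# Made-up data: the first row has two NaNs and is dropped for max_nans=1.
X = np.array([[1.0, np.nan, np.nan],
              [1.0, 2.0, 3.0],
              [np.nan, 2.0, 3.0]])
y = np.array([0, 1, 2])

X_clean, y_clean = drop_sparse_rows(X, y, max_nans=1)
```

The cleaned `X_clean` and `y_clean` can then be passed to `Pipeline.fit` as usual, keeping rows and targets aligned.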