Tag: scikit-learn

How to use Scikit kmeans when I have a dataframe

I have converted my dataset to a DataFrame. I was wondering how to use it with scikit-learn’s KMeans, or whether any other k-means package is available. Answer: sklearn is fully compatible with pandas DataFrames, so a DataFrame can be passed to it directly. The 0.6 means that you use 60% of your data for training and 40% for testing. More info here: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
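The excerpt's code did not survive, so here is a minimal sketch of what passing a DataFrame to KMeans with a 60/40 split might look like. The column names and toy values are made up for illustration, and `train_test_split` is imported from `sklearn.model_selection`, which is where it lives in current releases (the `sklearn.cross_validation` module in the link above belongs to older versions).

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two well-separated groups of points.
df = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 1.1, 0.9, 8.0, 8.2, 7.9, 8.1, 7.8],
    "y": [1.1, 0.9, 1.0, 1.2, 0.8, 8.1, 7.9, 8.0, 8.2, 7.7],
})

# train_size=0.6 keeps 60% of the rows for fitting and 40% for testing.
train, test = train_test_split(df, train_size=0.6, random_state=0)

# The DataFrame is passed directly; no conversion to a NumPy array is needed.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)

# Assign each held-out row to the nearest learned cluster centre.
test_labels = km.predict(test)
print(test.assign(cluster=test_labels))
```

Splitting is optional for clustering; it is shown here only because the answer mentions the 0.6 split.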

How to get Top 3 or Top N predictions using sklearn’s SGDClassifier

multilabel-classification python scikit-learn

In the above code, clf.predict() prints only the single best prediction for a sample from list X. I am interested in the top 3 predictions for a particular sample in list X. I know the functions predict_proba/predict_log_proba return a list of all probabilities for each feature in list Y, but it has to be sorted and then associated with the features in
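The question's code is not shown, so the following is only a sketch of the sort-and-associate step it describes. It assumes a classifier fitted with `loss="modified_huber"` (one of the SGDClassifier losses that supports `predict_proba`) and uses the bundled digits dataset as stand-in data.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Stand-in data; the question's own X and Y are not available.
X, y = load_digits(return_X_y=True)

# modified_huber is chosen here because it enables predict_proba.
clf = SGDClassifier(loss="modified_huber", max_iter=1000, tol=1e-3,
                    random_state=0).fit(X, y)

top_n = 3
proba = clf.predict_proba(X[:5])  # shape: (n_samples, n_classes)

# Sort column indices by probability, highest first, and keep the first top_n.
order = np.argsort(proba, axis=1)[:, ::-1][:, :top_n]

# Map column indices back to class labels and their probabilities.
top_labels = clf.classes_[order]
top_probs = np.take_along_axis(proba, order, axis=1)

for labels, probs in zip(top_labels, top_probs):
    print(list(zip(labels.tolist(), probs.round(3).tolist())))
```

The columns of `predict_proba` follow the order of `clf.classes_`, which is why indexing `classes_` with the sorted column indices recovers the labels.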

Scikit-learn train_test_split with indices

classification python scikit-learn scipy

How do I get the original indices of the data when using train_test_split()? What I have does not give the indices of the original data. One workaround is to add the indices to data (e.g. data = [(i, d) for i, d in enumerate(data)]) and then pass them inside train_test_split and then expand again. Are
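One possible approach, sketched here with made-up arrays: `train_test_split` accepts any number of arrays of equal length and splits them all consistently, so an index array can be passed alongside the data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and binary labels.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2

# Positions of the rows in the original data.
indices = np.arange(len(X))

# Every array passed in is split with the same shuffling.
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, indices, test_size=0.4, random_state=0
)

print("train rows:", idx_train)
print("test rows:", idx_test)
```

If the data is a pandas DataFrame, the split frames keep their original index, so `X_train.index` would serve the same purpose without the extra array.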

sklearn: how to get coefficients of polynomial features

python scikit-learn

I know it is possible to obtain the polynomial features as numbers by using: polynomial_features.transform(X). According to the manual, for a degree of two the features are: [1, a, b, a^2, ab, b^2]. But how do I obtain a description of the features for higher orders? .get_params() does not show any list of features. Answer: By the way, there
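The answer is cut off, so this is not necessarily what it proposes. As a sketch, the fitted transformer's `powers_` attribute gives the exponent of each input in each output feature, and readable names can be built from it. The helper `describe_features` and the input names `a` and `b` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def describe_features(powers, input_names):
    """Turn an exponent matrix into names such as 'a^2*b' (hypothetical helper)."""
    names = []
    for row in powers:
        terms = []
        for name, exponent in zip(input_names, row):
            if exponent == 1:
                terms.append(name)
            elif exponent > 1:
                terms.append(f"{name}^{exponent}")
        # A row of all zeros is the constant (bias) column.
        names.append("*".join(terms) if terms else "1")
    return names


X = np.array([[2.0, 3.0]])  # one sample with inputs a and b

poly2 = PolynomialFeatures(degree=2).fit(X)
names2 = describe_features(poly2.powers_, ["a", "b"])
print(names2)

poly3 = PolynomialFeatures(degree=3).fit(X)
names3 = describe_features(poly3.powers_, ["a", "b"])
print(names3)
```

Recent scikit-learn releases also provide `get_feature_names_out()` on the fitted transformer, which returns ready-made names.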

How does the class_weight parameter in scikit-learn work?

python scikit-learn

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn’s Logistic Regression operates. The Situation: I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in a ratio of about 19:1 with the majority of samples
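To make the 19:1 situation concrete, here is a small sketch of the weights that `class_weight="balanced"` produces, which scikit-learn documents as `n_samples / (n_classes * np.bincount(y))`. The sample counts (950 negatives, 50 positives) are invented to match the ratio in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels in a 19:1 ratio: 950 negatives, 50 positives.
y = np.array([0] * 950 + [1] * 50)

# Weights scikit-learn derives for class_weight="balanced".
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)

# The same numbers computed by hand from the documented formula.
manual = len(y) / (2 * np.bincount(y))

print(dict(zip([0, 1], weights)))

# Either form can be passed to the estimator.
clf_balanced = LogisticRegression(class_weight="balanced")
clf_explicit = LogisticRegression(class_weight={0: weights[0], 1: weights[1]})
```

With these counts each positive sample counts 19 times as much as a negative one in the loss, so both classes contribute equally overall.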

RandomForestClassifier import

python random-forest scikit-learn

I’ve installed the Anaconda Python distribution with scikit-learn. While importing RandomForestClassifier: from sklearn.ensemble import RandomForestClassifier I get the following error: File "C:\Anaconda\lib\site-packages\sklearn\tree\tree.py", line 36, in <module> from . import _tree ImportError: cannot import name _tree What could the problem be? Answer: The problem was that I had the 64-bit version of Anaconda and the 32-bit sklearn.
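For a mismatch like the one in the answer, the interpreter's own bitness can be checked from Python. This is a small sketch using only the standard library; it reports on the interpreter, not on the installed sklearn build.

```python
import platform
import struct

# Size of a C pointer in bytes, times 8, gives the interpreter's bitness.
bits = struct.calcsize("P") * 8

print(f"Python is running as a {bits}-bit process")
print("platform.architecture():", platform.architecture()[0])
```

A compiled package built for the other bitness typically cannot be imported by this interpreter.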

how to check which version of nltk, scikit learn installed?

linux nltk python scikit-learn shell

In a shell script I am checking whether these packages are installed, and installing them if they are not. Within the shell script I tried an import, but it stops the shell script at the import line. In the Linux terminal I also tried to check it in another manner, which gives nothing even though the package is installed. Is there any other way to verify this package installation in shell script,
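The commands from the question are missing, so this is only a sketch of one way to check for packages without importing them. It uses `importlib.metadata` from the standard library (Python 3.8+); the helper name `installed_version` is hypothetical. Note that the distribution is named `scikit-learn` even though it is imported as `sklearn`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("nltk", "scikit-learn"):
    version = installed_version(name)
    if version is None:
        print(f"{name}: not installed")
    else:
        print(f"{name}: {version}")
```

A shell script could run this file with the same Python interpreter it intends to use and act on the printed lines.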

Ordered Logit in Python?

machine-learning numpy pandas python scikit-learn

I’m interested in running an ordered logit regression in Python (using pandas, numpy, sklearn, or something in that ecosystem). But I cannot find any way to do this. Is my google-skill lacking? Or is this not something that’s been implemented in a standard package? Answer: Update: Logit and probit ordinal regression models are now built into statsmodels. https://www.statsmodels.org/devel/examples/notebooks/generated/ordinal_regression.html Examples are

GridSearchCV no reporting on high verbosity

machine-learning python scikit-learn

Okay, I’ll say up front that I’m entirely new to scikit-learn and data science. But here is the issue and my current research on the problem. Code at the bottom. Summary: I’m trying to do type recognition (like digits, for example) with a BernoulliRBM and I’m trying to find the correct parameters with GridSearchCV. However I don’t
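The question's code is not included in the excerpt. The sketch below shows the general kind of setup it describes: a BernoulliRBM feeding a classifier inside a Pipeline, tuned with GridSearchCV at a raised `verbose` level. The dataset, parameter grid and classifier are assumptions chosen to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# Small slice of the digits data, scaled to [0, 1] as the RBM expects.
X, y = load_digits(return_X_y=True)
X = X[:300] / 16.0
y = y[:300]

pipe = Pipeline([
    ("rbm", BernoulliRBM(n_iter=5, random_state=0)),
    ("clf", LogisticRegression(max_iter=500)),
])

# Step parameters are addressed as <step name>__<parameter>.
param_grid = {
    "rbm__n_components": [16, 32],
    "rbm__learning_rate": [0.05, 0.1],
}

# verbose asks for progress messages; n_jobs=1 keeps the work in this process.
search = GridSearchCV(pipe, param_grid, cv=2, verbose=2, n_jobs=1)
search.fit(X, y)

print(search.best_params_)
```

Where the progress messages appear can depend on the environment and on `n_jobs`, so that part is worth checking in the setup where the output goes missing.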

Custom transformer for sklearn Pipeline that alters both X and y

machine-learning numpy pandas python scikit-learn

I want to create my own transformer for use with the sklearn Pipeline. I am creating a class that implements both fit and transform methods. The purpose of the transformer will be to remove rows from the matrix that have more than a specified number of NaNs. The issue I am facing is how I can change both the X
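A transformer's `transform` in a standard sklearn Pipeline returns only X, so y cannot be filtered from inside a pipeline step. One workaround, sketched here and not necessarily what any answer proposes, is to drop the rows from X and y together before the pipeline runs. The function name `drop_sparse_rows` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_sparse_rows(X, y, max_nans):
    """Remove rows of X with more than max_nans NaNs, and the matching y entries."""
    X = pd.DataFrame(X)
    keep = (X.isna().sum(axis=1) <= max_nans).to_numpy()
    return X.loc[keep], np.asarray(y)[keep]


# Hypothetical data: rows contain 0, 1 and 3 missing values respectively.
X = pd.DataFrame({
    "f1": [1.0, np.nan, np.nan],
    "f2": [2.0, 5.0, np.nan],
    "f3": [3.0, 6.0, np.nan],
})
y = [10, 20, 30]

X_clean, y_clean = drop_sparse_rows(X, y, max_nans=1)
print(X_clean)
print(y_clean)
```

The cleaned pair can then be passed to `Pipeline.fit` as usual.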