Skip to content
Advertisement

Tag: scikit-learn

How to use Scikit kmeans when I have a dataframe

I have converted my dataset to dataframe. I was wondering how to use it in scikit kmeans or if any other kmeans package available. Answer sklearn is fully compatible with pandas DataFrames. Therefore, it’s as simple as: That 0.6 means you use 60% of your data for training, 40% for testing. More info here: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Scikit-learn train_test_split with indices

How do I get the original indices of the data when using train_test_split()? What I have is the following But this does not give the indices of the original data. One workaround is to add the indices to data (e.g. data = [(i, d) for i, d in enumerate(data)]) and then pass them inside train_test_split and then expand again. Are

sklearn: how to get coefficients of polynomial features

I know it is possible to obtain the polynomial features as numbers by using: polynomial_features.transform(X). According to the manual, for a degree of two the features are: [1, a, b, a^2, ab, b^2]. But how do I obtain a description of the features for higher orders ? .get_params() does not show any list of features. Answer By the way, there

How does the class_weight parameter in scikit-learn work?

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn’s Logistic Regression operates. The Situation I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in a ratio of about 19:1 with the majority of samples

RandomForestClassifier import

I’ve installed Anaconda Python distribution with scikit-learn. While importing RandomForestClassifier: from sklearn.ensemble import RandomForestClassifier I have the following error: File “C:Anacondalibsite-packagessklearntreetree.py”, line 36, in <module> from . import _tree ImportError: cannot import name _tree What the problem can be there? Answer The problem was that I had the 64bit version of Anaconda and the 32bit sklearn.

Ordered Logit in Python?

I’m interested in running an ordered logit regression in python (using pandas, numpy, sklearn, or something that ecosystem). But I cannot find any way to do this. Is my google-skill lacking? Or is this not something that’s been implemented in a standard package? Answer Update: Logit and Probit Ordinal regression models are now built in to statsmodels. https://www.statsmodels.org/devel/examples/notebooks/generated/ordinal_regression.html Examples are

GridSearchCV no reporting on high verbosity

Okay, I’m just going to say starting out that I’m entirely new to SciKit-Learn and data science. But here is the issue and my current research on the problem. Code at the bottom. Summary I’m trying to do type recognition (like digits, for example) with a BernoulliRBM and I’m trying to find the correct parameters with GridSearchCV. However I don’t

Advertisement