
Tag: scikit-learn

Difference between StandardScaler and MinMaxScaler

What is the difference between MinMaxScaler() and StandardScaler()? mms = MinMaxScaler(feature_range=(0, 1)) (used in one machine learning model) sc = StandardScaler() (in another machine learning model they used StandardScaler and not MinMaxScaler) Answer From the scikit-learn site: StandardScaler removes the mean and scales the data to unit variance. However, outliers have an influence when computing the empirical mean
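A minimal sketch contrasting the two on toy data (the array values are made up for illustration): MinMaxScaler squeezes every feature into the requested range, while StandardScaler centres to zero mean and unit variance, so a single outlier distorts each differently.

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])  # 100 acts as an outlier

mms = MinMaxScaler(feature_range=(0, 1))
print(mms.fit_transform(X).ravel())  # all values in [0, 1]; the outlier compresses the rest

sc = StandardScaler()
print(sc.fit_transform(X).ravel())   # zero mean, unit variance; the outlier pulls the mean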

Interpreting logistic regression feature coefficient values in sklearn

I have fit a logistic regression model to my data. Imagine I have four features: 1) which condition the participant received, 2) whether the participant had any prior knowledge/background about the phenomenon tested (binary response in a post-experimental questionnaire), 3) time spent on the experimental task, and 4) participant age. I am trying to predict whether participants ultimately chose option A
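A hedged sketch of the usual way to inspect such coefficients; the feature names mirror the question, but the data below is random and purely illustrative.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "condition": rng.integers(0, 2, 100),
    "prior_knowledge": rng.integers(0, 2, 100),
    "time_on_task": rng.normal(60, 10, 100),
    "age": rng.integers(18, 65, 100),
})
y = rng.integers(0, 2, 100)  # 1 = chose option A (random stand-in labels)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each coefficient is the change in log-odds of choosing A per unit increase
# in that feature, holding the others fixed; np.exp turns it into an odds ratio.
for name, coef in zip(X.columns, clf.coef_[0]):
    print(name, coef, np.exp(coef))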

GridSearchCV.best_score_ not the same as cross_val_score(GridSearchCV.best_estimator_)

Consider the following grid search: grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5) grid_fit = grid.fit(X_train1, y_train1) According to sklearn's documentation, grid_fit.best_score_ returns "the mean cross-validated score of the best_estimator_". To me that would mean that the average of cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5) should be exactly the same as grid_fit.best_score_. However, I am getting a 10% difference
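One common source of such a gap is that the two runs do not score on the same fold splits, or that the estimator itself is randomized. A minimal sketch, assuming a fixed splitter and seeds, where the two numbers should agree (iid=True is omitted here, as that argument was removed in later scikit-learn releases):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical splits for both runs

clf = RandomForestClassifier(random_state=0)  # fixed seed so refits are reproducible
grid = GridSearchCV(clf, {"n_estimators": [50, 100]}, n_jobs=-1, cv=cv)
grid.fit(X, y)

rerun = cross_val_score(grid.best_estimator_, X, y, cv=cv).mean()
print(grid.best_score_, rerun)  # these should now match up to float noise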

Scaling / Normalizing pandas column

I have a dataframe like: I'd like to create a new scaled column in the dataframe called SIZE, where SIZE is a number between 5 and 50. For example: I've tried, but got: "Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample." I've tried other things,
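A minimal sketch of the usual fix, assuming the raw values live in a column named VALUE (the question's actual column name is not shown): the scaler expects 2-D input, so select the column with a double-bracket DataFrame lookup rather than a 1-D Series.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"VALUE": [3, 7, 1, 9, 4]})  # illustrative data

scaler = MinMaxScaler(feature_range=(5, 50))
# Passing the 1-D df["VALUE"] is what triggers the "Reshape your data" error;
# df[["VALUE"]] keeps the 2-D shape the scaler wants.
df["SIZE"] = scaler.fit_transform(df[["VALUE"]]).ravel()
print(df)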

Python SKLearn: How to Get Feature Names After OneHotEncoder?

I would like to get the feature names of a data set after it has been transformed by SKLearn OneHotEncoder. In the active_features_ attribute of OneHotEncoder one can find a very good explanation of how the attributes n_values_, feature_indices_ and active_features_ get filled after transform() is executed. My question is: for e.g. DataFrame-based input data, what would the code look like
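A hedged sketch using the modern API: active_features_ belongs to long-removed scikit-learn versions, and newer releases expose get_feature_names_out() instead.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "S"]})

enc = OneHotEncoder(sparse_output=False)  # on older versions: sparse=False
encoded = enc.fit_transform(df)

# Column names are derived as <feature>_<category>.
print(enc.get_feature_names_out())  # e.g. ['color_blue' 'color_red' 'size_M' 'size_S']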

LightGBMError “Check failed: num_data > 0” with Sklearn RandomizedSearchCV

I'm trying LightGBMRegressor parameter tuning with Sklearn's RandomizedSearchCV and got the error below. error: I cannot tell why, or which specific parameters caused this error. Was any of the params_dist below unsuitable for train_x.shape: (1630, 1565)? Please share any hints or solutions. Thank you. LightGBM version: '2.0.12' function that caused this error: Too long to put full stack trace,
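For what it's worth, "Check failed: num_data > 0" typically fires when a sampled parameter combination leaves LightGBM with an empty data subset, e.g. a bagging/subsample fraction near zero or an oversized min_child_samples. A hedged sketch that keeps those ranges sane; the exact bounds and the synthetic data are assumptions, not the question's params_dist:

import numpy as np
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "num_leaves": [31, 63, 127],
    "min_child_samples": [5, 10, 20],       # keep well below the number of rows
    "subsample": np.linspace(0.5, 1.0, 6),  # avoid fractions near zero
    "colsample_bytree": np.linspace(0.5, 1.0, 6),
}

X, y = make_regression(n_samples=200, n_features=50, random_state=0)
search = RandomizedSearchCV(
    LGBMRegressor(subsample_freq=1),  # subsample only applies when freq >= 1
    param_dist, n_iter=20, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_)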

Pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV

I am working on a binary text classification problem. As the classes are highly imbalanced, I am using sampling techniques like RandomOverSampler(). For classification I would then use RandomForestClassifier(), whose parameters need to be tuned using GridSearchCV(). I am trying to create a pipeline to chain these in order but have failed so far; it throws "invalid parameters". Answer The parameters
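A hedged sketch of one working arrangement: scikit-learn's own Pipeline refuses samplers, so imblearn.pipeline.Pipeline is used instead, and grid keys are prefixed with the (assumed) step names so GridSearchCV can route them; bare names like n_estimators are exactly what produce the "invalid parameter" error.

from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("sampler", RandomOverSampler(random_state=0)),  # oversampling happens per CV fold
    ("clf", RandomForestClassifier(random_state=0)),
])

# Parameter names must be "<step name>__<parameter>".
grid = GridSearchCV(pipe, {"clf__n_estimators": [50, 100]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)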

KMeans clustering: top terms in a cluster

I am using the Python KMeans clustering algorithm to cluster documents. I have created a term-document matrix, then applied KMeans clustering using the following code. My next task is to see the top terms in every cluster; searching on Google suggested that many people have used km.cluster_centers_.argsort()[:, ::-1] for finding the top terms in the clusters using the
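A minimal sketch of that argsort idiom on a tiny made-up corpus (the vectorizer and cluster count are assumptions):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["cats purr softly", "dogs bark loudly", "cats and dogs play", "birds sing songs"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

terms = vec.get_feature_names_out()
# Each row of cluster_centers_ holds the centroid's weight per term; sorting the
# column indices in descending order ranks terms by importance within a cluster.
order = km.cluster_centers_.argsort()[:, ::-1]
for i in range(2):
    print("cluster", i, [terms[j] for j in order[i, :3]])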
