Tag: scikit-learn

Can't optimize my hyperparameters using grid search. Why does this not work with continuous input? Alternatives?

I have a matrix Z (3000 × 2000), where each row describes a sample. Each column describes a single feature, which is a nucleotide (A, G, T, C), and I have standardized the data so that each column contains only 0 and 1. The matrix then looks like this: And y looks like this: I tried this to build a support vector machine…

RFE from scikit-learn feature_selection with NegativeBinomial from statsmodels as estimator

python scikit-learn statsmodels

I’m trying to use RFE from scikit-learn with the NegativeBinomial estimator from statsmodels. So I created my own class: But I get this error: Does anyone have an idea? Answer You can modify your code to require endog and exog variables, instead of using the formula API:

Sum the predictions of a Linear Regression from Scikit-Learn

numpy python scikit-learn

I need to make a linear regression and sum all the predictions. Maybe this isn’t a question for Scikit-Learn but for NumPy because I get an array at the end and I am unable to turn it into a float. I am getting it right up to this point. The next part (which is a while loop to sum all
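For the question above, a minimal sketch of one way to collapse the prediction array into a single Python float without a while loop. The toy data is hypothetical and not taken from the question; `ndarray.sum()` and `float()` are standard NumPy and Python calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows y = 2x + 1 exactly.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)
predictions = model.predict(X)  # 1-D NumPy array, one prediction per row of X

# ndarray.sum() adds every element; float() turns the NumPy scalar into a plain float.
total = float(predictions.sum())
print(total)
```

If `predict` returns a 2-D array (for example with a 2-D target), `predictions.sum()` still adds every element, so the same line applies.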

How do I make sure GridSearchCV first does the cross split and then the imputing?

data-preprocessing gridsearchcv machine-learning python scikit-learn

I have a GridSearchCV with a pipeline that looks something like this: My GridSearchCV looks like this: with 5-fold cross-validation. So how do I ensure that the data is split first and only then imputed with the most frequent value? Answer GridSearchCV will run roughly like this: You can be sure that SimpleImputer and Standa…
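A minimal sketch of the setup the question describes, assuming the imputer and scaler are steps inside the pipeline passed to GridSearchCV. The step names, the synthetic data, the classifier, and the parameter grid are all hypothetical. Because GridSearchCV clones the whole pipeline and fits it on the training part of each split, the imputer only ever learns its fill values from training folds.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 4 features, roughly 10% of entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

# Imputation and scaling are pipeline steps, so they are re-fitted per CV split.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# cv=5: for each candidate, the pipeline is fitted on 4 folds and scored on the 5th.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Imputing the full dataset before calling `fit` would instead let values from the validation folds influence the fill values, which is the leakage the question is trying to avoid.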

How to create a one-hot-encoding for the intermediate class?

encoding pandas python scikit-learn

Let’s say I have 3 classes: 0, 1, 2. One-hot encoding an array of labels can be done via pandas as follows: What I’m interested in is how to get an encoding that can handle an intermediate class, e.g. a class that lies between two classes. For example: for class 0.4, the resulting encoding should be […

How can I prepare my image dataset for a federated model?

python scikit-learn tensorflow-federated

How could I transform my dataset (composed of images) into a federated dataset? I am trying to create something similar to emnist but for my own dataset. tff.simulation.datasets.emnist.load_data( only_digits=True, cache_dir=None ) Answer You will need to create the clientData object first, for example: where cre…

Confused about why my KNN code is throwing a ValueError

knn machine-learning pandas python scikit-learn

I am using sklearn for a KNN regressor: I get this error message: Could someone please explain this? My data is in the hundred thousands for the target and the thousands for the input. And there are no blanks in the data. Answer Before answering the question, let me refactor the code. You are using a dataframe so you ca…

Scaler fitted in a pipeline turns out to be not fitted yet

python scikit-learn

Please consider this code: I get this message: Why is the scaler not fitted? Answer When you pass a pipeline or an estimator to RFE, RFE clones it and fits the clone until it finds the best fit with the reduced number of features. To access this fitted estimator you can use fit_pipeline = rfe.estimat…
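A minimal sketch of the cloning behaviour described in the answer. The pipeline, data, and feature count are hypothetical, and reading the fitted clone from `rfe.estimator_` is an assumption about what the truncated answer goes on to suggest. `estimator_` is the attribute scikit-learn documents for RFE's fitted estimator, and `importance_getter` (available since scikit-learn 0.24) tells RFE where to find coefficients inside a pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

# Hypothetical data: 200 samples, 10 features, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# RFE fits clones of `pipe`; the object bound to `pipe` is never fitted itself.
rfe = RFE(pipe, n_features_to_select=4,
          importance_getter="named_steps.clf.coef_")
rfe.fit(X, y)

# The scaler inside the original pipeline is still unfitted.
try:
    check_is_fitted(pipe.named_steps["scaler"])
    original_fitted = True
except NotFittedError:
    original_fitted = False
print(original_fitted)

# The fitted clone, trained on the selected features only, lives on the RFE object.
fitted_pipe = rfe.estimator_
check_is_fitted(fitted_pipe.named_steps["scaler"])  # does not raise
```

Note that `fitted_pipe` expects input that has already been reduced to the selected columns, for example via `rfe.transform(X)`.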

How to properly use SMOTE in Classification models

data-science jupyter-notebook machine-learning python scikit-learn

I am using SMOTE to balance the output (y) only for model training, but I want to test the model on the original data, since it does not seem logical to test the model on SMOTE-generated outputs. Please ask for clarification if I didn’t explain it well. This is my first post on Stack Overflow. Here I app…

Random search grid not displaying scoring metric

grid-search python scikit-learn xgboost

I want to do a grid search over a few hyperparameters of an XGBClassifier for a binary classification problem, but whenever I run it the score value (roc_auc) is not displayed. I read in another question that this can be related to some error in model training, but I am not sure which one applies in this case. My model