I have a matrix Z (3000 × 2000), where each row describes a sample. Each column describes a single feature, which is a nucleotide (A, G, T, C), and I have standardized the data so that each column contains only 0 and 1. The matrix then looks like this: And y looks like this: I tried this to fit a support vector machine. And now I
Tag: scikit-learn
RFE from scikit-learn feature_selection with NegativeBinomial from statsmodels as estimator
I’m trying to use RFE from scikit-learn with the NegativeBinomial estimator from statsmodels. So I created my own class: But I get this error: Does anyone have an idea? Answer: You can modify your code to require endog and exog variables, instead of using the formula API:
Sum the predictions of a Linear Regression from Scikit-Learn
I need to make a linear regression and sum all the predictions. Maybe this isn’t a question for Scikit-Learn but for NumPy because I get an array at the end and I am unable to turn it into a float. I am getting it right up to this point. The next part (which is a while loop to sum all
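A minimal sketch of one way to get a plain float out of the prediction array, assuming the goal is simply the total of all predicted values. The toy data and the variable names (`X`, `y`, `model`, `total`) are illustrative and not taken from the question; no while loop is needed because the NumPy array returned by `predict` has its own `sum` method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative toy data (an assumption): y = 2x + 1 for x = 0..9.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)

# predict() returns a 1-D NumPy array with one value per row of X.
predictions = model.predict(X)

# ndarray.sum() reduces the array to a NumPy scalar;
# float() converts that scalar to a built-in Python float.
total = float(predictions.sum())
print(total)
```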
How do I make sure GridSearchCV first does the cross split and then the imputing?
I have a GridSearchCV with a pipeline that looks something like this: My GridSearchCV looks like this: with 5-fold cross-validation. So, how do I ensure that the data is split first, and only then imputed with the most frequent value? Answer: GridSearchCV will run roughly like this: You can be sure that SimpleImputer and StandardScaler will do .fit() and .transform()
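A hedged sketch of the setup described in this excerpt. Putting the imputer and scaler inside the `Pipeline` that is passed to `GridSearchCV` means they are re-fitted on the training part of each split, because the whole pipeline is cloned and fitted once per fold and parameter combination. The classifier (`LogisticRegression`), the parameter grid and the synthetic data are assumptions; the original pipeline is not shown in the excerpt.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 10% missing values (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

# Preprocessing lives inside the pipeline, so it is part of what gets
# cloned and fitted on each training fold.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),  # assumed final estimator
])

# cv=5: for every candidate C, the pipeline is fitted on 4 folds and
# scored on the held-out fold, which the imputer never saw during fit.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

print(search.best_params_)
```

Imputing the full dataset before calling `GridSearchCV` would instead let statistics from the validation folds leak into training, which is what keeping the imputer inside the pipeline avoids.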
How to create a one-hot-encoding for the intermediate class?
Let’s say I have 3 classes: 0, 1, 2. One-hot encoding an array of labels can be done via pandas as follows: What I’m interested in is how to get an encoding that can handle an intermediate class, e.g. a class that lies between two classes. For example: for class 0.4, resulting encoding should be [0.4, 0.6, 0] for class 1.8,
How can I prepare my image dataset for a federated model?
How could I transform my dataset (composed of images) into a federated dataset? I am trying to create something similar to emnist but for my own dataset. tff.simulation.datasets.emnist.load_data(only_digits=True, cache_dir=None) Answer: You will need to create the ClientData object first, for example: where create_dataset is a serializable function. But first you have to prepare your images; read this tutorial
Confused about why my KNN code is throwing a ValueError
I am using sklearn for a KNN regressor: I get this error message: Could someone please explain this? My data is in the hundred thousands for the target and in the thousands for the input. And there are no blanks in the data. Answer: Before answering the question, let me refactor the code. You are using a dataframe so you can index single or
Scaler fitted in a pipeline turns out to be not fitted yet
Please consider this code: I get this message: Why is the scaler not fitted? Answer: When passing a pipeline or an estimator to RFE, it essentially gets cloned by the RFE and fitted until it finds the best fit with the reduced number of features. To access this fitted estimator you can use fit_pipeline = rfe.estimator_ But note, this new
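A small sketch of the behaviour described in the answer, under assumed names and data (the original code is not shown in the excerpt). RFE fits a clone of the pipeline, so the scaler inside the original `pipe` object stays unfitted, while the fitted copy is reachable through `rfe.estimator_`. The `importance_getter` string is needed here because a `Pipeline` does not expose `coef_` directly.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

# Illustrative data and pipeline (assumptions, not the asker's code).
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=4, random_state=0)
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression())])

# Tell RFE where to find the coefficients inside the pipeline.
rfe = RFE(pipe, n_features_to_select=3,
          importance_getter="named_steps.clf.coef_")
rfe.fit(X, y)

# The original pipeline was only cloned, so its scaler is still unfitted.
try:
    check_is_fitted(pipe.named_steps["scaler"])
    original_fitted = True
except NotFittedError:
    original_fitted = False

# The fitted clone, trained on the selected features only.
fit_pipeline = rfe.estimator_
fitted_scaler = fit_pipeline.named_steps["scaler"]
print(original_fitted, fitted_scaler.mean_.shape)
```

Because `rfe.estimator_` is fitted on the reduced feature set, inputs should go through `rfe.transform` (or `rfe.predict`) rather than being passed to `fit_pipeline` with all original columns.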
How to properly use SMOTE in Classification models
I am using SMOTE to balance the output (y) only for model training, but I want to test the model on the original data, since it makes little sense to test the model on SMOTE-created outputs. Please ask for clarification if I didn’t explain it well. This is my first post on Stack Overflow. Here I applied the Random Forest Classifier
Random search grid not displaying scoring metric
I want to do a grid search over a few hyperparameters of an XGBClassifier for a binary classification problem, but whenever I run it the score value (roc_auc) is not displayed. I read in another question that this can be related to an error in model training, but I am not sure which one applies in this case. My model