I am trying to create a pipeline that first imputes missing data, then does oversampling with SMOTE, and then fits the model. My code worked perfectly before I tried SMOTE; now I can't find any solution. Here is the code without SMOTE, and here's the code after adding SMOTE. Note: I tried importing make_pipeline from imblearn when I import
Tag: scikit-learn
How to create a for loop with checking appended models
I have a list of models that I iterate through in a for loop, getting their performances. I've added CatBoost to my model list, but when I try to add its best estimator to a dictionary, I get an error that no other model gives me (TypeError: unhashable type: 'CatBoostRegressor'). Googling hasn't turned up a clear way around this.
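The error appears when the estimator object itself is used as a dictionary key; CatBoostRegressor defines equality without hashing, so it cannot be a key. The common workaround is to key the dictionary by a string name and store the estimator as the value. A minimal sketch (plain sklearn models stand in for CatBoost so the example is self-contained):

```python
# Key results by the model's class name (a hashable string), not by the
# estimator instance, which may be unhashable (as CatBoostRegressor is).
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

models = [LinearRegression(), DecisionTreeRegressor(random_state=0)]

best_estimators = {}
for model in models:
    name = type(model).__name__    # e.g. "LinearRegression" -- hashable
    best_estimators[name] = model  # the estimator goes in the value, not the key

print(sorted(best_estimators))
```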
Why isn’t this Linear Regression line a straight line?
I have points with x and y coordinates that I want to fit a straight line to with Linear Regression, but I get a jagged-looking line. I am attempting to use LinearRegression from sklearn. To create the points, I run a for loop that randomly creates one hundred points into an array that is 100 x 2 in shape. I slice
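The usual cause of a jagged regression line is plotting predictions against x values that are in random order, so the line doubles back on itself. Sorting x before predicting (or plotting) restores the straight line. A minimal sketch under that assumption, with the 100 x 2 random array generated directly rather than in a loop:

```python
# The fit itself is a straight line; the plot looks jagged only because the
# x values are unsorted. Sort by x before drawing the prediction line.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
pts = rng.random((100, 2))        # 100 x 2 array of random points
X, y = pts[:, [0]], pts[:, 1]     # first column as feature, second as target

model = LinearRegression().fit(X, y)

order = np.argsort(X[:, 0])       # indices that sort x left to right
X_sorted = X[order]
y_line = model.predict(X_sorted)  # now plt.plot(X_sorted, y_line) is straight

print(y_line[:3])
```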
Is RandomOverSampler Causing my Model to Overfit?
I am attempting to see how well I can classify books according to genre using TfidfVectorizer. I am using five moderately imbalanced genre labels, and I want to use multilabel classification to assign each document one or more genres. Initially my performance was middling, so I tried to fix this by re-balancing the classes with RandomOverSampler, and my cross-validated
Fit/transform separate sklearn transformers to partitions of single column
Use case: I have time series data for multiple assets (e.g. AAPL, MSFT) and multiple features (e.g. MACD, volatility, etc.). I am building an ML model to make classification predictions on a subset of this data. Problem: for each asset and feature, I want to fit and apply a transformation. For example: for volatility, I want to fit a
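One straightforward pattern is to fit one transformer per partition with pandas `groupby` and keep the fitted objects in a dict so the same transform can be reapplied to test data. A minimal sketch; the column and asset names are illustrative, not from the original post:

```python
# Fit a separate StandardScaler per asset on one column, keeping each
# fitted scaler keyed by asset for later reuse on test data.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "asset": ["AAPL"] * 3 + ["MSFT"] * 3,
    "volatility": [0.1, 0.2, 0.3, 1.0, 2.0, 3.0],
})

scalers = {}
parts = []
for asset, grp in df.groupby("asset"):
    scaler = StandardScaler().fit(grp[["volatility"]])
    scalers[asset] = scaler              # reuse with scalers[asset].transform(...) later
    out = grp.copy()
    out["volatility"] = scaler.transform(grp[["volatility"]])
    parts.append(out)

df_scaled = pd.concat(parts).sort_index()  # restore original row order
print(df_scaled)
```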
Is there a way to use mutual information as part of a pipeline in scikit learn?
I’m creating a model with scikit-learn. The pipeline that seems to be working best is: mutual_info_classif with a threshold (i.e. only include fields whose mutual information score is above a given threshold), then PCA, then LogisticRegression. I’d like to do them all using sklearn’s Pipeline object, but I’m not sure how to get the mutual information classification in. For the second
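Mutual information slots into a pipeline via `SelectKBest` (or `SelectPercentile`) with `mutual_info_classif` as the score function; an exact score threshold is not built in, but `k`/`percentile` approximate one, and `functools.partial` can fix the scorer's `random_state`. A minimal sketch on synthetic data:

```python
# Mutual-information feature selection as a Pipeline step, followed by
# PCA and LogisticRegression, matching the described flow.
from functools import partial
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    # keep the 10 features with the highest mutual information scores
    ("mi", SelectKBest(partial(mutual_info_classif, random_state=0), k=10)),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.score(X, y))
```

For a hard threshold rather than a top-k cut, a small custom transformer wrapping `mutual_info_classif` would be needed.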
Constrained Multi-Linear Regression using Gekko
I have a multilinear regression problem where I have prior information about the range of the output (dependent variable y): the prediction must always lie in that range. I want to find the coefficients (upper and lower bound) of each feature (independent variable) so that the linear regression model is restricted to the desired output range.
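In Gekko the same idea is expressed with `m.FV` coefficients and `m.Equation` bounds on the predictions; to keep this example self-contained it is sketched with SciPy instead: least-squares coefficients subject to every training prediction staying inside [y_lo, y_hi]. The data and bounds are illustrative:

```python
# Constrained least squares: minimize squared error subject to the
# linear constraints y_lo <= X @ beta <= y_hi for every training row.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.standard_normal(50)
y_lo, y_hi = 0.0, 1.5              # required output range (illustrative)

def sse(beta):
    return np.sum((X @ beta - y) ** 2)

cons = [
    {"type": "ineq", "fun": lambda b: X @ b - y_lo},  # predictions >= y_lo
    {"type": "ineq", "fun": lambda b: y_hi - X @ b},  # predictions <= y_hi
]
res = minimize(sse, x0=np.zeros(3), constraints=cons)  # SLSQP by default
pred = X @ res.x
print(pred.min(), pred.max())
```

Note this only guarantees the range on the training rows; for a hard guarantee on unseen inputs, the constraint must hold over the whole feasible input domain (e.g. at its corner points if inputs are bounded).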
Linear regression prediction based on group of data in test set
I have a simple dataset which looks like this: I created a simple LR model to train and predict the target variable “sales”, and I used MAE to evaluate the model. My code works well, but what I want to do is predict the sales in X_test grouped by hour of the day. In the above dataset example,
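Per-group evaluation usually comes down to attaching the predictions to the test frame and letting pandas `groupby` compute the MAE per hour. A minimal sketch; the column names ("hour", "promo", "sales") and tiny dataset are illustrative:

```python
# Fit once, then compute MAE separately for each hour of the day
# by grouping the absolute errors on the hour column.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "hour": [0, 0, 1, 1, 2, 2],
    "promo": [0, 1, 0, 1, 0, 1],
    "sales": [10.0, 12.0, 20.0, 24.0, 30.0, 33.0],
})
X, y = df[["hour", "promo"]], df["sales"]
model = LinearRegression().fit(X, y)

eval_df = df.assign(pred=model.predict(X))
abs_err = (eval_df["sales"] - eval_df["pred"]).abs()
mae_by_hour = abs_err.groupby(eval_df["hour"]).mean()  # one MAE per hour
print(mae_by_hour)
```

In practice `eval_df` would be built from `X_test`, `y_test`, and `model.predict(X_test)` rather than the training data.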
Extracting feature names from sklearn column transformer
I’m using sklearn.pipeline to transform my features and fit a model, so my general flow looks like this: column transformer –> general pipeline –> model. I would like to be able to extract feature names from the column transformer (since the following step, the general pipeline, applies the same transformation to all columns, e.g. nan_to_zero) and use them for model explainability
Eli5.Sklearn PermutationImportance() — TypeError: check_cv() takes from 0 to 2 positional arguments but 3 were given
I am running permutation importance from eli5.sklearn. I keep getting this error: I am unsure how to go about this, as I am only passing 2 arguments into perm.fit(). Any advice would be appreciated. Thank you. link to error message image Answer This is a known error, fixed in the master branch of the spinoff repo, but not yet
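Until a fixed eli5 release is available, scikit-learn's own `sklearn.inspection.permutation_importance` covers the same use case without the `check_cv` incompatibility. A minimal sketch on synthetic data:

```python
# Built-in alternative to eli5's PermutationImportance: shuffle each
# feature n_repeats times and measure the drop in score.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

result = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)  # one mean importance per feature
```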