I am trying to create a pipeline that first imputes missing data, then oversamples with SMOTE, and then fits the model.
My code worked perfectly before I added SMOTE; now I can't find any solution.
Here is the code without SMOTE:
scoring = ['balanced_accuracy', 'f1_macro']
imputer = SimpleImputer(strategy='most_frequent')
pipeline = Pipeline(steps=[('i', imputer), ('m', model)])
# define model evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_validate(pipeline, X, y, scoring=scoring, cv=cv, n_jobs=-1)
And here's the code after adding SMOTE. Note: I also tried importing make_pipeline from imblearn.
imputer = SimpleImputer(strategy='most_frequent')
pipeline = Pipeline(steps=[('i', imputer), ('over', SMOTE()), ('m', model)])
# define model evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_validate(pipeline, X, y, scoring=scoring, cv=cv, n_jobs=-1)
When I import Pipeline from sklearn, I get this error:
All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE()' (type <class 'imblearn.over_sampling._smote.base.SMOTE'>) doesn't
When I tried importing make_pipeline from imblearn, I got this error:
Last step of Pipeline should implement fit or be the string 'passthrough'. '[('i', SimpleImputer(strategy='most_frequent')), ('over', SMOTE()), ('m', RandomForestClassifier())]' (type <class 'list'>) doesn't
Answer
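Use the imblearn Pipeline instead of the scikit-learn one. SMOTE is a sampler: it implements fit_resample rather than transform, so scikit-learn's Pipeline rejects it as an intermediate step, while imblearn's Pipeline accepts it: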
from imblearn.pipeline import Pipeline
pipeline = Pipeline([('i', imputer), ('over', SMOTE()), ('m', model)])