Tag: imblearn

Is RandomOverSampler Causing my Model to Overfit?

I am attempting to see how well I can classify books according to genre using TfidfVectorizer. I am using five moderately imbalanced genre labels, and I want to use multilabel classification to assign each document one or more genres. Initially my performance was middling, so I tried to fix this by re-balancing the classes with RandomOverSampler, and my cross-validated

Imbalanced-Learn’s FunctionSampler throws ValueError

I want to use the class FunctionSampler from imblearn to create my own custom class for resampling my dataset. I have a one-dimensional feature Series containing paths for each subject and a label Series containing the labels for each subject. Both come from a pd.DataFrame. I know that I have to reshape the feature array first since it is one-dimensional.

pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV

I am working on a binary text classification problem. As the classes are highly imbalanced, I am using sampling techniques like RandomOverSampler(). Then for classification I would use RandomForestClassifier(), whose parameters need to be tuned using GridSearchCV(). I am trying to create a pipeline to do these in order but have failed so far. It throws an invalid parameters error. Answer The parameters
