I am attempting to see how well I can classify books according to genre using TfidfVectorizer. I am using five moderately imbalanced genre labels, and I want to use multilabel classification to assign each document one or more genres. Initially my performance was middling, so I tried to fix this by re-balanci…
Tag: imblearn
Imbalanced-Learn’s FunctionSampler throws ValueError
I want to use the class FunctionSampler from imblearn to create my own custom class for resampling my dataset. I have a one-dimensional feature Series containing paths for each subject and a label Series containing the labels for each subject. Both come from a pd.DataFrame. I know that I have to reshape the f…
pipeline for RandomOversampler, RandomForestClassifier & GridSearchCV
I am working on a binary text classification problem. As the classes are highly imbalanced, I am using sampling techniques like RandomOversampler(). Then for classification I would use RandomForestClassifier() whose parameters need to be tuned using GridSearchCV(). I am trying to create a pipeline to do these…