
Tag: scikit-learn

Is RandomOverSampler Causing my Model to Overfit?

I am attempting to see how well I can classify books according to genre using TfidfVectorizer. I am using five moderately imbalanced genre labels, and I want to use multilabel classification to assign each document one or more genres. Initially my performance was middling, so I tried to fix this by re-balancing the classes with RandomOverSampler, and my cross validated
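A common reason oversampling seems to help under cross-validation is that resampling runs before the data are split. Copies of the same minority document then land in both the training and validation folds, so the cross-validated score partly measures memorization. The excerpt is cut off, so whether this applies here is an assumption. The sketch below shows the leak-free order: split first, then oversample only the training indices. It uses plain NumPy with hypothetical helpers `random_oversample` and `leak_free_folds` rather than imbalanced-learn's `RandomOverSampler`. It also simplifies to a single binary label, because true multilabel targets complicate oversampling. In imbalanced-learn, putting the sampler inside its `Pipeline` has the same effect, since samplers there run only during `fit`.

```python
import numpy as np

def random_oversample(idx, y, rng):
    """Hypothetical helper: duplicate minority-class indices (sampled with
    replacement) until every class present in `idx` matches the largest one."""
    labels = y[idx]
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    parts = [idx]
    for cls, count in zip(classes, counts):
        if count < target:
            pool = idx[labels == cls]
            parts.append(rng.choice(pool, size=target - count, replace=True))
    return np.concatenate(parts)

def leak_free_folds(y, n_splits=5, seed=0):
    """Hypothetical helper: yield (train_idx, val_idx) pairs where
    oversampling is applied to the training part only, after the split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    for val_idx in np.array_split(order, n_splits):
        train_idx = np.setdiff1d(order, val_idx)
        yield random_oversample(train_idx, y, rng), val_idx

# Toy imbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
for train_idx, val_idx in leak_free_folds(y):
    # No oversampled copy of a validation example can reach the training set.
    assert not np.isin(val_idx, train_idx).any()
    print(np.bincount(y[train_idx]), len(val_idx))
```

If the validation scores drop noticeably compared with oversampling the full dataset first, the earlier improvement was likely inflated by this leakage rather than by genuine overfitting of the model itself.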

Extracting feature names from sklearn column transformer

I’m using sklearn.pipeline to transform my features and fit a model, so my general flow looks like this: column transformer -> general pipeline -> model. I would like to be able to extract feature names from the column transformer (since the following step, the general pipeline, applies the same transformation to all columns, e.g. nan_to_zero) and use them for model explainability
