Tag: countvectorizer

Neither stemmer nor lemmatizer seems to work very well; what should I do?

I am new to text analysis and am trying to create a bag-of-words model (using sklearn’s CountVectorizer). I have a data frame with a column of text containing words like ‘acid’, ‘acidic’, ‘acidity’, ‘wood’, ‘woodsy’, ‘woody’. I think that ‘acid’ and ‘wood’ should be the only words included in the final output; however, neither stemming nor lemmatizing seems
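When a general-purpose stemmer or lemmatizer does not collapse domain words the way you want, one possible workaround is a small hand-written mapping applied at tokenization time. The sketch below is only an illustration using the standard library: the `ROOTS` dictionary and the `tokenize` and `bag_of_words` helpers are hypothetical names invented here, not part of sklearn or of the original question.

```python
import re
from collections import Counter

# Hypothetical hand-written map from surface forms to the root to keep.
# Extend it with whatever variants appear in your own data.
ROOTS = {
    "acidic": "acid",
    "acidity": "acid",
    "woodsy": "wood",
    "woody": "wood",
}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and
    replace every known variant with its root form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [ROOTS.get(tok, tok) for tok in tokens]


def bag_of_words(docs):
    """Return one Counter of root-form tokens per document."""
    return [Counter(tokenize(doc)) for doc in docs]


# Made-up example documents, for illustration only.
docs = ["Acidic with woodsy notes", "High acidity, woody finish", "acid and wood"]
counts = bag_of_words(docs)

# The vocabulary contains 'acid' and 'wood' but none of their variants.
vocabulary = sorted(set().union(*counts))
```

If you want to stay with CountVectorizer, a callable like `tokenize` can be supplied through its `tokenizer` parameter, so that the vectorizer builds its vocabulary from the root forms. Treat this as a starting point: a manual map only covers the variants you list.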

TypeError during resampling

I am trying to apply resampling to my dataset, which has unbalanced classes. What I have done is the following: Unfortunately, I am having some problems at this step: X = pd.concat([X_train, y_train], axis=1), i.e. You can think of the Text column as I hope you can help me handle it. Answer: You have to convert X_train to a
