
How to properly use SMOTE in classification models

I am using SMOTE to balance the output (y) for model training only, but I want to test the model on the original data, since it does not seem logical to test the model on SMOTE-generated outputs. Please ask for clarification if I have not explained this well. This is my first question on Stack Overflow.


Here I applied the Random Forest Classifier to my data:


If I apply this, X also contains the data that was used for training. How can I remove the data that was already used to train the model?



Answer

I have used SMOTE in the past, and I found it suboptimal. Researchers have recently reported flaws in the distribution generated by the Synthetic Minority Over-sampling Technique (SMOTE). I know we sometimes have no choice about unbalanced classes, but you can use sklearn.ensemble.RandomForestClassifier, where you can set class_weight to handle the class imbalance.
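A minimal sketch of the class_weight approach, assuming a synthetic two-class dataset from make_classification in place of your own X and y (the sample sizes, split ratio, and random_state values are illustrative assumptions, not taken from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumption: synthetic imbalanced data (roughly 90% / 10%) stands in for the real X and y.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Hold out a stratified test set so it keeps the original class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency in y_train,
# so no synthetic rows are created.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

# Evaluate on untouched, original-distribution data.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
```

Because nothing is resampled, the training and test rows stay disjoint by construction, and the report reflects the original class balance.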

See the scikit-learn documentation for RandomForestClassifier.

User contributions licensed under: CC BY-SA