I am trying to balance a data frame by randomly undersampling the majority class. This works, but I also want to save the rows that were removed during undersampling to a new data frame. How do I accomplish this?
This is the code that I am using to undersample the data frame:
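import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

# Undersample the majority class to a 1:1 ratio with the minority class
rus = RandomUnderSampler(sampling_strategy=1)
X_res, y_res = rus.fit_resample(X, y)

# Recombine the resampled features and labels into one data frame
df1 = pd.concat([X_res, y_res], axis=1)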
Answer
RandomUnderSampler has an attribute sample_indices_, indicating the indices of the retained subsample. So this should do:
dropped_ids = [i for i in range(X.shape[0]) if i not in rus.sample_indices_]
X.iloc[dropped_ids]  # for dataframes
X[dropped_ids, :]  # for numpy arrays
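To get the removed rows as a single data frame with both features and labels, the same idea can be wrapped in a small helper. This is only a sketch: the function name removed_rows and the toy data are made up for illustration, and the hand-written kept array stands in for rus.sample_indices_ so that the snippet runs without a fitted sampler. It assumes X is a pandas DataFrame and y a pandas Series, and it uses a boolean mask instead of the list comprehension, which avoids a slow membership test on large data.

```python
import numpy as np
import pandas as pd


def removed_rows(X, y, kept_indices):
    """Return a data frame of the rows whose positions are NOT in kept_indices.

    Hypothetical helper: kept_indices is expected to hold positional
    indices, such as the sampler's sample_indices_ attribute.
    """
    # Start with every row marked as removed, then unmark the kept positions.
    removed_mask = np.ones(len(X), dtype=bool)
    removed_mask[np.asarray(kept_indices)] = False
    # iloc selects by position, which matches how the kept indices are defined.
    return pd.concat([X.iloc[removed_mask], y.iloc[removed_mask]], axis=1)


# Toy data for illustration only.
X = pd.DataFrame({"feature": [10, 11, 12, 13, 14, 15]})
y = pd.Series([0, 0, 0, 0, 1, 1], name="label")

# Stand-in for rus.sample_indices_ after calling rus.fit_resample(X, y).
kept = np.array([1, 3, 4, 5])

df_removed = removed_rows(X, y, kept)
print(df_removed)
```

With a fitted sampler, the call would presumably become removed_rows(X, y, rus.sample_indices_), and the result can be kept alongside df1.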