I am trying to balance a data frame by randomly undersampling the majority class. This works, but I also want to save the rows that were removed from the data frame (the undersampled rows) to a new data frame. How do I accomplish this?
This is the code that I am using to undersample the data frame:
    import pandas as pd
    from imblearn.under_sampling import RandomUnderSampler

    rus = RandomUnderSampler(sampling_strategy=1)
    X_res, y_res = rus.fit_resample(X, y)
    df1 = pd.concat([X_res, y_res], axis=1)
Answer
RandomUnderSampler has an attribute sample_indices_, indicating the indices of the retained subsample. So this should do:
    kept = set(rus.sample_indices_)  # set gives fast membership tests
    dropped_ids = [i for i in range(X.shape[0]) if i not in kept]
    X.iloc[dropped_ids]   # for dataframes
    X[dropped_ids, :]     # for numpy arrays
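To get the removed rows as a new data frame with the labels attached, something along these lines might work. It is only a sketch: the helper name removed_rows and the toy data are made up for illustration, and the sample_indices array is a hand-written stand-in for rus.sample_indices_ so that the snippet runs without imblearn. It assumes the indices are positional (hence iloc) and uses np.setdiff1d to find the positions that were not retained.

```python
import numpy as np
import pandas as pd


def removed_rows(X, y, sample_indices):
    """Return the rows of X and y whose positions are not in sample_indices.

    Hypothetical helper: sample_indices is assumed to hold positional
    indices of the retained rows, as rus.sample_indices_ does.
    """
    all_positions = np.arange(len(X))
    # Positions present in the original data but absent from the retained subsample.
    dropped_positions = np.setdiff1d(all_positions, sample_indices)
    return pd.concat([X.iloc[dropped_positions], y.iloc[dropped_positions]], axis=1)


# Toy data: 6 majority rows (label 0) and 2 minority rows (label 1).
X = pd.DataFrame({"feature": [10, 11, 12, 13, 14, 15, 16, 17]})
y = pd.Series([0, 0, 0, 0, 0, 0, 1, 1], name="target")

# Hand-written stand-in for rus.sample_indices_ (not produced by imblearn here).
sample_indices = np.array([1, 4, 6, 7])

df_removed = removed_rows(X, y, sample_indices)
print(df_removed)
```

With a fitted sampler, the call would presumably be removed_rows(X, y, rus.sample_indices_), giving a data frame laid out like df1 but containing only the discarded majority-class rows.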