How do I save the data that has been randomly undersampled?

I am trying to balance a data frame by randomly undersampling the majority class. That part works, but I also want to save the rows that were removed from the data frame (the undersampled ones) to a new data frame. How do I accomplish this?

This is the code that I am using to undersample the data frame:

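import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

# sampling_strategy=1 asks for a 1:1 minority-to-majority ratio (binary target)
rus = RandomUnderSampler(sampling_strategy=1)
X_res, y_res = rus.fit_resample(X, y)

# Recombine the resampled features and target into one data frame
df1 = pd.concat([X_res, y_res], axis=1)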

Answer

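After fitting, RandomUnderSampler exposes an attribute sample_indices_, which holds the positional indices of the samples that were retained. The removed rows are the ones whose positions are not in it, so this should do: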

retained = set(rus.sample_indices_)  # positions kept by the sampler
dropped_ids = [i for i in range(X.shape[0]) if i not in retained]
X.iloc[dropped_ids]  # for dataframes
X[dropped_ids, :]  # for numpy arrays
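
Since the question asks for the removed rows as a new data frame that includes the target, here is a minimal sketch of one way to assemble it. The helper name removed_rows, the stand-in data, and the hand-written kept array are assumptions made for illustration; in the question's code the indices would come from rus.sample_indices_ after calling rus.fit_resample(X, y). It uses np.setdiff1d instead of the list comprehension, which should give the same positions.

```python
import numpy as np
import pandas as pd


def removed_rows(X, y, sample_indices):
    """Return a DataFrame of the rows the sampler did not keep.

    X and y are the feature DataFrame and target Series passed to
    fit_resample; sample_indices are the positional indices of the
    retained rows (e.g. rus.sample_indices_).
    """
    all_positions = np.arange(len(X))
    # Positions present in the original data but absent from the retained set.
    dropped = np.setdiff1d(all_positions, sample_indices)
    # iloc is positional, so this also works when the index is not 0..n-1.
    return pd.concat([X.iloc[dropped], y.iloc[dropped]], axis=1)


# Stand-in data (hypothetical); replace with the real X, y and
# rus.sample_indices_ from the fitted sampler.
X = pd.DataFrame({"f1": [10, 11, 12, 13, 14, 15]})
y = pd.Series([0, 0, 0, 0, 1, 1], name="target")
kept = np.array([1, 3, 4, 5])  # hypothetical retained positions

df_removed = removed_rows(X, y, kept)
print(df_removed)
```

With the fitted sampler from the question, the call would presumably be removed_rows(X, y, rus.sample_indices_), assuming X is a DataFrame and y a Series.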
User contributions licensed under: CC BY-SA