Pandas oversampling ragged sequential data

I am trying to use pandas to oversample my ragged data (sequences of different lengths).

Given the following data samples:

JavaScript

Data (groups are separated with --- for convenience):

JavaScript

Targets:

JavaScript

I would like to balance the minority class. In the sample above, target 1 is the minority class with 2 samples, for ids 1 & 3.

I’m looking for a way to oversample the data so the results would be:

JavaScript

And the targets would be balanced:

JavaScript

With exactly 4 positive and 4 negative samples.

Answer

You can use:

JavaScript

JavaScript

The solution above works only when the data is unbalanced. If the data can sometimes already be balanced, use:

JavaScript

JavaScript
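
As a minimal sketch of the same idea (not necessarily the approach used in the answer above), you could duplicate randomly chosen minority groups under fresh ids until both classes have the same number of ids. This assumes binary targets, integer ids, and one target row per id. The column names `id`, `value`, and `target`, the helper `oversample_groups`, and the sample values are all hypothetical, made up only to match the description in the question (six ids, with ids 1 and 3 positive).

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: each id owns a variable number of rows.
data = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 6, 6],
    "value": [0.1, 0.2, 0.3, 1.0, 1.1, 2.0, 3.0, 3.1, 3.2, 3.3, 4.0, 5.0, 5.1],
})
# One target per id; ids 1 and 3 form the minority class (target 1).
targets = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                        "target": [1, 0, 1, 0, 0, 0]})


def oversample_groups(data, targets, id_col="id", target_col="target", seed=0):
    """Copy whole minority groups under new ids until the classes are balanced."""
    counts = targets[target_col].value_counts()
    n_extra = int(counts.max() - counts.min())
    if n_extra == 0:
        # Already balanced (or only one class present): nothing to add.
        return data.copy(), targets.copy()

    minority = counts.idxmin()
    minority_ids = targets.loc[targets[target_col] == minority, id_col].to_numpy()

    # Sample minority ids with replacement, one draw per missing group.
    rng = np.random.default_rng(seed)
    picked = rng.choice(minority_ids, size=n_extra, replace=True)

    next_id = int(targets[id_col].max()) + 1
    data_parts, target_parts = [data], [targets]
    for offset, src in enumerate(picked):
        new_id = next_id + offset
        # Copy every row of the source group, relabelled with the new id.
        data_parts.append(data[data[id_col] == src].assign(**{id_col: new_id}))
        target_parts.append(pd.DataFrame({id_col: [new_id], target_col: [minority]}))

    return (pd.concat(data_parts, ignore_index=True),
            pd.concat(target_parts, ignore_index=True))


balanced_data, balanced_targets = oversample_groups(data, targets)
# Both classes now have 4 ids each.
print(balanced_targets["target"].value_counts())
```

Because groups are copied as whole blocks, each duplicated sequence keeps its original length and row order. Calling the function on data that is already balanced returns unchanged copies, which covers the case mentioned above.
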
User contributions licensed under: CC BY-SA