Pandas – partition a dataframe into two groups with an approximate mean value

I want to split all rows into two groups that have similar means.

I have a dataframe of about 50 rows (though this could grow to several thousand) with a column of interest called ‘value’.


So far I have tried a cumulative sum: I created a ‘total’ column holding the running sum of ‘value’ and then split the rows at the point where ‘total’ reaches half of the overall sum, based on this solution.
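A minimal sketch of the cumulative-sum split described above, assuming the frame has a `value` column. The `total` and `group` column names and the sample values are illustrative, not taken from the original post:

```python
import pandas as pd

# Illustrative data only; the real frame has a column called 'value'.
df = pd.DataFrame({"value": [1, 2, 1, 2, 1, 3, 5, 5, 10]})

# Running total of 'value' down the rows.
df["total"] = df["value"].cumsum()

# Rows whose running total is at most half of the grand total form group 0;
# the rest form group 1. The split is contiguous and follows row order.
half = df["value"].sum() / 2
df["group"] = (df["total"] > half).astype(int)

# Compare sum, mean and row count of the two groups.
print(df.groupby("group")["value"].agg(["sum", "mean", "count"]))
```

With these sample values both groups sum to 15, but group 0 has seven rows and group 1 has two, so their means (about 2.14 and 7.5) are far apart.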


If I then group the rows and take the average of each group, the difference between the two means is quite large.
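This is expected: a group's mean is its sum divided by its number of rows, so splitting where the running total reaches half only equalizes the sums, and the means match only if both groups also have the same number of rows. With $S_g$ the sum and $n_g$ the row count of group $g$:

$$
\bar{x}_0 = \frac{S_0}{n_0}, \qquad \bar{x}_1 = \frac{S_1}{n_1}, \qquad S_0 = S_1 = S \neq 0 \;\Rightarrow\; \left(\bar{x}_0 = \bar{x}_1 \iff n_0 = n_1\right)
$$

For example, $S = 15$ split over $n_0 = 7$ and $n_1 = 2$ rows gives means of about $2.14$ and $7.5$.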


Is there a way I could achieve this partition based on means instead of sums? I was thinking about using expanding means from pandas but couldn’t find a proper way to do it.
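Following the expanding-mean idea, here is a minimal sketch, assuming the rows keep their current order and the split must be contiguous (as in the cumulative-sum attempt): compute the expanding mean from the top and from the bottom, then pick the split point where the two means are closest. `best_mean_split` is a hypothetical helper, not a pandas function:

```python
import numpy as np
import pandas as pd


def best_mean_split(s: pd.Series) -> int:
    """Return k so that s.iloc[:k] and s.iloc[k:] have the closest means.

    Hypothetical helper. Assumes the rows keep their current order, the split
    is contiguous, and len(s) >= 2; k ranges over 1 .. len(s) - 1.
    """
    values = s.reset_index(drop=True)
    n = len(values)
    # left.iloc[i]: mean of the first i + 1 rows (expanding mean from the top).
    left = values.expanding().mean()
    # right.iloc[j]: mean of rows j .. n - 1 (expanding mean from the bottom).
    right = values.iloc[::-1].expanding().mean().iloc[::-1]
    # For a split at k, compare mean(rows 0..k-1) with mean(rows k..n-1).
    diffs = left.iloc[: n - 1].to_numpy() - right.iloc[1:].to_numpy()
    return int(np.abs(diffs).argmin()) + 1


# Usage: label rows before the split point 0 and the rest 1.
df = pd.DataFrame({"value": [1, 2, 1, 2, 1, 3, 5, 5, 10]})
k = best_mean_split(df["value"])
df["group"] = (np.arange(len(df)) >= k).astype(int)
print(k, df.groupby("group")["value"].mean().round(3).tolist())
```

Because the split stays contiguous, it can only be as good as the row order allows: if ‘value’ is sorted or trends along the rows, every contiguous split may still give quite different means.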


Answer

I am not sure I understand what you are trying to do, but possibly you want to group by quantiles of a column. If so, you can add a bucket column that is 0 for the half of the rows with the smaller ‘value’ values and 1 for the rest. By changing the q parameter you can create as many groups as you want (as long as the number of groups does not exceed the number of rows).
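A minimal sketch of grouping by quantiles, assuming `pd.qcut` is the intended tool; the `bucket` column and the `q` parameter match the description above, and the sample data are illustrative:

```python
import pandas as pd

# Illustrative data; 'value' is the column of interest from the question.
df = pd.DataFrame({"value": [5, 1, 3, 8, 2, 7, 4, 6]})

# qcut cuts 'value' at its quantiles; q=2 means a single cut at the median.
# labels=False returns integer bucket numbers (0, 1, ...) instead of intervals.
df["bucket"] = pd.qcut(df["value"], q=2, labels=False)

# Mean of 'value' within each bucket.
print(df.groupby("bucket")["value"].mean())
```

If ‘value’ has many repeated values, the quantile edges can coincide and `pd.qcut` raises a `ValueError`; passing `duplicates="drop"` avoids that at the cost of getting fewer buckets. Note that this puts the smaller values in one group and the larger values in the other, so the two group means end up far apart rather than similar.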

Edit: a new attempt, now that I think I better understand your aim:
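One possible approach when the groups do not need to be contiguous, offered as a heuristic sketch and not necessarily the method used in this edit: sort by ‘value’ and deal the rows out to the two groups in the repeating order 0, 1, 1, 0, so each group gets a mix of large and small values and the group sizes differ by at most one. `snake_split` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def snake_split(df: pd.DataFrame, col: str = "value") -> pd.Series:
    """Label each row 0 or 1 so that the two groups get similar means.

    Hypothetical helper (heuristic, not an exact optimum). Rows are ranked by
    `col` from largest to smallest and dealt out in the repeating pattern
    0, 1, 1, 0, so both groups receive a mix of large and small values and
    their sizes differ by at most one. Assumes df has a unique index.
    """
    order = df[col].sort_values(ascending=False).index
    pattern = np.array([0, 1, 1, 0])
    labels = pattern[np.arange(len(order)) % len(pattern)]
    # Map the labels back onto the original row order.
    return pd.Series(labels, index=order).reindex(df.index)


df = pd.DataFrame({"value": [5, 1, 3, 8, 2, 7, 4, 6]})
df["group"] = snake_split(df)
print(df.groupby("group")["value"].agg(["mean", "count"]))
```

This only balances the means approximately. Finding the split with the closest possible means exactly is a hard combinatorial problem in general, so for several thousand rows a heuristic like this (optionally followed by swapping rows between the groups to shrink the difference) is a reasonable starting point.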
