Pandas – partition a dataframe into two groups with an approximate mean value

Question

I want to split all rows into two groups that have similar means. I have a dataframe of about 50 rows, but this could grow to several thousand, with a column of interest called 'value'. So far I have tried using a cumulative sum, for which a total column was created; I then essentially made the split based on where the mid-point of

Accepted Answer

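I am not sure I understand what you are trying to do, but possibly you want to group by quantiles of a column. If so:

    test['bucket'] = pd.qcut(test['value'], q=2, labels=False)

which will have bucket=0 for the half of the rows with the lower values of value, and 1 for the rest. By tweaking the q parameter you can have as many groups as you want (as long as q <= the number of rows).

Edit: New attempt, now that I think I better understand your aim:

    import numpy as np
    import pandas as pd

    # pd.np was removed in pandas 2.0, so NumPy is imported directly
    df = pd.DataFrame({'value': np.arange(100)})

    # two groups: alternate rows between group 0 and group 1
    df['group'] = df['value'].argsort().mod(2)
    df.groupby('group')['value'].mean()
    # group
    # 0    49.0
    # 1    50.0
    # Name: value, dtype: float64

    # three groups
    df['group'] = df['value'].argsort().mod(3)
    df.groupby('group')['value'].mean()
    # group
    # 0    49.5
    # 1    49.0
    # 2    50.0
    # Name: value, dtype: float64

Note that argsort() returns the positions that would sort the column, not each row's rank. The two coincide here only because value is already in ascending order; for unsorted data, use df['value'].rank(method='first') to get the ranks before taking the modulus.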
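The idea of alternating rows between groups in sorted order can be sketched for unsorted data as follows. This is a minimal illustration, not the answerer's exact code: the helper name `assign_alternating_groups` is hypothetical, the data is a shuffled range made up for the example, and the column is assumed to contain no missing values.

```python
import numpy as np
import pandas as pd


def assign_alternating_groups(values: pd.Series, n_groups: int = 2) -> pd.Series:
    """Label each row with (rank of its value) modulo n_groups.

    Hypothetical helper for illustration; assumes `values` has no NaNs.
    """
    # rank(method="first") gives every row a distinct rank 1..n,
    # breaking ties by order of appearance; subtract 1 to start at 0.
    ranks = values.rank(method="first").astype(int) - 1
    return ranks % n_groups


# Made-up example data: the integers 0..99 in shuffled order.
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.permutation(100)})

df["group"] = assign_alternating_groups(df["value"], n_groups=2)
group_means = df.groupby("group")["value"].mean()
print(group_means)
```

Because consecutive ranks land in different groups, each group receives values spread across the whole range, which tends to keep the group means close. It is a heuristic rather than an optimal partition, and heavily skewed data may still produce noticeably different means.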

Answer