How to aggregate on 3 binary columns and a range column to calculate the percentage participation in that combination in Python Pandas?

Question

I have a DataFrame in Python Pandas like the one below:

ID	U1	U2	U3	CP	CH
111	1	1	0	10-20	1
222	1	0	1	10-20	1
333	0	1	0	20-30	0
444	0	1	1	40-50	0
555	1	0	0	10-20	0

I need to create a column with the percentage of '1' values in column 'CH' per combination for:

Accepted Answer

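You can use an approach based on melt and groupby.sum:

    (df.drop(columns='ID')
       .melt(['CP', 'CH'], var_name='idx')
       # keep only CH where value is 1
       .assign(CH=lambda d: d['CH'].mul(d['value']))
       .groupby(['idx', 'CP'], as_index=False).sum()
       .assign(CH_perc=lambda d: d.pop('CH').div(d.pop('value')).fillna(0))
    )

output:

      idx     CP   CH_perc
    0  U1  10-20  0.666667
    1  U1  20-30  0.000000
    2  U1  40-50  0.000000
    3  U2  10-20  1.000000
    4  U2  20-30  0.000000
    5  U2  40-50  0.000000
    6  U3  10-20  1.000000
    7  U3  20-30  0.000000
    8  U3  40-50  0.000000

Multiplying CH by value zeroes out every row where the flag is 0, so after the sum, CH counts the rows where both the flag and CH are 1, while value counts the rows where the flag is 1. Their ratio is the share (for U1 in 10-20 that is 2/3), and combinations with no flagged rows give 0/0 = NaN, which fillna(0) turns into 0.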
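As a cross-check, the same numbers can be computed without reshaping: for each flag column, keep the rows where the flag is 1 and take the mean of CH per CP range, since the mean of a 0/1 column is the share of 1s. This is a minimal sketch of that alternative, not the answer's method. It assumes the flag columns are exactly U1, U2 and U3, that CP holds string labels such as "10-20", and that ranges with no flagged rows should report 0, as fillna(0) does above. The names u_cols, cp_values, pieces and result are illustrative only.

```python
import pandas as pd

# Sample data from the question (assumption: CP is a plain string label).
df = pd.DataFrame({
    "ID": [111, 222, 333, 444, 555],
    "U1": [1, 1, 0, 0, 1],
    "U2": [1, 0, 1, 1, 0],
    "U3": [0, 1, 0, 1, 0],
    "CP": ["10-20", "10-20", "20-30", "40-50", "10-20"],
    "CH": [1, 1, 0, 0, 0],
})

u_cols = ["U1", "U2", "U3"]            # hypothetical: the binary flag columns
cp_values = sorted(df["CP"].unique())  # every CP range present in the data

pieces = []
for col in u_cols:
    # Among rows where this flag is 1, the mean of the 0/1 CH column
    # is the share of CH == 1 for each CP range.
    share = df.loc[df[col] == 1].groupby("CP")["CH"].mean()
    # Ranges with no flagged rows are missing after the filter; fill them with 0.
    share = share.reindex(cp_values, fill_value=0.0)
    pieces.append(pd.DataFrame({
        "idx": col,
        "CP": share.index.to_numpy(),
        "CH_perc": share.to_numpy(),
    }))

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Reindexing on the full list of CP values is what reproduces the 0 rows for empty combinations; without it, those combinations would simply be absent from the result instead of showing 0.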



Answer