I have DataFrame in Python Pandas like below:
ID | U1 | U2 | U3 | CP | CH |
---|---|---|---|---|---|
111 | 1 | 1 | 0 | 10-20 | 1 |
222 | 1 | 0 | 1 | 10-20 | 1 |
333 | 0 | 1 | 0 | 20-30 | 0 |
444 | 0 | 1 | 1 | 40-50 | 0 |
555 | 1 | 0 | 0 | 10-20 | 0 |
And I need to create column with percent of ‘1’ in column ‘CH’ per combination for: U1/U2/U3 and CP, so as a result i need something like below:
| idx | idx | CH_perc | | ----|---------|-----------| |U1 |CP_10_20 | 0.66 | 3 IDs have '1' in U1 and CP = 10-20, and 2 of them have '1' in CH, so 2/3 = 0.66 |U1 |CP_20_30 | 0 | |U1 |CP_30_40 | 0 | |U1 |CP_40_50 | 0 | |U2 |CP_10_20 | 1.00 | 1 ID have '1' in U2 and CP = 10-20, and have '1' in CH, so 1/1/ = 1.00 |U2 |CP_20_30 | 0 | |U2 |CP_30_40 | 0 | |U2 |CP_40_50 | 0 | |U3 |CP_10_20 | 1.00 | 1 ID have '1' in U3 and CP = 10-20, and have '1' in CH, so 1/1/ = 1.00 |U3 |CP_20_30 | 0 | |U3 |CP_30_40 | 0 | |U3 |CP_40_50 | 0 |
How can I od that in Python Pandas ?
Advertisement
Answer
You can use a melt
and groupby.sum
based approach:
(df.drop(columns='ID') .melt(['CP', 'CH'], var_name='idx') # keep only CH where value is 1 .assign(CH=lambda d: d['CH'].mul(d['value'])) .groupby(['idx', 'CP'], as_index=False).sum() .assign(CH_perc=lambda d: d.pop('CH').div(d.pop('value')).fillna(0)) )
output:
idx CP CH_perc 0 U1 10-20 0.666667 1 U1 20-30 0.000000 2 U1 40-50 0.000000 3 U2 10-20 1.000000 4 U2 20-30 0.000000 5 U2 40-50 0.000000 6 U3 10-20 1.000000 7 U3 20-30 0.000000 8 U3 40-50 0.000000