I would like to replace the following syntax with a cleaner, chained syntax, perhaps using .pipe (similar to the dplyr library in R).
Sample dataset:
    import pandas as pd

    dt = pd.DataFrame({
        "PACK_STR": ['112', '112', '112', '134', '145', '134'],
        "FLAG_MODE_SUM_PCK": [1, 0, 1, 1, 1, 0]
    })
Code to replace by piping:
    packs_mode_sum_pck = dt.groupby(['PACK_STR']).FLAG_MODE_SUM_PCK.sum().reset_index().rename(columns={'FLAG_MODE_SUM_PCK':'OCCUR_MODE_SUM_PCK'})
    dt = dt.merge(packs_mode_sum_pck, how="left", on='PACK_STR')
Expected output:
      PACK_STR  FLAG_MODE_SUM_PCK  OCCUR_MODE_SUM_PCK
    0      112                  1                   2
    1      112                  0                   2
    2      112                  1                   2
    3      134                  1                   1
    4      145                  1                   1
    5      134                  0                   1
Answer
Here are two elegant ways to get your output:
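    def ssum(grp):
        # Add the per-group sum as a new column on every row of the group
        grp['OCCUR_MODE_SUM_PCK'] = grp['FLAG_MODE_SUM_PCK'].sum()
        return grp

    # group_keys=False keeps the original flat index on recent pandas versions
    dt.groupby('PACK_STR', group_keys=False).apply(ssum)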
or
    # transform broadcasts each group's sum back onto the original rows
    dt['OCCUR_MODE_SUM_PCK'] = dt.groupby('PACK_STR')['FLAG_MODE_SUM_PCK'].transform('sum')
    dt
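Since the question asks for a chained style, the transform approach can probably be written as one expression as well. The following is a minimal sketch, not the original answer's code: it uses `DataFrame.assign` with a lambda, and then the same step wrapped in a helper for `.pipe`. The helper name `add_group_sum` and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd

dt = pd.DataFrame({
    "PACK_STR": ['112', '112', '112', '134', '145', '134'],
    "FLAG_MODE_SUM_PCK": [1, 0, 1, 1, 1, 0],
})

# Option A: assign + transform in a single chained expression.
# The lambda receives the DataFrame at that point in the chain.
chained = dt.assign(
    OCCUR_MODE_SUM_PCK=lambda d: d.groupby('PACK_STR')['FLAG_MODE_SUM_PCK'].transform('sum')
)

# Option B: wrap the step in a reusable helper (hypothetical name) for .pipe.
def add_group_sum(df, key, col, new_col):
    """Return a copy of df with new_col holding the per-key sum of col."""
    return df.assign(**{new_col: df.groupby(key)[col].transform('sum')})

piped = dt.pipe(add_group_sum, 'PACK_STR', 'FLAG_MODE_SUM_PCK', 'OCCUR_MODE_SUM_PCK')

print(piped)
```

Both options return a new DataFrame and leave `dt` unchanged, which tends to compose better in a longer chain than assigning a column in place.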