Given the following dataframe, is it possible to calculate the sum of `col2` and the sum of `col2 + col3` in a single aggregating function?
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'], 'col2': [1, 2, 3, 4], 'col3': [10, 20, 30, 40]})

. | col1 | col2 | col3 |
---|---|---|---|
0 | a | 1 | 10 |
1 | a | 2 | 20 |
2 | b | 3 | 30 |
3 | b | 4 | 40 |
In R's dplyr I would do it with a single line of `summarize`, and I was wondering what the equivalent might be in pandas:
df %>% group_by(col1) %>% summarize(col2_sum = sum(col2), col23_sum = sum(col2 + col3))

Desired result:
. | col1 | col2_sum | col23_sum |
---|---|---|---|
0 | a | 3 | 33 |
1 | b | 7 | 77 |
Answer
Let us try `assign` to create the new column first, then group and sum:
out = df.assign(col23=df.col2 + df.col3).groupby('col1', as_index=False).sum()
Out[81]:
  col1  col2  col3  col23
0    a     3    30     33
1    b     7    70     77

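The `assign` result above also carries the sum of `col3` and keeps the original column names. As a minimal sketch, assuming a pandas version that supports named aggregation (0.25 or later), the same idea can be combined with `agg` to select and rename the columns so the output matches the desired result. The intermediate column name `col23` is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': [1, 2, 3, 4],
    'col3': [10, 20, 30, 40],
})

out = (
    # Build the helper column col2 + col3 before grouping.
    df.assign(col23=df['col2'] + df['col3'])
      .groupby('col1', as_index=False)
      # Named aggregation: output_name=(input_column, aggregation).
      .agg(col2_sum=('col2', 'sum'), col23_sum=('col23', 'sum'))
)
print(out)
```

This keeps the aggregation vectorized and yields only `col1`, `col2_sum`, and `col23_sum`.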
From my understanding, `apply` is closer to `summarize` in R:
# Parentheses let the method chain span several lines.
out = (
    df.groupby('col1')
    .apply(lambda x: pd.Series({
        'col2_sum': x['col2'].sum(),
        'col23_sum': (x['col2'] + x['col3']).sum(),
    }))
    .reset_index()
)
Out[83]:
  col1  col2_sum  col23_sum
0    a         3         33
1    b         7         77