Skip to content

Can pandas perform an aggregating operation involving two columns?

Given the following dataframe,
is it possible to calculate the sum of col2 and the sum of col2 + col3,
in a single aggregating function?

import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b'], 'col2': [1, 2, 3, 4], 'col3': [10, 20, 30, 40]})
. col1 col2 col3
0 a 1 10
1 a 2 20
2 b 3 30
3 b 4 40

In R’s dplyr I would do it with a single line of summarize,
and I was wondering what might be the equivalent in pandas:

df %>% group_by(col1) %>% summarize(col2_sum = sum(col2), col23_sum = sum(col2 + col3))

Desired result:

. col1 col2_sum col23_sum
0 a 3 33
1 b 7 77

Advertisement

Answer

Let us try assign the new column first

out = df.assign(col23 = df.col2+df.col3).groupby('col1',as_index=False).sum()

Out[81]:

  col1  col2  col3  col23
0    a     3    30     33
1    b     7    70     77

From my understanding the apply is more like the summarize in R

out = df.groupby('col1').
           apply(lambda x : pd.Series({'col2_sum':x['col2'].sum(),
                                       'col23_sum':(x['col2'] + x['col3']).sum()})).
               reset_index()
Out[83]: 
  col1  col2_sum  col23_sum
0    a         3         33
1    b         7         77
User contributions licensed under: CC BY-SA
3 People found this is helpful