Python pandas group non repeating values

Hi, I have a DataFrame that looks like this:

         col1     col2
    0      A       1
    1      B       2
    2      C       3
    3      A       4
    4      C       5
    5      A       6

I would like to group the rows so that no col1 value repeats within a group, and sum col2 for each group, e.g.:

A,B,C => 6
A,C => 9
A => 6

Is there any way I can do this via pandas functions?

Answer

IIUC, you could build a grouping key with groupby + cumcount, which numbers each row by how many times its col1 value has already appeared, so the nth occurrence of every col1 value gets the same key. Then group by that key, join the "col1" values, and sum the "col2" values:

out = df.groupby(df.groupby('col1').cumcount()).agg({'col1':','.join, 'col2':'sum'})

Output:

    col1  col2
0  A,B,C     6
1    A,C     9
2      A     6
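
To see why this works, it may help to look at the intermediate cumcount key on its own. The snippet below is a minimal, self-contained sketch that rebuilds the example frame from the question; the variable name `occurrence` is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A", "C", "A"],
    "col2": [1, 2, 3, 4, 5, 6],
})

# For each row, count how many earlier rows share the same col1 value
# (0 for the first occurrence, 1 for the second, and so on)
occurrence = df.groupby("col1").cumcount()
print(occurrence.tolist())  # [0, 0, 0, 1, 1, 2]

# Rows with the same occurrence number form one group:
# join their col1 labels and sum their col2 values
out = df.groupby(occurrence).agg({"col1": ",".join, "col2": "sum"})
print(out)
```

Because every first occurrence gets key 0, every second occurrence key 1, and so on, no col1 value can appear twice within the same group.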
User contributions licensed under: CC BY-SA