Hi, I have a data frame which looks like this:
  col1  col2
0    A     1
1    B     2
2    C     3
3    A     4
4    C     5
5    A     6
I would like to group the rows and sum col2 so that no group contains a repeated value of col1, e.g.:
A,B,C => 6
A,C => 9
A => 6
Is there any way I can do this via pandas functions?
Answer
IIUC, you could create groups using groupby + cumcount (so that the nth occurrence of each col1 value gets the same group number); then groupby those group numbers, join the "col1" values and sum the "col2" values:
out = df.groupby(df.groupby('col1').cumcount()).agg({'col1':','.join, 'col2':'sum'})
Output:
    col1  col2
0  A,B,C     6
1    A,C     9
2      A     6
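To see why this works, it may help to look at the intermediate cumcount result on its own. Below is a minimal, self-contained sketch that rebuilds the sample frame from the question (the variable name occurrence is just an illustrative choice, not something from the question) and runs both steps:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "C", "A", "C", "A"],
    "col2": [1, 2, 3, 4, 5, 6],
})

# Step 1: number each row by how many times its col1 value has
# already appeared (0 for the first A, 1 for the second A, ...).
# "occurrence" is a hypothetical name chosen for this sketch.
occurrence = df.groupby("col1").cumcount()
print(occurrence.tolist())  # [0, 0, 0, 1, 1, 2]

# Step 2: group by that occurrence number, so each group holds
# at most one row per col1 value; join the labels and sum col2.
out = df.groupby(occurrence).agg({"col1": ",".join, "col2": "sum"})
print(out)
```

Rows with the same occurrence number end up in the same group, which is why the first group holds A, B and C, the second holds A and C, and the third holds only the last A.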