I have a sample dataframe that looks like below. I’d like to eventually group row 1 and row 3 together, since they contain identical items in different columns.
x y count a,b b,a 5 a,c c,a 2 b,a a,b 1
I’ve spent a lot of time trying to solve this, but have not encountered a good solution yet. What steps should I take to reach the below final dataframe?
x y count a,b b,a 5+1 a,c c,a 2
Advertisement
Answer
You can try:
df.groupby((df.x + df.y).str.replace(',', '').apply(lambda x: ''.join(sorted(x))) ).agg({'x': 'first', 'y': 'first', 'count': sum}).reset_index(drop=True)
OUTPUT:
x y count 0 a,b b,a 6 1 a,c c,a 2