I have a sample dataframe that looks like the one below. I’d like to group row 1 and row 3 together, since they contain identical items in different columns.
    x    y    count
    a,b  b,a  5
    a,c  c,a  2
    b,a  a,b  1
I’ve spent a lot of time trying to solve this, but have not found a good solution yet. What steps should I take to reach the final dataframe below?
    x    y    count
    a,b  b,a  5+1
    a,c  c,a  2
Answer
You can try:
    df.groupby(
        (df.x + df.y).str.replace(',', '').apply(lambda x: ''.join(sorted(x)))
    ).agg({'x': 'first', 'y': 'first', 'count': sum}).reset_index(drop=True)
OUTPUT:
         x    y  count
    0  a,b  b,a      6
    1  a,c  c,a      2
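One caveat: the key above sorts individual characters, so it works for single-letter items like these but would also merge rows whose items differ while sharing the same letters (for example `ab,c` and `a,bc`). If your real data has longer items, a possible variant is to sort whole items instead. The following is only a sketch under that assumption; `pair_key` is a hypothetical helper name, and the dataframe is rebuilt from the sample in the question so the snippet runs on its own.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "x": ["a,b", "a,c", "b,a"],
    "y": ["b,a", "c,a", "a,b"],
    "count": [5, 2, 1],
})


def pair_key(row):
    """Hypothetical helper: build an order-insensitive key for one row."""
    # Sort the comma-separated items inside each cell ...
    cells = (
        ",".join(sorted(row["x"].split(","))),
        ",".join(sorted(row["y"].split(","))),
    )
    # ... then sort the two cells, so swapping x and y gives the same key.
    return "|".join(sorted(cells))


key = df.apply(pair_key, axis=1)

result = (
    df.groupby(key, sort=False)  # sort=False keeps first-appearance order
      .agg({"x": "first", "y": "first", "count": "sum"})
      .reset_index(drop=True)
)
print(result)
```

On the sample data this should give the same two rows as the output above. Passing the string `"sum"` rather than the builtin `sum` is a small stylistic choice that avoids a deprecation warning in recent pandas versions.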