How can I group by two columns interchangeably?
For example, if I have this table
and I want to get
However, I get this instead when I use
df.insert(2, 'Count', df.groupby(['Name1','Name2'])['Name1'].transform('size'))
Rows that contain the same two names in swapped order are treated as separate groups, but I want them counted as the same group. How can I do this?
Answer
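One option is to build an order-independent key for each row with frozenset and group by that key. Example with a shorter DataFrame: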
import pandas as pd

df = pd.DataFrame({'name1': ['Alex', 'Alex', 'Sarah', 'Martin'], 'name2': ['Martin', 'Martin', 'Alex', 'Alex']})

# Order-independent key: a frozenset of the names in each row
df['tmp'] = df.apply(frozenset, axis=1)
# Number of rows that share each unordered pair
df['count'] = df.groupby('tmp')['name1'].transform('size')
# Keep only the first row of each pair
df = df.set_index('tmp')
df = df[~df.index.duplicated()].reset_index(drop=True)
print(df)
Prints:
   name1   name2  count
0   Alex  Martin      3
1  Sarah    Alex      1
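If the row-wise apply is too slow on a large DataFrame, a possible alternative is to sort the two names within each row and group on the sorted pair. This is only a sketch under assumptions: it reuses the sample data from above, the helper column names key1 and key2 are made up for illustration, and it assumes both columns hold plain strings with no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name1': ['Alex', 'Alex', 'Sarah', 'Martin'],
    'name2': ['Martin', 'Martin', 'Alex', 'Alex'],
})

# Sort the two names within each row so that (Martin, Alex) and
# (Alex, Martin) produce the same key pair.
keys = pd.DataFrame(
    np.sort(df[['name1', 'name2']].to_numpy(), axis=1),
    columns=['key1', 'key2'],  # hypothetical helper column names
    index=df.index,
)

# Count rows per unordered pair and attach the count to every row.
df['count'] = keys.groupby(['key1', 'key2'])['key1'].transform('size')

# Keep the first row of each unordered pair.
result = df[~keys.duplicated()].reset_index(drop=True)
print(result)
```

On the sample data this gives the same two rows as the frozenset version, with counts 3 and 1. Skipping the final deduplication step leaves the count attached to every original row instead.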