How can I group by two columns interchangeably?
For example, if I have this table
and I want to get
However, I get this instead when I use
df.insert(2, 'Count', df.groupby(['Name1','Name2'])['Name1'].transform('size'))
The entries (rows) that have the same names but exchanged are considered to be new entries, but i want to treat them the same way, can you please tell me a way to do this?
Advertisement
Answer
Example with shorter DataFrame:
df = pd.DataFrame({'name1': ['Alex', 'Alex', 'Sarah', 'Martin'], 'name2': ['Martin', 'Martin', 'Alex', 'Alex']}) df['tmp'] = df.apply(frozenset, axis=1) df['count'] = df.groupby('tmp')['name1'].transform('size') df = df.set_index('tmp') df = df[~df.index.duplicated()].reset_index(drop=True) print(df)
Prints:
name1 name2 count 0 Alex Martin 3 1 Sarah Alex 1