Skip to content
Advertisement

How can I group by two columns interchangeably?

How can I group by two columns interchangeably?

For example, if I have this table

image with two columns of variables

and I want to get

image with 3 columns of desired variables

However, I get this instead when I use

df.insert(2, 'Count', df.groupby(['Name1','Name2'])['Name1'].transform('size'))

The resulting table

The entries (rows) that have the same names but exchanged are considered to be new entries, but i want to treat them the same way, can you please tell me a way to do this?

Advertisement

Answer

Example with shorter DataFrame:

df = pd.DataFrame({'name1': ['Alex', 'Alex', 'Sarah', 'Martin'], 'name2': ['Martin', 'Martin', 'Alex', 'Alex']})

df['tmp'] = df.apply(frozenset, axis=1)
df['count'] = df.groupby('tmp')['name1'].transform('size')
df = df.set_index('tmp')
df = df[~df.index.duplicated()].reset_index(drop=True)
print(df)

Prints:

   name1   name2  count
0   Alex  Martin      3
1  Sarah    Alex      1
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement