Skip to content
Advertisement

How to find the most frequent appearence in one column for different values in a different column of a grouped dataframe?

The question is not so clear I guess, so here is an example: given a dataframe:

company_name company_size company_acitivity
7 eleven 5 restaurant
7 eleven 5 supermarket
7 eleven 10 supermarket
goldman sachs 100 bank
goldman sachs 200 restaurant
goldman sachs 200 bank

I want to group the dataframe by company name and then replace the values in the organization_size and organization_acitivity columns with the values that have the highest occurrence for the respective company and column.

So in the end the dataframe should look like this:

company_name company_size company_acitivity
7 eleven 5 supermarket
goldman sachs 200 bank

I tried this:

df.groupby("organization_name",group_keys=True)["organization_activity"].apply(lambda x: x.mode())

But it only gives me

“AttributeError: ‘SeriesGroupBy’ object has no attribute ‘mode'”.

Does someone have an idea for an easier way to do this?

Advertisement

Answer

You don’t want to select a column after the groupby, since you want to apply that to all the available columns.

Try this:

df.groupby('company_name').apply(lambda x: x.mode()).reset_index(drop=True)

Output:

    company_name  company_size company_acitivity
0       7 eleven             5       supermarket
1  goldman sachs           200              bank
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement