
Pandas rolling up column values based upon max value in column when aggregating

I have a problem with my pandas data frame, which has one row per user containing the group they belong to, their country, their type, and the total number of impressions from that user. An example slice of my df:


As you can see, there is one row per user, and users in the same group can have a different country, type, and number of total impressions.

What I would like to do is roll this data up to the userGroup level: drop the userID, keep the userCountry and userType of the user with the highest totalImpressions, and sum totalImpressions over all users in that group. This should result in a data frame like:


As you can see, groupCountry and groupType come from the user in that group with the highest totalImpressions, rather than from the first row of the group.
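To illustrate why a plain aggregation is not enough, here is a minimal sketch on a small hypothetical frame (only the column names come from the question; the values are invented). `groupby(...).first()` returns the first non-null value per column in each group, which depends on row order rather than on totalImpressions.

```python
import pandas as pd

# Hypothetical data using the question's column names; the values are invented.
df = pd.DataFrame({
    "userID": [1, 2, 3, 4, 5, 6],
    "userGroup": ["A", "A", "A", "B", "B", "C"],
    "userCountry": ["US", "UK", "DE", "FR", "US", "JP"],
    "userType": ["free", "paid", "free", "paid", "free", "paid"],
    "totalImpressions": [10, 50, 20, 5, 30, 7],
})

# first() picks the first non-null value in each group, based on row order only.
naive = df.groupby("userGroup")[["userCountry", "userType"]].first()
print(naive)
# Group A's top user (50 impressions) is in "UK", but first() reports "US".
```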

Is this possible in pandas? I know I could aggregate using df.groupby, but from there I am not sure how to select the country and type of the top user by totalImpressions. Any help would be greatly appreciated!
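One way to select the top user directly, offered as an alternative technique and not as the approach the answer takes, is `idxmax` on the grouped totalImpressions column: it returns the row label of each group's maximum, and `.loc` then fetches those rows. This is a sketch on invented data and assumes the frame has unique index labels (e.g. the default RangeIndex).

```python
import pandas as pd

# Hypothetical data using the question's column names; the values are invented.
df = pd.DataFrame({
    "userID": [1, 2, 3, 4, 5, 6],
    "userGroup": ["A", "A", "A", "B", "B", "C"],
    "userCountry": ["US", "UK", "DE", "FR", "US", "JP"],
    "userType": ["free", "paid", "free", "paid", "free", "paid"],
    "totalImpressions": [10, 50, 20, 5, 30, 7],
})

# Row label of the highest totalImpressions in each group
# (on a tie, idxmax returns the first such label).
top_idx = df.groupby("userGroup")["totalImpressions"].idxmax()

# Country and type of each group's top user, indexed by userGroup.
top = df.loc[top_idx].set_index("userGroup")[["userCountry", "userType"]]

# Sum of impressions per group, also indexed by userGroup.
totals = df.groupby("userGroup")["totalImpressions"].sum()

# assign() aligns totals on the shared userGroup index.
result = (
    top.rename(columns={"userCountry": "groupCountry", "userType": "groupType"})
       .assign(totalImpressions=totals)
       .reset_index()
)
print(result)
```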

To generate my example data:

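A hypothetical stand-in for example data of this shape: one row per user, with the column names used in the question. The values are invented for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical example data: one row per user. Only the column names
# come from the question; every value here is invented.
df = pd.DataFrame({
    "userID": [1, 2, 3, 4, 5, 6],
    "userGroup": ["A", "A", "A", "B", "B", "C"],
    "userCountry": ["US", "UK", "DE", "FR", "US", "JP"],
    "userType": ["free", "paid", "free", "paid", "free", "paid"],
    "totalImpressions": [10, 50, 20, 5, 30, 7],
})
print(df)
```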


Answer

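If you sort your data frame in ascending order by the totalImpressions column, the last row of each group belongs to the user with the most impressions, so you just have to keep that row's country and type for each group and sum the impressions.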
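The "keep the last row" step can be sketched on its own with `groupby(...).tail(1)` after an ascending sort. The data is hypothetical; `kind="mergesort"` is a choice made here (not in the source) because it is a stable sort, so tied users keep their original relative order.

```python
import pandas as pd

# Hypothetical data using the question's column names; the values are invented.
df = pd.DataFrame({
    "userID": [1, 2, 3, 4, 5, 6],
    "userGroup": ["A", "A", "A", "B", "B", "C"],
    "userCountry": ["US", "UK", "DE", "FR", "US", "JP"],
    "userType": ["free", "paid", "free", "paid", "free", "paid"],
    "totalImpressions": [10, 50, 20, 5, 30, 7],
})

# Ascending sort: within each group, the top user ends up last.
ordered = df.sort_values("totalImpressions", kind="mergesort")

# The last row of each group is the user with the highest totalImpressions.
top_rows = ordered.groupby("userGroup").tail(1)
print(top_rows[["userGroup", "userCountry", "userType", "totalImpressions"]])
```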

Use groupby.agg:

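A minimal sketch of the sort-then-`groupby.agg` idea using named aggregation (available since pandas 0.25), on invented data. The output column names groupCountry and groupType come from the question; this is not necessarily the answerer's exact code.

```python
import pandas as pd

# Hypothetical data using the question's column names; the values are invented.
df = pd.DataFrame({
    "userID": [1, 2, 3, 4, 5, 6],
    "userGroup": ["A", "A", "A", "B", "B", "C"],
    "userCountry": ["US", "UK", "DE", "FR", "US", "JP"],
    "userType": ["free", "paid", "free", "paid", "free", "paid"],
    "totalImpressions": [10, 50, 20, 5, 30, 7],
})

result = (
    # Stable ascending sort puts each group's top user last.
    df.sort_values("totalImpressions", kind="mergesort")
      .groupby("userGroup")
      .agg(
          # "last" = last non-null value in the group, i.e. the top user's value here.
          groupCountry=("userCountry", "last"),
          groupType=("userType", "last"),
          # Sum impressions over every user in the group.
          totalImpressions=("totalImpressions", "sum"),
      )
      .reset_index()
)
print(result)
```

Note that `"last"` skips missing values, so if the top user's userCountry or userType can be NaN, it would fall back to the previous non-null value in that group.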
User contributions licensed under: CC BY-SA