Pandas Dataframe: Retrieve the Maximum Value in a Pandas Dataframe using .groupby and .idxmax()

I have a Pandas Dataframe that contains a series of Airbnb Prices grouped by neighbourhood group neighbourhood and room_type. My objective is to return the Maximum Average Price for each room_type per Neighbourhood and return only this.

My approach to this was to use .groupby and .idxmax() to get the maximum values w.r.t to the Index, and then iterate through this accordingly and append the index positions to a list, which is used for filtering purposes. (This link was helpful: Python Pandas Dataframe select row by max value in group)

df = airbnb.loc[airbnb['neighbourhood'].isin(list(combined['neighbourhood']))].groupby(['neighbourhood','room_type']).mean().sort_values(by=['Revenues'],ascending=False)['Revenues'].reset_index().sort_values(by=['neighbourhood','room_type']).reset_index()
tmp_list = []
test = pd.DataFrame(df.groupby(['neighbourhood','room_type'])['Revenues'].idxmax(axis=0)).reset_index()
#Re-Assign Revenues back to the Dataframe for Reference
test['Actual_Revenues'] = df['Revenues']
#The Maximum Revenue Value for each Sub Grouping is Returned by Index
for i in neighbourhood_list:
    print(test[test['neighbourhood']==i])
    max_index_list = test[test['neighbourhood']==i].sort_values(by='Actual_Revenues', ascending=False).head(1)['Revenues']
    print(max_index_list)
    tmp_list.append(list(max_index_list))
    
tmp_list = list(np.concatenate(tmp_list).flat)
df[df.index.isin(tmp_list)]

Now this works smoothly, but I don’t believe this is very performant (esp. with my loop).

How might I seek to optimise my code further to reduce any redundancies and make my code more Pythonic?

Our expected output shows the below, with only the maximum room type and pricing shown per Neighbourhood. (i.e. One unique entry per Neighbourhood).

Thank you very much.

Here is a reproducible dataset you can use to test the code: https://docs.google.com/spreadsheets/d/1x-ktbfJouPzI0hokKw0fSNXAwmD8Q8Yd/edit?usp=sharing&ouid=101578147330059746959&rtpof=true&sd=true

Here is a picture of my output (using the complete dataset as opposed to the partial shown – this is just to show the concept of the maximum price shown per Neighbourhood. If a neighbourhood has multiple room types (which they will), it will return the maximum price and the associated room type.)

https://ibb.co/HnBysVc

Answer

Approach using Series.nlargest():

# Get mean price per room_type per neighbourhood
means = df.groupby(['neighbourhood', 'room_type'])['price'].mean()

# Get the maximum mean price per room_type per neighbourhood
max_means = (means.groupby(level=0, group_keys=False)
                  .nlargest(1)
                  .reset_index())

First five rows of max_means:

neighbourhood	room_type	price
Bedford-Stuyvesant	Entire home/apt	120
Chelsea	Private room	140
Chinatown	Entire home/apt	150
Clinton Hill	Entire home/apt	89
Crown Heights	Entire home/apt	99

Advertisement

Answer