Skip to content
Advertisement

Pandas Dataframe: Retrieve the Maximum Value in a Pandas Dataframe using .groupby and .idxmax()

I have a Pandas Dataframe that contains a series of Airbnb Prices grouped by neighbourhood group neighbourhood and room_type. My objective is to return the Maximum Average Price for each room_type per Neighbourhood and return only this.

My approach to this was to use .groupby and .idxmax() to get the maximum values w.r.t to the Index, and then iterate through this accordingly and append the index positions to a list, which is used for filtering purposes. (This link was helpful: Python Pandas Dataframe select row by max value in group)

df = airbnb.loc[airbnb['neighbourhood'].isin(list(combined['neighbourhood']))].groupby(['neighbourhood','room_type']).mean().sort_values(by=['Revenues'],ascending=False)['Revenues'].reset_index().sort_values(by=['neighbourhood','room_type']).reset_index()
tmp_list = []
test = pd.DataFrame(df.groupby(['neighbourhood','room_type'])['Revenues'].idxmax(axis=0)).reset_index()
#Re-Assign Revenues back to the Dataframe for Reference
test['Actual_Revenues'] = df['Revenues']
#The Maximum Revenue Value for each Sub Grouping is Returned by Index
for i in neighbourhood_list:
    print(test[test['neighbourhood']==i])
    max_index_list = test[test['neighbourhood']==i].sort_values(by='Actual_Revenues', ascending=False).head(1)['Revenues']
    print(max_index_list)
    tmp_list.append(list(max_index_list))
    
tmp_list = list(np.concatenate(tmp_list).flat)
df[df.index.isin(tmp_list)]

Now this works smoothly, but I don’t believe this is very performant (esp. with my loop).

How might I seek to optimise my code further to reduce any redundancies and make my code more Pythonic?

Our expected output shows the below, with only the maximum room type and pricing shown per Neighbourhood. (i.e. One unique entry per Neighbourhood).

Thank you very much.

Here is a reproducible dataset you can use to test the code: https://docs.google.com/spreadsheets/d/1x-ktbfJouPzI0hokKw0fSNXAwmD8Q8Yd/edit?usp=sharing&ouid=101578147330059746959&rtpof=true&sd=true

Here is a picture of my output (using the complete dataset as opposed to the partial shown – this is just to show the concept of the maximum price shown per Neighbourhood. If a neighbourhood has multiple room types (which they will), it will return the maximum price and the associated room type.)

https://ibb.co/HnBysVc

Advertisement

Answer

Approach using Series.nlargest():

# Get mean price per room_type per neighbourhood
means = df.groupby(['neighbourhood', 'room_type'])['price'].mean()

# Get the maximum mean price per room_type per neighbourhood
max_means = (means.groupby(level=0, group_keys=False)
                  .nlargest(1)
                  .reset_index())

First five rows of max_means:

neighbourhood room_type price
Bedford-Stuyvesant Entire home/apt 120
Chelsea Private room 140
Chinatown Entire home/apt 150
Clinton Hill Entire home/apt 89
Crown Heights Entire home/apt 99
Advertisement