I have a dataframe of taxi data with two columns that looks like this:
Neighborhood Borough Time Midtown Manhattan X Melrose Bronx Y Grant City Staten Island Z Midtown Manhattan A Lincoln Square Manhattan B
Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough with the most number of pickups. I tried this:
df['Neighborhood'].groupby(df['Borough']).value_counts()
Which gives me something like this:
borough Bronx High Bridge 3424 Mott Haven 2515 Concourse Village 1443 Port Morris 1153 Melrose 492 North Riverdale 463 Eastchester 434 Concourse 395 Fordham 252 Wakefield 214 Kingsbridge 212 Mount Hope 200 Parkchester 191 ...... Staten Island Castleton Corners 4 Dongan Hills 4 Eltingville 4 Graniteville 4 Great Kills 4 Castleton 3 Woodrow 1
How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren’t helpful to my case.
Advertisement
Answer
I think you can use nlargest
– you can change 1
to 5
:
s = df['Neighborhood'].groupby(df['Borough']).value_counts() print s Borough Bronx Melrose 7 Manhattan Midtown 12 Lincoln Square 2 Staten Island Grant City 11 dtype: int64 print s.groupby(level=[0,1]).nlargest(1) Bronx Bronx Melrose 7 Manhattan Manhattan Midtown 12 Staten Island Staten Island Grant City 11 dtype: int64
additional columns were getting created, specified level info