Skip to content
Advertisement

Group by and find top n value_counts pandas

I have a dataframe of taxi data with two columns that looks like this:

Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         Bronx          Y
Grant City      Staten Island  Z
Midtown         Manhattan      A
Lincoln Square  Manhattan      B

Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough with the most number of pickups. I tried this:

df['Neighborhood'].groupby(df['Borough']).value_counts()

Which gives me something like this:

borough                          
Bronx          High  Bridge          3424
               Mott Haven            2515
               Concourse Village     1443
               Port Morris           1153
               Melrose                492
               North Riverdale        463
               Eastchester            434
               Concourse              395
               Fordham                252
               Wakefield              214
               Kingsbridge            212
               Mount Hope             200
               Parkchester            191
......

Staten Island  Castleton Corners        4
               Dongan Hills             4
               Eltingville              4
               Graniteville             4
               Great Kills              4
               Castleton                3
               Woodrow                  1

How do I filter it so that I get only the top 5 from each? I know there are a few questions with a similar title but they weren’t helpful to my case.

Advertisement

Answer

I think you can use nlargest – you can change 1 to 5:

s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print s
Borough                      
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64

print s.groupby(level=[0,1]).nlargest(1)
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64

additional columns were getting created, specified level info

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement