Skip to content
Advertisement

How to groupby 2 columns but order descending by count()

i have a dataframe and want to group 2 columns, which is working fine.

df.groupby(["Sektor, CustomerID"]).count().head(10)
            _Order_ID_  Order_timezone  Order_weight                                
AE  1298772       1         1                1  
    1298788       1         1                1  
    1298840       2         2                2  
    1298912       1         1                1

AT  1038570       1         1                1  
    1040424       1         1                1  
    1040425       3         3                3  
    1040426       2         2                2  
    1040427       1         1                1  
    1040428       1         1                1  
    1040429       2         2                2  
            

Now the grouped dataframe is sorted by the CustomerID values. But i want to sort it by the count(). So that i have the Sektor then the CustomerIDs but the CustomerIds that occure the most should be at the top. So descending.

Expected Output should be:

            _Order_ID_  Order_timezone  Order_weight                                
AE  1298840       2         2                2
    1298772       1         1                1  
    1298788       1         1                1      
    1298912       1         1                1

AT  1040425       3         3                3
    1040426       2         2                2
    1040429       2         2                2
    1038570       1         1                1  
    1040424       1         1                1          
    1040427       1         1                1  
    1040428       1         1                1      
            

How do i do that?

Advertisement

Answer

Use:

df1 = df.groupby(["Sektor", "CustomerID"]).count()

If need 10 rows in ouput:

df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).head(10)
print (df1)
                   _Order_ID_  Order_timezone  Order_weight
Sektor CustomerID                                          
AE     1298840              2               2             2
       1298772              1               1             1
       1298788              1               1             1
       1298912              1               1             1
AT     1040425              3               3             3
       1040426              2               2             2
       1040429              2               2             2
       1038570              1               1             1
       1040424              1               1             1
       1040427              1               1             1

If need 10 rows (if exist) per groups by Sektor:

df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).groupby('Sektor').head(10)
print (df1)
                   _Order_ID_  Order_timezone  Order_weight
Sektor CustomerID                                          
AE     1298840              2               2             2
       1298772              1               1             1
       1298788              1               1             1
       1298912              1               1             1
AT     1040425              3               3             3
       1040426              2               2             2
       1040429              2               2             2
       1038570              1               1             1
       1040424              1               1             1
       1040427              1               1             1
       1040428              1               1             1
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement