i have a dataframe and want to group 2 columns, which is working fine.
df.groupby(["Sektor, CustomerID"]).count().head(10)
_Order_ID_ Order_timezone Order_weight
AE 1298772 1 1 1
1298788 1 1 1
1298840 2 2 2
1298912 1 1 1
AT 1038570 1 1 1
1040424 1 1 1
1040425 3 3 3
1040426 2 2 2
1040427 1 1 1
1040428 1 1 1
1040429 2 2 2
Now the grouped dataframe is sorted by the CustomerID values. But i want to sort it by the count(). So that i have the Sektor then the CustomerIDs but the CustomerIds that occure the most should be at the top. So descending.
Expected Output should be:
_Order_ID_ Order_timezone Order_weight
AE 1298840 2 2 2
1298772 1 1 1
1298788 1 1 1
1298912 1 1 1
AT 1040425 3 3 3
1040426 2 2 2
1040429 2 2 2
1038570 1 1 1
1040424 1 1 1
1040427 1 1 1
1040428 1 1 1
How do i do that?
Advertisement
Answer
Use:
df1 = df.groupby(["Sektor", "CustomerID"]).count()
If need 10 rows in ouput:
df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).head(10)
print (df1)
_Order_ID_ Order_timezone Order_weight
Sektor CustomerID
AE 1298840 2 2 2
1298772 1 1 1
1298788 1 1 1
1298912 1 1 1
AT 1040425 3 3 3
1040426 2 2 2
1040429 2 2 2
1038570 1 1 1
1040424 1 1 1
1040427 1 1 1
If need 10 rows (if exist) per groups by Sektor:
df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).groupby('Sektor').head(10)
print (df1)
_Order_ID_ Order_timezone Order_weight
Sektor CustomerID
AE 1298840 2 2 2
1298772 1 1 1
1298788 1 1 1
1298912 1 1 1
AT 1040425 3 3 3
1040426 2 2 2
1040429 2 2 2
1038570 1 1 1
1040424 1 1 1
1040427 1 1 1
1040428 1 1 1