i have a dataframe and want to group 2 columns, which is working fine.
df.groupby(["Sektor, CustomerID"]).count().head(10)
_Order_ID_ Order_timezone Order_weight AE 1298772 1 1 1 1298788 1 1 1 1298840 2 2 2 1298912 1 1 1 AT 1038570 1 1 1 1040424 1 1 1 1040425 3 3 3 1040426 2 2 2 1040427 1 1 1 1040428 1 1 1 1040429 2 2 2
Now the grouped dataframe is sorted by the CustomerID values. But i want to sort it by the count(). So that i have the Sektor then the CustomerIDs but the CustomerIds that occure the most should be at the top. So descending.
Expected Output should be:
_Order_ID_ Order_timezone Order_weight AE 1298840 2 2 2 1298772 1 1 1 1298788 1 1 1 1298912 1 1 1 AT 1040425 3 3 3 1040426 2 2 2 1040429 2 2 2 1038570 1 1 1 1040424 1 1 1 1040427 1 1 1 1040428 1 1 1
How do i do that?
Advertisement
Answer
Use:
df1 = df.groupby(["Sektor", "CustomerID"]).count()
If need 10 rows in ouput:
df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).head(10) print (df1) _Order_ID_ Order_timezone Order_weight Sektor CustomerID AE 1298840 2 2 2 1298772 1 1 1 1298788 1 1 1 1298912 1 1 1 AT 1040425 3 3 3 1040426 2 2 2 1040429 2 2 2 1038570 1 1 1 1040424 1 1 1 1040427 1 1 1
If need 10 rows (if exist) per groups by Sektor
:
df1 = df1.sort_values(['Sektor','_Order_ID_'], ascending=[True, False]).groupby('Sektor').head(10) print (df1) _Order_ID_ Order_timezone Order_weight Sektor CustomerID AE 1298840 2 2 2 1298772 1 1 1 1298788 1 1 1 1298912 1 1 1 AT 1040425 3 3 3 1040426 2 2 2 1040429 2 2 2 1038570 1 1 1 1040424 1 1 1 1040427 1 1 1 1040428 1 1 1