Among the columns of my DataFrame I have ClientID CartID FoodID Quantity, I would like to find what is the food that the client has bought the most.
I tried this:
df.groupby(["ClientID", "FoodID"])['Quantity'].sum().max(level=['ClientID', 'FoodID']).reset_index()
But got a completely wrong output:
Client FoodID Quantity 1 10 16008 24 2 10 17021 36 3 10 20665 6 4 10 20719 40 ... ... ... ... 267610 200 84920 4 267611 200 85039 96 267612 200 85039 120 267613 200 85040 48 267614 200 85173 48
EDIT: I also tried
mbi.groupby("ClientID")["Quantity"].max()
but this results in a pair (ClientID, Quantity of the most bought food), I need (Client, FoodID)
Advertisement
Answer
df.groupby(["ClientID", "FoodID"])['Quantity'].sum().reset_index().sort_values( ["ClientID", 'Quantity'], ascending=False).drop_duplicates( ["ClientID"]).sort_values('ClientID')
First get a df with contains the total Quantity
for each ClientID
, FoodID
combination. Then sort the df on ClientID
, Quantity
so that highest Quantity
per client appears on the top and finally drop the duplicates per client which will drop all the clients records except the top which happens to be max quantity.
Test case:
np.random.seed(0) df = pd.DataFrame({ 'ClientID' : np.random.randint(1,10, 1000), 'FoodID' : np.random.randint(1,10, 1000), 'Quantity' : np.random.randint(1,10, 1000), }) df.groupby(["ClientID", "FoodID"])['Quantity'].sum().reset_index().sort_values( ["ClientID", 'Quantity'], ascending=False).drop_duplicates( ["ClientID"]).sort_values('ClientID')
Output:
ClientID FoodID Quantity 3 1 4 97 16 2 8 82 26 3 9 100 35 4 9 98 44 5 9 85 47 6 3 107 54 7 1 94 69 8 7 107 73 9 2 109