Pandas find most bought item given ClientID ItemID ItemQuantity

Tags: ,



Among the columns of my DataFrame I have ClientID CartID FoodID Quantity, I would like to find what is the food that the client has bought the most.

I tried this:

df.groupby(["ClientID", "FoodID"])['Quantity'].sum().max(level=['ClientID', 'FoodID']).reset_index()

But got a completely wrong output:

    Client  FoodID  Quantity
1   10      16008   24
2   10      17021   36
3   10      20665   6
4   10      20719   40
... ... ... ...
267610  200 84920   4
267611  200 85039   96
267612  200 85039   120
267613  200 85040   48
267614  200 85173   48

EDIT: I also tried

mbi.groupby("ClientID")["Quantity"].max()

but this results in a pair (ClientID, Quantity of the most bought food), I need (Client, FoodID)

Answer

df.groupby(["ClientID", "FoodID"])['Quantity'].sum().reset_index().sort_values(
    ["ClientID", 'Quantity'], ascending=False).drop_duplicates(
    ["ClientID"]).sort_values('ClientID')

First get a df with contains the total Quantity for each ClientID, FoodID combination. Then sort the df on ClientID, Quantity so that highest Quantity per client appears on the top and finally drop the duplicates per client which will drop all the clients records except the top which happens to be max quantity.

Test case:

np.random.seed(0)

df = pd.DataFrame({
    'ClientID' : np.random.randint(1,10, 1000),
    'FoodID' : np.random.randint(1,10, 1000),
    'Quantity' : np.random.randint(1,10, 1000),
})

df.groupby(["ClientID", "FoodID"])['Quantity'].sum().reset_index().sort_values(
    ["ClientID", 'Quantity'], ascending=False).drop_duplicates(
    ["ClientID"]).sort_values('ClientID')

Output:

    ClientID    FoodID  Quantity
3   1           4       97
16  2           8       82
26  3           9       100
35  4           9       98
44  5           9       85
47  6           3       107
54  7           1       94
69  8           7       107
73  9           2       109


Source: stackoverflow