Python: How to add groupby but not affect ngroup()?

Per user, I want a unique item order (as they click through the items). If an item has already been seen, don't increment the count; reuse the value already assigned to that item. See, for example, c, d, g and b in the tables below. I used the code below, but it's not getting the job done: the numbering is shared across users instead of restarting for each one. If I add 'User_id' to the grouper, I mess up the ngroup(). Can anyone help me with this?

df['Order Number'] = df.groupby(pd.Grouper(key='Item',sort=False)).ngroup()+1

print(df)

Current Output:

  User_id  Item  Order Number
0     1        b            1
1     1        a            2
2     1        c            3
3     1        d            4
4     1        c            3
5     1        d            4
6     1        e            5
7     1        b            1
8     1        f            6
9     1        g            7
10    1        b            1
-----------------------------
11    2        x            8
12    2        g            7
13    2        g            7
14    2        f            6
15    2        h            9
16    2        i            10
17    2        f            6
18    2        k            11
19    2        l            12

Desired Output:

  User_id  Item  Order Number
0     1        b            1
1     1        a            2
2     1        c            3
3     1        d            4
4     1        c            3
5     1        d            4
6     1        e            5
7     1        b            1
8     1        f            6
9     1        g            7
10    1        b            1
-----------------------------
11    2        x            1
12    2        g            2
13    2        g            2
14    2        f            3
15    2        h            4
16    2        i            5
17    2        f            3
18    2        k            6
19    2        l            7

Answer

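Use GroupBy.transform with factorize in a lambda function. factorize numbers the values in order of first appearance (starting at 0), so a repeated item gets the code it was assigned the first time, and grouping by User_id restarts the numbering for each user: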

df['Order Number'] = df.groupby('User_id')['Item'].transform(lambda x: pd.factorize(x)[0]) + 1
print(df)
    User_id Item  Order Number
0         1    b             1
1         1    a             2
2         1    c             3
3         1    d             4
4         1    c             3
5         1    d             4
6         1    e             5
7         1    b             1
8         1    f             6
9         1    g             7
10        1    b             1
11        2    x             1
12        2    g             2
13        2    g             2
14        2    f             3
15        2    h             4
16        2    i             5
17        2    f             3
18        2    k             6
19        2    l             7
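
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the table in the question, and the helper name first_seen_rank is only an illustrative choice (it is not a pandas function); it does the same job as the lambda above.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "User_id": [1] * 11 + [2] * 9,
    "Item": list("bacdcdebfgb") + list("xggfhifkl"),
})

# pd.factorize returns (codes, uniques). Codes are 0-based and follow the
# order of first appearance, so a repeated value reuses its earlier code.
codes, uniques = pd.factorize(pd.Series(["b", "a", "b"]))
# codes is [0, 1, 0]; uniques holds "b", "a"


def first_seen_rank(items: pd.Series) -> pd.Series:
    """Hypothetical helper: 1-based first-seen position of each item in one group."""
    return pd.Series(pd.factorize(items)[0] + 1, index=items.index)


# transform calls the helper once per user, so numbering restarts per User_id.
df["Order Number"] = df.groupby("User_id")["Item"].transform(first_seen_rank)
print(df)
```

Since the numbering is computed separately inside each User_id group, it should not matter whether a user's rows are contiguous in the frame.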
User contributions licensed under: CC BY-SA