Skip to content
Advertisement

How to reset the incrementing values when assigning values to groups in a pandas dataframe?

I have a pandas dataframe which looks like this after the following code:

df['row_l0'] = df.groupby('Category',sort=False).ngroup()+1
df['row_l1'] = df.groupby(['Category','Process'],sort=False).ngroup()+1
df['row_l2'] = df.groupby(['Process','Parent'],sort=False).ngroup()+1

    Category    Process Parent      row_l0  row_l1  row_l2
127         A       a.5     a.5.4       1       5       24
128         A       a.6     a.6.1       1       6       25
129         A       a.6     a.6.2       1       6       26
130         A       a.6     a.6.3       1       6       26
131         A       a.6     a.6.4       1       6       27
132         A       a.6     a.6.5       1       6       27
133         B       b.1     b.1.1       2       7       28
134         B       b.2     b.2.1       2       8       29
135         B       b.2     b.2.2       2       8       29

For clarity, row_l0 relates to Category, row_l1 relates to Process and row_l2 to Parent.

The row_l0 is correct, but I can’t seem to be able to reset the count/grouping for the subsequent groups (row_l1 and row_l2) when I get to category B (and beyond).

E.g. at index 133, row_l0 is correctly assigned 2, but for row_l1 instead of continuing from 6 to 7, it should reset due to being a new Category and assigned 1

Ideally, I’d end up with something like:

    Category    Process Parent      row_l0  row_l1  row_l2
127         A       a.5     a.5.4       1       5       24
128         A       a.6     a.6.1       1       6       25
129         A       a.6     a.6.2       1       6       26
130         A       a.6     a.6.3       1       6       26
131         A       a.6     a.6.4       1       6       27
132         A       a.6     a.6.5       1       6       27
133         B       b.1     b.1.1       2       1       1
134         B       b.2     b.2.1       2       2       2
135         B       b.2     b.2.2       2       2       3

I’ve been trying a mixture of combinations using groupby, rank and ngroup, and adapted various other questions (or this one) but can’t seem to get it working.

Advertisement

Answer

With the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "Category": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
        "Process": ["a.5", "a.6", "a.6", "a.6", "a.6", "a.6", "b.1", "b.2", "b.2"],
        "Parent": [
            "a.5.4",
            "a.6.1",
            "a.6.2",
            "a.6.3",
            "a.6.4",
            "a.6.5",
            "b.1.1",
            "b.2.1",
            "b.2.2",
        ],
    },
)

Here is one way to do it:

df["row_l0"] = df["Category"].apply(
    lambda x: {col: i + 1 for i, col in enumerate(df["Category"].unique())}[x]
)

df["row_l1"] = df["Process"].apply(lambda x: x[-1])

df["row_l2"] = [
    j + 1
    for count in df["Parent"].str[0].value_counts().to_dict().values()
    for j in range(count)
]
print(df)
# Output
  Category Process Parent  row_l0 row_l1  row_l2
0        A     a.5  a.5.4       1      5       1
1        A     a.6  a.6.1       1      6       2
2        A     a.6  a.6.2       1      6       3
3        A     a.6  a.6.3       1      6       4
4        A     a.6  a.6.4       1      6       5
5        A     a.6  a.6.5       1      6       6
6        B     b.1  b.1.1       2      1       1
7        B     b.2  b.2.1       2      2       2
8        B     b.2  b.2.2       2      2       3
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement