I have a pandas dataframe which looks like this after the following code:
df['row_l0'] = df.groupby('Category',sort=False).ngroup()+1
df['row_l1'] = df.groupby(['Category','Process'],sort=False).ngroup()+1
df['row_l2'] = df.groupby(['Process','Parent'],sort=False).ngroup()+1
Category Process Parent row_l0 row_l1 row_l2
127 A a.5 a.5.4 1 5 24
128 A a.6 a.6.1 1 6 25
129 A a.6 a.6.2 1 6 26
130 A a.6 a.6.3 1 6 26
131 A a.6 a.6.4 1 6 27
132 A a.6 a.6.5 1 6 27
133 B b.1 b.1.1 2 7 28
134 B b.2 b.2.1 2 8 29
135 B b.2 b.2.2 2 8 29
For clarity, row_l0 relates to Category, row_l1 relates to Process and row_l2 to Parent.
The row_l0 is correct, but I can’t seem to be able to reset the count/grouping for the subsequent groups (row_l1 and row_l2) when I get to category B (and beyond).
E.g. at index 133, row_l0 is correctly assigned 2, but for row_l1 instead of continuing from 6 to 7, it should reset due to being a new Category and assigned 1
Ideally, I’d end up with something like:
Category Process Parent row_l0 row_l1 row_l2 127 A a.5 a.5.4 1 5 24 128 A a.6 a.6.1 1 6 25 129 A a.6 a.6.2 1 6 26 130 A a.6 a.6.3 1 6 26 131 A a.6 a.6.4 1 6 27 132 A a.6 a.6.5 1 6 27 133 B b.1 b.1.1 2 1 1 134 B b.2 b.2.1 2 2 2 135 B b.2 b.2.2 2 2 3
I’ve been trying a mixture of combinations using groupby, rank and ngroup, and adapted various other questions (or this one) but can’t seem to get it working.
Advertisement
Answer
With the following dataframe:
import pandas as pd
df = pd.DataFrame(
{
"Category": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
"Process": ["a.5", "a.6", "a.6", "a.6", "a.6", "a.6", "b.1", "b.2", "b.2"],
"Parent": [
"a.5.4",
"a.6.1",
"a.6.2",
"a.6.3",
"a.6.4",
"a.6.5",
"b.1.1",
"b.2.1",
"b.2.2",
],
},
)
Here is one way to do it:
df["row_l0"] = df["Category"].apply(
lambda x: {col: i + 1 for i, col in enumerate(df["Category"].unique())}[x]
)
df["row_l1"] = df["Process"].apply(lambda x: x[-1])
df["row_l2"] = [
j + 1
for count in df["Parent"].str[0].value_counts().to_dict().values()
for j in range(count)
]
print(df) # Output Category Process Parent row_l0 row_l1 row_l2 0 A a.5 a.5.4 1 5 1 1 A a.6 a.6.1 1 6 2 2 A a.6 a.6.2 1 6 3 3 A a.6 a.6.3 1 6 4 4 A a.6 a.6.4 1 6 5 5 A a.6 a.6.5 1 6 6 6 B b.1 b.1.1 2 1 1 7 B b.2 b.2.1 2 2 2 8 B b.2 b.2.2 2 2 3