I have a pandas dataframe which looks like this after the following code:
df['row_l0'] = df.groupby('Category',sort=False).ngroup()+1 df['row_l1'] = df.groupby(['Category','Process'],sort=False).ngroup()+1 df['row_l2'] = df.groupby(['Process','Parent'],sort=False).ngroup()+1 Category Process Parent row_l0 row_l1 row_l2 127 A a.5 a.5.4 1 5 24 128 A a.6 a.6.1 1 6 25 129 A a.6 a.6.2 1 6 26 130 A a.6 a.6.3 1 6 26 131 A a.6 a.6.4 1 6 27 132 A a.6 a.6.5 1 6 27 133 B b.1 b.1.1 2 7 28 134 B b.2 b.2.1 2 8 29 135 B b.2 b.2.2 2 8 29
For clarity, row_l0
relates to Category
, row_l1
relates to Process
and row_l2
to Parent
.
The row_l0
is correct, but I can’t seem to be able to reset the count/grouping for the subsequent groups (row_l1
and row_l2
) when I get to category B (and beyond).
E.g. at index 133
, row_l0
is correctly assigned 2
, but for row_l1
instead of continuing from 6
to 7
, it should reset due to being a new Category
and assigned 1
Ideally, I’d end up with something like:
Category Process Parent row_l0 row_l1 row_l2 127 A a.5 a.5.4 1 5 24 128 A a.6 a.6.1 1 6 25 129 A a.6 a.6.2 1 6 26 130 A a.6 a.6.3 1 6 26 131 A a.6 a.6.4 1 6 27 132 A a.6 a.6.5 1 6 27 133 B b.1 b.1.1 2 1 1 134 B b.2 b.2.1 2 2 2 135 B b.2 b.2.2 2 2 3
I’ve been trying a mixture of combinations using groupby, rank and ngroup, and adapted various other questions (or this one) but can’t seem to get it working.
Advertisement
Answer
With the following dataframe:
import pandas as pd df = pd.DataFrame( { "Category": ["A", "A", "A", "A", "A", "A", "B", "B", "B"], "Process": ["a.5", "a.6", "a.6", "a.6", "a.6", "a.6", "b.1", "b.2", "b.2"], "Parent": [ "a.5.4", "a.6.1", "a.6.2", "a.6.3", "a.6.4", "a.6.5", "b.1.1", "b.2.1", "b.2.2", ], }, )
Here is one way to do it:
df["row_l0"] = df["Category"].apply( lambda x: {col: i + 1 for i, col in enumerate(df["Category"].unique())}[x] ) df["row_l1"] = df["Process"].apply(lambda x: x[-1]) df["row_l2"] = [ j + 1 for count in df["Parent"].str[0].value_counts().to_dict().values() for j in range(count) ]
print(df) # Output Category Process Parent row_l0 row_l1 row_l2 0 A a.5 a.5.4 1 5 1 1 A a.6 a.6.1 1 6 2 2 A a.6 a.6.2 1 6 3 3 A a.6 a.6.3 1 6 4 4 A a.6 a.6.4 1 6 5 5 A a.6 a.6.5 1 6 6 6 B b.1 b.1.1 2 1 1 7 B b.2 b.2.1 2 2 2 8 B b.2 b.2.2 2 2 3