I have a table with 2 fields like so:
Account_ID | Parent_ID |
---|---|
x | y |
x1 | y |
x2 | y |
y | z |
y1 | z |
y2 | z |
z | z |
z | a |
z1 | a |
a | a |
b | b |
The IDs fields are both in int64 format. The first field represents accounts which could be controlled by a parent account which could be itself controlled by another parent account. For example: A controls Z who controls Y who controls X.
Not sure if this is relevant but for full context, it is possible for a parent to not actually use their own accounts for personal purpose and only use it to manage their children accounts. For example: Y falls under this scenario because it does not have a row relating it to itself like Z & A.
I am trying to create a field like in the table below that would easily highlight the structure of that hierarchy:
Account_ID | Parent_ID | Hierarchy |
---|---|---|
x | y | x/y/z/a |
x1 | y | x1/y/z/a |
x2 | y | x2/y/z/a |
y | z | x/y/z/a |
y1 | z | y1/z/a |
y2 | z | y2/z/a |
z | z | x/y/z/a |
z | a | x/y/z/a |
z1 | a | z1/a |
a | a | x/y/z/a |
b | b | b |
The one & two tiered ones are fairly simple with a concatenate, but I was wondering if there is an elegant way to populate the multi-tiered ones like “x/y/z/a”?
Advertisement
Answer
This does the trick. Advantageously it only relies on pandas and a small function.
def add_hierarchy(df, s, tier):
df['Hierarchy'] = df['Account_ID']
for i in range(tier):
next_tier = s.apply(lambda x: df[df['Account_ID'] == x].iloc[0])
df['Hierarchy'] += '/' + next_tier['Account_ID']
s = next_tier['Parent_ID']
add_hierarchy(df, df['Parent_ID'], tier=3)
Probably you may want to add some sort of nice format afterwards. However, it returns the correct hierarchies
Account_ID Parent_ID Hierarchy
0 x y x/y/z/a
1 x1 y x1/y/z/a
2 x2 y x2/y/z/a
3 y z y/z/a/a
4 y1 z y1/z/a/a
5 y2 z y2/z/a/a
6 z a z/a/a/a
7 z2 a z2/a/a/a
8 a a a/a/a/a
9 b b b/b/b/b
(Please check your output. z has two parents given. I assumed z to have only one parent to be deterministic as is also indicated in your example output.)