I have a table with 2 fields like so:
Account_ID | Parent_ID |
---|---|
x | y |
x1 | y |
x2 | y |
y | z |
y1 | z |
y2 | z |
z | z |
z | a |
z1 | a |
a | a |
b | b |
Both ID fields are int64. The first field holds accounts that may be controlled by a parent account, which may itself be controlled by another parent account. For example: A controls Z, which controls Y, which controls X.
I'm not sure whether this is relevant, but for full context: a parent may not actually use its own account for personal purposes and only use it to manage its children's accounts. For example, Y falls under this scenario because, unlike Z and A, it has no row relating it to itself.
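In table terms, that distinction can be read off directly: a row where Account_ID equals Parent_ID marks an account its owner also uses personally, while an ID that controls other accounts but has no such self row is a manage-only parent. A minimal pandas sketch, using the letter IDs from the example in place of the real int64 values (the variable names are illustrative):

```python
import pandas as pd

# Example data from the question (letters stand in for the int64 IDs).
df = pd.DataFrame({
    "Account_ID": ["x", "x1", "x2", "y", "y1", "y2", "z", "z", "z1", "a", "b"],
    "Parent_ID":  ["y", "y",  "y",  "z", "z",  "z",  "z", "a", "a",  "a", "b"],
})

# Accounts that have a row pointing at themselves (used personally).
self_used = set(df.loc[df["Account_ID"] == df["Parent_ID"], "Account_ID"])

# Accounts that control at least one other account.
parents = set(df.loc[df["Account_ID"] != df["Parent_ID"], "Parent_ID"])

# Parents without a self row only manage their children's accounts.
manage_only = sorted(parents - self_used)
print(manage_only)  # ['y']
```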
I am trying to create a field, like the one in the table below, that makes the structure of that hierarchy easy to see:
Account_ID | Parent_ID | Hierarchy |
---|---|---|
x | y | x/y/z/a |
x1 | y | x1/y/z/a |
x2 | y | x2/y/z/a |
y | z | x/y/z/a |
y1 | z | y1/z/a |
y2 | z | y2/z/a |
z | z | x/y/z/a |
z | a | x/y/z/a |
z1 | a | z1/a |
a | a | x/y/z/a |
b | b | b |
The one- and two-tier ones are fairly simple with a concatenation, but is there an elegant way to populate the multi-tier ones like "x/y/z/a"?
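For the one- and two-tier rows, the concatenation could look like the pandas sketch below. It assumes a self-referencing row (like b | b) should show just its own ID, and it uses made-up integer IDs because the real columns are int64: they are converted to strings first, since `+` on two int64 columns would add the numbers rather than join them.

```python
import pandas as pd

# Hypothetical int64 IDs standing in for z1 | a, a | a and b | b.
df = pd.DataFrame({"Account_ID": [31, 1, 2], "Parent_ID": [1, 1, 2]})

acc = df["Account_ID"].astype(str)
par = df["Parent_ID"].astype(str)

# Self-referencing rows keep just the ID; other rows get "child/parent".
df["Hierarchy"] = acc.where(df["Account_ID"] == df["Parent_ID"], acc + "/" + par)
print(df["Hierarchy"].tolist())  # ['31/1', '1', '2']
```

This only covers a single hop, which is why the deeper chains need some form of repeated lookup.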
Answer
This does the trick. Advantageously, it relies only on pandas and a small function.
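```python
def add_hierarchy(df, s, tier):
    # Start each path with the account itself (as strings, so int64 IDs also work).
    df['Hierarchy'] = df['Account_ID'].astype(str)
    for _ in range(tier):
        # For every current parent ID, look up that parent's own row (first match).
        next_tier = s.apply(lambda x: df[df['Account_ID'] == x].iloc[0])
        df['Hierarchy'] += '/' + next_tier['Account_ID'].astype(str)
        # Move one level up: the parents' parents.
        s = next_tier['Parent_ID']

add_hierarchy(df, df['Parent_ID'], tier=3)
```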
You will probably want to tidy the format afterwards (shorter chains end with the root ID repeated), but it returns the correct hierarchies:
```
  Account_ID Parent_ID Hierarchy
0          x         y   x/y/z/a
1         x1         y  x1/y/z/a
2         x2         y  x2/y/z/a
3          y         z   y/z/a/a
4         y1         z  y1/z/a/a
5         y2         z  y2/z/a/a
6          z         a   z/a/a/a
7         z2         a  z2/a/a/a
8          a         a   a/a/a/a
9          b         b   b/b/b/b
```
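If the repeated root IDs at the end are unwanted, one variation is to walk each account up its parent chain until it reaches a self-parented root (or an ID with no row of its own). That removes both the trailing duplicates and the fixed `tier` argument. This is a plain-dict sketch rather than the `apply` approach above, it assumes each account has exactly one parent (the z | z row left out, as in the output above), and `build_path` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    "Account_ID": ["x", "x1", "x2", "y", "y1", "y2", "z", "z1", "a", "b"],
    "Parent_ID":  ["y", "y",  "y",  "z", "z",  "z",  "a", "a",  "a", "b"],
})

# One parent per account (assumes Account_ID is unique in this data).
parent = dict(zip(df["Account_ID"], df["Parent_ID"]))

def build_path(account):
    """Follow parent links until a root (self-parented or unknown) is reached."""
    path = [str(account)]
    seen = {account}
    while account in parent and parent[account] != account:
        account = parent[account]
        if account in seen:  # guard against cycles such as p -> q -> p
            break
        seen.add(account)
        path.append(str(account))
    return "/".join(path)

df["Hierarchy"] = df["Account_ID"].map(build_path)
print(df)
```

Because the loop stops at the root, each row gets exactly as many levels as its chain has (x/y/z/a, y1/z/a, z1/a, b), and the same call keeps working if the tree grows deeper.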
(Please check your input: z is given two parents, z and a. To keep the lookup deterministic, I assumed z has only one parent, a, as your expected output also suggests, so the z | z row is left out of the data used above.)
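To spot such ambiguous accounts before building paths, one option is to count the distinct parents per account. The cleanup step shown here (dropping a self row when the account also has a different parent) is only an assumption that reproduces the choice made above, not the only sensible rule:

```python
import pandas as pd

# The question's original data, including both z rows.
df = pd.DataFrame({
    "Account_ID": ["x", "x1", "x2", "y", "y1", "y2", "z", "z", "z1", "a", "b"],
    "Parent_ID":  ["y", "y",  "y",  "z", "z",  "z",  "z", "a", "a",  "a", "b"],
})

# Accounts listed with more than one distinct parent (only z here).
n_parents = df.groupby("Account_ID")["Parent_ID"].nunique()
print(n_parents[n_parents > 1])

# Drop self-referencing rows for accounts that also have a different parent.
is_self = df["Account_ID"] == df["Parent_ID"]
has_other = df["Account_ID"].isin(df.loc[~is_self, "Account_ID"])
clean = df[~(is_self & has_other)]
```

After this step every Account_ID appears once, so a first-match lookup such as `.iloc[0]` no longer depends on row order.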