I have a pandas dataframe that looks like this:
JavaScript
x
8
1
id | folder | level0_parent_id | level1_parent_id | level2_parent_id | level3_parent_id
2
3
1 A 0 0 0 0
4
2 B 1 0 0 0
5
3 C 1 2 0 0
6
4 D 1 2 0 0
7
5 E 1 2 4 0
8
I want to return the folder structure for each line in a separate colunm, for example
- line 1: A is the root there is no parent all ids are 0 = A
- line 2: B is under A, id = 1, so the path is A/B
- line 3: C is under id A=1 for level0 and under B=2 for level1, resulting = A/B/C
- line 4: D is under id A=1 for level0 and under B=2 for level1, resulting = A/B/D
- line 5: E is under id A=1 for level0 and under B=2 for level1 and under D=4 for level 2, resulting = A/B/D/E
and this go on and on, I can have multiple Roots meaning level0 = 0 and if level 0 is not 0 means that is not Root so it’s related to a Root …(obvious i think )
How can I do this with Pandas?
Advertisement
Answer
One approach with melt
JavaScript
1
9
1
df['structure'] = df['folder'].map(
2
df.melt(['id', 'folder'], var_name='level')
3
.assign(child=lambda x: x['value'].map(dict(zip(df['id'], df['folder'])))
4
.fillna(x['folder']))
5
.drop_duplicates(['folder', 'child'])
6
.groupby('folder')['child'].agg('/'.join)
7
)
8
print(df)
9
Output
JavaScript
1
14
14
1
id folder level0_parent_id level1_parent_id level2_parent_id
2
0 1 A 0 0 0
3
1 2 B 1 0 0
4
2 3 C 1 2 0
5
3 4 D 1 2 0
6
4 5 E 1 2 4
7
8
level3_parent_id structure
9
0 0 A
10
1 0 A/B
11
2 0 A/B/C
12
3 0 A/B/D
13
4 0 A/B/D/E
14