I have a pandas dataframe that looks like this:
id | folder | level0_parent_id | level1_parent_id | level2_parent_id | level3_parent_id 1 A 0 0 0 0 2 B 1 0 0 0 3 C 1 2 0 0 4 D 1 2 0 0 5 E 1 2 4 0
I want to return the folder structure for each line in a separate colunm, for example
- line 1: A is the root there is no parent all ids are 0 = A
- line 2: B is under A, id = 1, so the path is A/B
- line 3: C is under id A=1 for level0 and under B=2 for level1, resulting = A/B/C
- line 4: D is under id A=1 for level0 and under B=2 for level1, resulting = A/B/D
- line 5: E is under id A=1 for level0 and under B=2 for level1 and under D=4 for level 2, resulting = A/B/D/E
and this go on and on, I can have multiple Roots meaning level0 = 0 and if level 0 is not 0 means that is not Root so it’s related to a Root …(obvious i think )
How can I do this with Pandas?
Advertisement
Answer
One approach with melt
df['structure'] = df['folder'].map( df.melt(['id', 'folder'], var_name='level') .assign(child=lambda x: x['value'].map(dict(zip(df['id'], df['folder']))) .fillna(x['folder'])) .drop_duplicates(['folder', 'child']) .groupby('folder')['child'].agg('/'.join) ) print(df)
Output
id folder level0_parent_id level1_parent_id level2_parent_id 0 1 A 0 0 0 1 2 B 1 0 0 2 3 C 1 2 0 3 4 D 1 2 0 4 5 E 1 2 4 level3_parent_id structure 0 0 A 1 0 A/B 2 0 A/B/C 3 0 A/B/D 4 0 A/B/D/E