I have the following dataframe, which contains parent-child relations:
```python
import pandas as pd

data = pd.DataFrame({'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
                     'Child': ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']})
```

```
a
├── b
│   └── d
└── c
    ├── f
    │   └── h
    └── g
z
└── q
    └── k
        └── w
```
I would like to get a new dataframe that contains, e.g., all children of parent a:
child | level1 | level2 | level x |
---|---|---|---|
d | a | b | – |
b | a | – | – |
c | a | – | – |
f | a | c | – |
h | a | c | f |
g | a | c | – |
I do not know upfront how many levels there are, so I have used 'level x'.
I guess I somehow need a recursive pattern to iterate over the dataframe.
Answer
I'd suggest:
- building, for each child, its list of parents (`child: parent_list`)
- building the `DataFrame`, giving each parent a `level` name
```python
import pandas as pd

values = {'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
          'Child': ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']}

# Map each child to its direct parent
relations = dict(zip(values['Child'], values['Parent']))

def get_parent_list(element):
    # Walk up recursively; the result is ordered from the root down to the direct parent
    parent = relations.get(element)
    return get_parent_list(parent) + [parent] if parent else []

# One entry per child: {'level_0': root, 'level_1': ..., ...}
# (set iteration order is arbitrary, so the row order may differ between runs)
all_relations = {
    children: {f'level_{idx}': value for idx, value in enumerate(get_parent_list(children))}
    for children in set(values['Child'])
}

df = pd.DataFrame.from_dict(all_relations, orient='index')
print(df)
```

```
  level_0 level_1 level_2
b       a     NaN     NaN
f       a       c     NaN
d       a       b     NaN
g       a       c     NaN
h       a       c       f
q       z     NaN     NaN
k       z       q     NaN
w       z       q       k
c       a     NaN     NaN
```
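The table above covers every child in the data, while the question asks only for the children below one parent such as `a`. A minimal sketch of that restriction follows. It assumes the relations contain no cycles, it walks up the tree with a loop rather than recursion, and `descendants_of` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

values = {'Parent': ['a', 'a', 'b', 'c', 'c', 'f', 'q', 'z', 'k'],
          'Child': ['b', 'c', 'd', 'f', 'g', 'h', 'k', 'q', 'w']}

# Map each child to its direct parent
relations = dict(zip(values['Child'], values['Parent']))

def get_parent_list(element):
    """Return the ancestors of `element`, ordered from the root down.

    Iterative variant of the recursive lookup; assumes the data has no cycles.
    """
    parents = []
    parent = relations.get(element)
    while parent is not None:
        parents.append(parent)
        parent = relations.get(parent)
    return parents[::-1]

def descendants_of(root):
    """Hypothetical helper: one row per descendant of `root`.

    The level columns start at `root`, so level_0 is always `root` itself.
    """
    rows = {}
    for child in values['Child']:
        parents = get_parent_list(child)
        if root in parents:
            # Keep only the part of the ancestor chain from `root` downwards
            chain = parents[parents.index(root):]
            rows[child] = {f'level_{idx}': value for idx, value in enumerate(chain)}
    return pd.DataFrame.from_dict(rows, orient='index')

df_a = descendants_of('a')
print(df_a)
```

For `'a'` this keeps the rows for b, c, d, f, g and h and drops the `z` tree. Because the chain is cut at the requested node, the same call should also work for an inner node such as `descendants_of('c')`.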