I have a pandas DataFrame:
From | To |
---|---|
A | B |
A | C |
D | E |
F | F |
B | G |
B | H |
B | I |
G | J |
G | K |
L | L |
M | M |
N | N |
I want to convert it into a multi-column hierarchy. The expected result looks like this:
Level_1 | Level_2 | Level_3 | Level_4 |
---|---|---|---|
A | B | G | J |
A | B | G | K |
A | B | H | |
A | B | I | |
A | C | | |
D | E | | |
F | F | | |
L | L | | |
M | M | | |
N | N | | |
Is there a built-in way in pandas to achieve this? I know I can use recursion, but is there a simpler way?
Answer
You can get the expected result with networkx by treating each row as a directed edge and collecting every path from a root node to a leaf node:

    # Python env: pip install networkx
    # Anaconda env: conda install networkx
    import networkx as nx
    import pandas as pd

    df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                       'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})

    G = nx.from_pandas_edgelist(df, source='From', target='To', create_using=nx.DiGraph)

    roots = [v for v, d in G.in_degree() if d == 0]
    leaves = [v for v, d in G.out_degree() if d == 0]

    all_paths = []
    for root in roots:
        for leaf in leaves:
            paths = nx.all_simple_paths(G, root, leaf)
            all_paths.extend(paths)

    for node in nx.nodes_with_selfloops(G):
        all_paths.append([node, node])

Output:

    >>> pd.DataFrame(sorted(all_paths)).add_prefix('Level_').fillna('')
      Level_0 Level_1 Level_2 Level_3
    0       A       B       G       J
    1       A       B       G       K
    2       A       B       H
    3       A       B       I
    4       A       C
    5       D       E
    6       F       F
    7       L       L
    8       M       M
    9       N       N

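The columns above are numbered from Level_0, whereas the question asks for Level_1 through Level_4. Here is a minimal sketch of one way to shift the labels. The path list is hard-coded so the snippet runs on its own, and the name `result` is just an illustrative choice:

```python
import pandas as pd

# Paths as produced by the networkx code (hard-coded here so the example is standalone).
all_paths = [
    ['A', 'B', 'G', 'J'], ['A', 'B', 'G', 'K'], ['A', 'B', 'H'], ['A', 'B', 'I'],
    ['A', 'C'], ['D', 'E'], ['F', 'F'], ['L', 'L'], ['M', 'M'], ['N', 'N'],
]

# Shorter paths are padded with missing values, which fillna('') turns into empty strings.
result = pd.DataFrame(sorted(all_paths)).fillna('')

# The default column labels are 0, 1, 2, ...; rename them to 1-based level names.
result.columns = [f'Level_{i + 1}' for i in range(result.shape[1])]
```

Setting `result.columns` directly keeps the rest of the pipeline unchanged, so it can replace the `add_prefix('Level_')` call.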
Documentation: networkx.algorithms.simple_paths.all_simple_paths
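If installing networkx is not an option, the same root-to-leaf walk can be sketched in plain Python with an explicit stack instead of recursion. This is only an illustration, not part of the answer above: `hierarchy_paths` is a hypothetical helper name, and it assumes that self-loop rows (such as F to F) describe standalone nodes with no other edges.

```python
from collections import defaultdict


def hierarchy_paths(edges):
    """Return sorted root-to-leaf paths for an iterable of (source, target) pairs."""
    children = defaultdict(list)
    has_parent = set()
    self_loops = []
    for src, dst in edges:
        if src == dst:
            # Assumption: a self-loop marks a standalone node, reported as [node, node].
            self_loops.append(src)
            continue
        children[src].append(dst)
        has_parent.add(dst)

    # Roots are sources that never appear as a target.
    roots = [node for node in children if node not in has_parent]

    paths = []
    stack = [[root] for root in roots]
    while stack:
        path = stack.pop()
        # Skip nodes already on the path so a cycle cannot loop forever.
        kids = [k for k in children.get(path[-1], []) if k not in path]
        if not kids:
            paths.append(path)  # reached a leaf
            continue
        for kid in kids:
            stack.append(path + [kid])

    paths.extend([node, node] for node in self_loops)
    return sorted(paths)


edges = list(zip(
    ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
    ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'],
))
paths = hierarchy_paths(edges)
```

With the DataFrame from the question, the edge list could be built as `zip(df['From'], df['To'])`, and the returned `paths` can be passed to `pd.DataFrame(...)` in the same way as `all_paths`.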