I have a pandas DataFrame:
| From | To |
|---|---|
| A | B |
| A | C |
| D | E |
| F | F |
| B | G |
| B | H |
| B | I |
| G | J |
| G | K |
| L | L |
| M | M |
| N | N |
I want to convert it into a multi-column hierarchy. The expected hierarchy looks like this:
| Level_1 | Level_2 | Level_3 | Level_4 |
|---|---|---|---|
| A | B | G | J |
| A | B | G | K |
| A | B | H | |
| A | B | I | |
| A | C | | |
| D | E | | |
| F | F | | |
| L | L | | |
| M | M | | |
| N | N | | |
Is there a built-in way in pandas to achieve this? I know I can use recursion, but is there a simpler way?
Answer
You can get the expected result with networkx by treating each row as a directed edge and listing every path from a root to a leaf:
# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
import pandas as pd
df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                   'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})
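# Build a directed graph with one edge per row (From -> To)
G = nx.from_pandas_edgelist(df, source='From', target='To', create_using=nx.DiGraph)
# Roots have no incoming edges; leaves have no outgoing edges
roots = [v for v, d in G.in_degree() if d == 0]
leaves = [v for v, d in G.out_degree() if d == 0]
all_paths = []
for root in roots:
    for leaf in leaves:
        paths = nx.all_simple_paths(G, root, leaf)
        all_paths.extend(paths)
# Self-referencing rows (F -> F, etc.) are neither roots nor leaves, so add them explicitly
for node in nx.nodes_with_selfloops(G):
    all_paths.append([node, node])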
Output:
>>> pd.DataFrame(sorted(all_paths)).add_prefix('Level_').fillna('')
  Level_0 Level_1 Level_2 Level_3
0       A       B       G       J
1       A       B       G       K
2       A       B       H
3       A       B       I
4       A       C
5       D       E
6       F       F
7       L       L
8       M       M
9       N       N
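Note that the columns above start at Level_0, while the table in the question starts at Level_1. A minimal sketch of one way to shift the numbering, assuming the paths are exactly the ones shown in the output above (they are copied in by hand here so the snippet runs on its own; the name `result` is only illustrative):

```python
import pandas as pd

# Paths as produced by the networkx code above (copied from its output).
all_paths = [
    ["A", "B", "G", "J"], ["A", "B", "G", "K"], ["A", "B", "H"], ["A", "B", "I"],
    ["A", "C"], ["D", "E"], ["F", "F"], ["L", "L"], ["M", "M"], ["N", "N"],
]

# Shorter paths are padded with missing values, which fillna turns into ''.
result = pd.DataFrame(sorted(all_paths)).fillna("")
# The default column labels are 0, 1, 2, ...; relabel them starting from 1.
result.columns = [f"Level_{i}" for i in range(1, result.shape[1] + 1)]
print(result)
```

Assigning `result.columns` directly replaces `add_prefix('Level_')`, which can only prepend text to the existing 0-based labels.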
Documentation: networkx.algorithms.simple_paths.all_simple_paths
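If installing networkx is not an option, the same root-to-leaf enumeration can be sketched with a dictionary and an explicit stack, which also avoids recursion. This is only a sketch under assumptions: the helper name `root_to_leaf_paths` is made up for illustration, and it assumes the edges contain no cycles other than self-loops and that a self-looping node has no other edges, as in the question's data.

```python
from collections import defaultdict


def root_to_leaf_paths(edges):
    """Hypothetical helper: list every root-to-leaf path in an acyclic edge list.

    Self-loops such as ("F", "F") are returned as two-element paths.
    """
    children = defaultdict(list)   # parent -> list of children
    has_parent = set()             # nodes that appear as a target
    nodes, seen = [], set()        # nodes in first-seen order
    self_loops = []
    for src, dst in edges:
        for n in (src, dst):
            if n not in seen:
                seen.add(n)
                nodes.append(n)
        if src == dst:
            self_loops.append([src, dst])
        else:
            children[src].append(dst)
            has_parent.add(dst)

    paths = []
    # Roots: nodes that are never a target and have at least one child.
    stack = [[n] for n in nodes if n not in has_parent and n in children]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)                       # reached a leaf
        else:
            stack.extend(path + [k] for k in kids)   # extend by each child
    return sorted(paths + self_loops)


edges = list(zip("AADFBBBGGLMN", "BCEFGHIJKLMN"))  # same rows as the question
paths = root_to_leaf_paths(edges)
```

The resulting `paths` list can be passed to `pd.DataFrame(paths).fillna('')` in the same way as `all_paths` in the networkx version.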